Why software doesn't work anymore

No one expects software to work anymore

What was your reaction when that error dialog just popped up? Did you read it? Or did you just click cancel like you do for every annoying little box that pops up on your computer several times a day? Did you even realize that I insulted your mother?

You didn’t read it because your brain has been programmed over time to expect things like that. Turn it off and turn it back on again. Refresh the page. Clear cache and cookies. We’ve gotten used to it. We’re not surprised by any of it. And our expectations of our software just continue to fall as time goes on.

Happy Path Blindness

Does this look familiar?

async function updateUserProfile(req, reply) {
  try {
    await Auth.check(req.headers.authorization)
    const user = await Db.getUser(req.user.id)
    if (!user.isInitialized) throw new Error('User is not initialized')
    await Db.updateUser({phoneNumber: req.phone})
  } catch (err) {
    const msg = err instanceof Error ? err.message : 'Failed to update user profile'
    req.log.error({err}, msg)
    reply.code(500).send({error: msg})
  }
}

It is quite seductive (and quite common) to write code that focuses on the “happy path”, how you would like things to work, without taking the (considerable) extra time to stop and address each and every way that things can go off the rails.

The reality is that our software is incredibly error-prone, and yet the amount of error handling in any given codebase tends to be pretty minimal. In the snippet above (which I’m certain is quite similar to many apps you’ve worked on), every single line in the try-block can fail. But we don’t bother with any of the details and just wrap it all in a generic error message at the end. At least in this case there is a catch block; sometimes you don’t even get that.

In order to properly handle those errors you would first have to know which type of error each function can throw and what that error means:

Is it fatal?
Is it retry-able?
Do we need exponential backoff?
Does the user need to be notified?
What should we tell the user?
Or even, do we need to issue a refund?

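Because the e in catch(e) is typed as unknown at best (and as any in older or non-strict configurations), the compiler can’t answer any of those questions for you. The answers would have to live in the existing documentation, and having even some of them written down would demonstrate strong discipline by your predecessors. But let’s be honest, that’s unlikely.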
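To make that concrete, here is a minimal sketch of what an untyped catch variable leaves you with. The names describeCaught, riskyParse and parseOrDescribe are made up for illustration; the point is that the only facts you can recover are the ones you check for at runtime.

```typescript
// Hypothetical helper: turn whatever was thrown into a loggable message.
// Nothing here tells us whether the failure is fatal or retry-able.
function describeCaught(e: unknown): string {
  if (e instanceof Error) return e.message
  if (typeof e === 'string') return e
  return 'Unknown failure'
}

// Hypothetical operation that can throw; its signature does not say so.
function riskyParse(json: string): unknown {
  return JSON.parse(json)
}

function parseOrDescribe(json: string): string {
  try {
    riskyParse(json)
    return 'ok'
  } catch (e) {
    // With `strict` enabled `e` is `unknown`, so it must be narrowed before use
    return describeCaught(e)
  }
}
```

Even after narrowing, all we get is a message string. Whether to retry, back off, or tell the user is still guesswork.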

A high degree of discipline is required to scale TypeScript, especially in large engineering organizations. But unfortunately, discipline does not scale.

Handling errors

Error handling in programming has a long history of being done poorly. First we had error codes, which people forgot to check. Later we got exceptions, which solved the forgetfulness problem by crashing the app. In HTTP we still have error codes, which the popular axios library checks by throwing exceptions that people forget to catch. What a compromise.

Any approach that makes it possible for a developer to forget to address the error is inherently flawed and bound to fail because (again) discipline doesn’t scale; it doesn’t scale over an org, a team, or even just over time.

What can we learn from Rust?

The popular Rust programming language doesn’t contain exceptions; throw is not a keyword. Instead it has Result<T, E> for recoverable errors and panic! for unrecoverable errors.

Result<T, E> represents one of two possible states: success or failure, the cat is alive or dead. It can either be Ok and contain your intended return type T, or it can be Err and contain a specific type of error E.

In order to get your data out of the Result, the compiler forces you to check whether the operation was successful or not. The discipline is enforced by the compiler, so it scales!

From the Rust book:
“Rust requires you to acknowledge the possibility of an error and take some action before your code will compile. This requirement makes your program more robust by ensuring that you’ll discover errors and handle them appropriately before deploying your code to production!”

Result > Exception

As you can see, Results have a number of advantages over throwing exceptions:

Checking errors becomes a compiler requirement
- in TS implementations it becomes more of a strong suggestion but such is the nature of JavaScript
No hidden control flow (exception bubbling)
Crystal clear guarantees of how the code succeeds, and how it fails, at a glance.
Possible errors are documented by the type system

How might we implement Results in TypeScript?

type Ok<T> = {type: 'Success'; value: T}
type Err<E> = {type: 'Failure'; error: E}
type Result<T, E> = Ok<T> | Err<E>

By using a discriminated union on type, when a function returns a Result<T, E> you’re now forced to check which variant you have in order to access the value:

const userResult = await Db.getUser(req.user.id)
if (userResult.type === 'Failure')
  return reply.code(404).send({type: 'Failure', error: userResult.error.message})
const user = userResult.value

Well, that was easy!
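Here is a self-contained sketch of the same pattern that you can paste into a playground. The parsePort function and its PortError type are invented for this example; they just show how the error side of a Result can document each way a call fails.

```typescript
type Ok<T> = {type: 'Success'; value: T}
type Err<E> = {type: 'Failure'; error: E}
type Result<T, E> = Ok<T> | Err<E>

// Hypothetical error type: each variant documents one way the call can fail
type PortError =
  | {kind: 'NotANumber'; input: string}
  | {kind: 'OutOfRange'; port: number}

function parsePort(input: string): Result<number, PortError> {
  const port = Number(input)
  if (!Number.isInteger(port)) {
    return {type: 'Failure', error: {kind: 'NotANumber', input}}
  }
  if (port < 1 || port > 65535) {
    return {type: 'Failure', error: {kind: 'OutOfRange', port}}
  }
  return {type: 'Success', value: port}
}

function describeError(error: PortError): string {
  switch (error.kind) {
    case 'NotANumber':
      return `"${error.input}" is not a whole number`
    case 'OutOfRange':
      return `${error.port} is outside 1-65535`
    default: {
      // Adding a new PortError variant makes this assignment a compile error
      const unreachable: never = error
      return unreachable
    }
  }
}

function describePort(input: string): string {
  const result = parsePort(input)
  // `result.value` is not accessible until the failure case has been ruled out
  if (result.type === 'Failure') return describeError(result.error)
  return `Listening on port ${result.value}`
}
```

Calling describePort('8080') takes the success branch, while 'abc' and '70000' each land in their own, separately handled failure.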

We could even create some helper functions to make Results easier to work with

const ResultHelpers = {
  succeed<T>(value: T): Ok<T> {
    return {type: 'Success', value}
  },
  fail<E>(error: E): Err<E> {
    return {type: 'Failure', error}
  },
  // Type predicates let the compiler narrow the Result after the check
  isSuccess<T, E>(result: Result<T, E>): result is Ok<T> {
    return result.type === 'Success'
  },
  isFailure<T, E>(result: Result<T, E>): result is Err<E> {
    return result.type === 'Failure'
  },
  /**
   * If the result is successful, transform the output value
   * If the result has failed, return that original failure
   */
  map<T, E, T2>(result: Result<T, E>, fn: (value: T) => T2): Result<T2, E> {
    if (result.type === 'Failure') return result
    return {type: 'Success', value: fn(result.value)}
  },
  /**
   * If the result is successful, pass the success value to a function that performs another action
   * If the result has failed, return that original failure
   */
  andThen<T, E, T2, E2>(result: Result<T, E>, fn: (value: T) => Result<T2, E2>): Result<T2, E | E2> {
    if (result.type === 'Failure') return result
    return fn(result.value)
  },
  /**
   * If the result is successful, return that original success
   * If the result has failed, return a success containing the fallback value
   */
  unWrapOr<T, E, T2>(result: Result<T, E>, orValue: T2): Ok<T | T2> {
    if (result.type === 'Success') return result
    return {type: 'Success', value: orValue}
  },
  /**
   * If the result is successful, perform some side effect, then return the original success
   * If the result has failed, return that original failure
   */
  inspect<T, E>(result: Result<T, E>, fn: (value: T) => unknown): Result<T, E> {
    if (result.type === 'Success') fn(result.value)
    return result
  },
  /**
   * If the result is successful, return the original success
   * If the result has failed, perform some side effect, then return the original failure
   */
  inspectError<T, E>(result: Result<T, E>, fn: (error: E) => unknown): Result<T, E> {
    if (result.type === 'Failure') fn(result.error)
    return result
  },
}

Most operations that might fail are async. How could we handle those situations?

type ResultAsync<T, E> = Promise<Result<T, E>>
type ResultMaybeAsync<T, E> = Result<T, E> | ResultAsync<T, E>

We could then adapt our helper functions to operate on ResultMaybeAsync instead, allowing for a unified API for both sync and async results.
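As a rough sketch of what that adaptation might look like, here is andThen rewritten to accept either flavor. This is my own minimal version, not how any particular library does it, and parseId and loadName are hypothetical steps invented for the example.

```typescript
type Ok<T> = {type: 'Success'; value: T}
type Err<E> = {type: 'Failure'; error: E}
type Result<T, E> = Ok<T> | Err<E>
type ResultAsync<T, E> = Promise<Result<T, E>>
type ResultMaybeAsync<T, E> = Result<T, E> | ResultAsync<T, E>

// Accepts a sync or async Result and a sync or async follow-up step.
// `await` on a non-promise value simply yields that value, so one
// implementation covers both cases (at the cost of always returning a promise).
async function andThen<T, E, T2, E2>(
  result: ResultMaybeAsync<T, E>,
  fn: (value: T) => ResultMaybeAsync<T2, E2>,
): Promise<Result<T2, E | E2>> {
  const settled = await result
  if (settled.type === 'Failure') return settled
  return fn(settled.value)
}

// Hypothetical synchronous step
const parseId = (raw: string): Result<number, string> =>
  /^\d+$/.test(raw)
    ? {type: 'Success', value: Number(raw)}
    : {type: 'Failure', error: 'Invalid id'}

// Hypothetical asynchronous step (imagine a database call)
const loadName = async (id: number): Promise<Result<string, string>> =>
  id === 1
    ? {type: 'Success', value: 'Ada'}
    : {type: 'Failure', error: 'No such user'}

// The sync and async steps compose without the caller caring which is which
const lookup = (raw: string) => andThen(parseId(raw), loadName)
```

The trade-off in this sketch is that everything becomes a promise, even when both steps are synchronous. A more careful implementation would use overloads or conditional types to stay synchronous when it can.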

Luckily, someone has already done this

The neverthrow npm package is essentially a port of the Rust Result<T, E> implementation. It is the most popular library in this space, but in my opinion has a fundamental flaw because it’s based around class instances, which are not serializable. This means that neverthrow is difficult to use close to client-server boundaries, which is just about everywhere in a web app.

There is a new library called @praha/byethrow that solves this problem by having Results just be POJOs (plain old JavaScript objects) like we’ve seen above. Not only are byethrow’s Results serializable, but they can be sent from client ↔ server, and can be validated with a schema. This means that you can return Results from your API endpoints, from your React server functions, from your React Query hooks. From anywhere to anywhere.
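The difference is easy to see with a JSON round trip. In the sketch below, OkInstance is a hypothetical stand-in for any class-based Result design; it is not neverthrow’s actual class.

```typescript
type Result<T, E> = {type: 'Success'; value: T} | {type: 'Failure'; error: E}

// A plain-object Result survives a JSON round trip with its discriminant intact
const plain: Result<number, string> = {type: 'Success', value: 42}
const plainCopy = JSON.parse(JSON.stringify(plain)) as Result<number, string>

// Hypothetical class-based Result, standing in for any instance-based design
class OkInstance<T> {
  readonly value: T
  constructor(value: T) {
    this.value = value
  }
  isOk(): boolean {
    return true
  }
}

const instance = new OkInstance(42)
const instanceCopy = JSON.parse(JSON.stringify(instance))
// The data survives, but the prototype (and with it `isOk`) does not:
// `instanceCopy` is a bare object with only a `value` property
```

After the round trip, plainCopy can still be checked with plainCopy.type === 'Success', while instanceCopy has lost the only method that told you what it was.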

Example of byethrow

Byethrow’s API is quite similar to what we saw above. Their generics are more advanced and support a unified interface for Result and ResultAsync which is really nice.

Additionally, they have a Result.pipe() function that allows you to create a very natural pipeline of result operations:

import {Result} from '@praha/byethrow'

const result = Result.pipe(
  // Start with a success value of 5
  Result.succeed(5),
  // Then perform a side effect of logging that value
  Result.inspect((value) => console.log('Debug:', value)),
  // Then transform that success value by multiplying it by two
  Result.andThen((x) => Result.succeed(x * 2)),
)
// Console output: "Debug: 5"
// result: { type: 'Success', value: 10 }

Results in the Real World

Let’s take the original example from earlier, but now with Results (and a full Fastify route).

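import {Result} from '@praha/byethrow'
import type {FastifyInstance} from 'fastify'
import type {ZodTypeProvider} from 'fastify-type-provider-zod'
import z from 'zod'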

function resultSchema(args: {value: z.ZodSchema; error: z.ZodSchema}) {
  return z.discriminatedUnion('type', [
    z.object({type: z.literal('Success'), value: args.value}),
    z.object({type: z.literal('Failure'), error: args.error}),
  ])
}

export async function updateUserProfile(fastify: FastifyInstance) {
  fastify.withTypeProvider<ZodTypeProvider>().route({
    method: 'PATCH',
    url: '/user',
    preHandler: fastify.auth([fastify.verifyToken]),
    schema: {
      response: {
        default: resultSchema({value: z.string(), error: z.string()}),
      },
    },
    async handler(req, reply) {
      const authResult = await Auth.check(req.headers.authorization)
      if (Result.isFailure(authResult)) {
        // Because I have explicitly checked for a failure, the compiler allows me to access .error
        // Because authResult.error is strongly typed, I can be certain it has a .message
        return reply.code(401).send(Result.fail(authResult.error.message))
      }

      const userResult = await Db.getUser(req.user.id)
      // 1. Because I have explicitly checked for a failure here
      if (Result.isFailure(userResult)) {
        return reply.code(404).send(Result.fail(userResult.error.message))
      }

      // 2. The compiler allows me to access .value here
      const user = userResult.value
      if (!user.isInitialized) return reply.code(400).send(Result.fail('User is not initialized'))

      const updateResult = await Db.updateUser({phoneNumber: req.phone})
      if (Result.isFailure(updateResult)) {
        return reply.code(500).send(Result.fail(updateResult.error.message))
      }

      return Result.succeed('User profile updated successfully')
    },
  })
}

Here we’ve seen an API endpoint that both uses Results internally, and unconditionally returns Results to the client. Assuming that the client types their responses correctly, this will force the client to also check for the possibility of an error, even if they don’t have byethrow installed!
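On the client, that might look something like the sketch below. It has no dependencies and assumes the response body matches the route’s schema above (a string value or a string error); UpdateProfileResponse, isUpdateProfileResponse and toastMessage are names I made up for the example.

```typescript
type Result<T, E> = {type: 'Success'; value: T} | {type: 'Failure'; error: E}

// Hypothetical response type for PATCH /user, mirroring the route's schema
type UpdateProfileResponse = Result<string, string>

// Runtime check that an unknown JSON body really has the Result shape
function isUpdateProfileResponse(body: unknown): body is UpdateProfileResponse {
  if (typeof body !== 'object' || body === null) return false
  const candidate = body as Record<string, unknown>
  if (candidate.type === 'Success') return typeof candidate.value === 'string'
  if (candidate.type === 'Failure') return typeof candidate.error === 'string'
  return false
}

// Hypothetical UI helper: the compiler will not let us read `.value`
// without first handling the failure branch
function toastMessage(body: unknown): string {
  if (!isUpdateProfileResponse(body)) return 'Unexpected response from server'
  if (body.type === 'Failure') return `Could not save: ${body.error}`
  return body.value
}
```

In a real app you would pass the parsed JSON of the fetch response into toastMessage, and you would probably reuse the zod schema instead of a hand-written type guard.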

But how does it make you feel?

We have taken a 12-line function and turned it into a 21-line function that will operate almost identically almost all of the time.

But I’m sure you recognize that in doing so we’ve also turned this endpoint into a piece of code that will function predictably every time, instead of just most of the time.

Programming like this might feel arduous if you haven’t done it before, but your users, and your future self on prod support, will thank you.

I’ve found throughout my career that the best programming paradigms are the ones that force you as the programmer to make more decisions when you’re writing the code. As we’ve just seen, Results are a good example of this. They force you to make decisions about how to handle errors at compile-time rather than hiding them until runtime when they bubble up to the root of your application.

My other favorite example of this is the .NET library Noda Time. It doesn’t allow you to just create a new Date() and run with it. The API of Noda Time forces the programmer to make a decision about whether they’re trying to represent local time or time in a specific time zone, and which time zone!

Instant now = SystemClock.Instance.GetCurrentInstant();
ZonedDateTime nowInIsoUtc = now.InUtc();
var london = DateTimeZoneProviders.Tzdb["Europe/London"];
var localDate = new LocalDateTime(2012, 3, 27, 0, 45, 00);
var before = london.AtStrictly(localDate);

A second kind of error

It’s important to recognize that there are two kinds of errors:

1. Errors that are possible to anticipate and recover from
   - Most errors fall into this bucket
   - e.g. file not found or database connection failed
2. Truly “exceptional” errors that can’t be anticipated or recovered from
   - e.g. AWS is having an outage or the API I’m trying to hit has crashed

These two types of errors must be handled differently in code. The Results that we’ve discussed in this post are meant to handle type 1. But what about type 2 errors?

Fortunately (in terms of work for you, dear programmer), there’s not terribly much that can be done to handle type 2 errors, because by definition the only thing we know about them is that they will happen eventually. This is where generic error handling at the boundaries of your application comes into play.

For React apps this means Error Boundaries either around your whole app or independent sections of your app that will catch the errant exception and display something to your users that isn’t a blank page.

For APIs this means some mechanism around each route that will catch an exception and automatically return a 500. Fastify, for example, has a generic error handler that you can customize should you desire.
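Stripped of any framework, that kind of boundary is just one try/catch at the outermost layer. The sketch below uses simplified, made-up HttpReply and Handler shapes rather than Fastify’s real API, purely to show the idea.

```typescript
// Simplified, hypothetical shapes; a real framework supplies its own
type HttpReply = {status: number; body: unknown}
type Handler = () => Promise<HttpReply>

// Wrap a handler so that any exception it did not anticipate becomes a 500.
// Anticipated failures should already have been returned as Results inside it.
function withErrorBoundary(handler: Handler, log: (err: unknown) => void): Handler {
  return async () => {
    try {
      return await handler()
    } catch (err) {
      log(err) // keep the details for whoever is on prod support
      return {status: 500, body: {type: 'Failure', error: 'Internal server error'}}
    }
  }
}
```

Note that the 500 body is still shaped like a Result, so a client that already checks for Failure handles the unexpected case with the same code path.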