Prevent application and network instability by serving stale content

Stale content: it’s good for you.

Caching content at the edge of the network, as Fastly does, makes your site faster and more reliable. But what if we want to go the extra mile and achieve “virtually bulletproof” performance? We'd have to make it possible for Fastly to serve content even if we were unable to reach your origin servers or if they were not operating correctly. To achieve this, we need to use expired content: a technique we call serving stale.

If your website were a radio station, then your edge cache would be your transmitter towers — essential to extend your reach to a huge audience, but useless without the signal from HQ to broadcast. Large state broadcasters have long realised this and placed local recordings of content at transmission sites "just in case" the transmitter lost its uplink to home base.

Serving content from Fastly to end users is fast and reliable. But this only happens, by default, when that content is available and fresh in the cache. Like transmitting stations, we have a temporary local copy of your content, and whether we can use it depends not just on whether it is fresh (within its configured time-to-live, or TTL), but also on whether your origin server is performing as expected.

Sadly, we often don't get to help you out as much as we'd like, because standard cache-freshness rules (which we respect) tell us that we can't use stale content, even when your servers are offline. But adjusting those rules is easy, and the solution is built into the HTTP Cache-Control header. Using stale-while-revalidate and stale-if-error we can help users get a reliable experience more of the time by having Fastly serve stale content, regardless of the reliability or availability of your origin.

The stale cache directives

It may surprise you to discover that these cache-control directives have existed since 2010. They were defined in RFC 5861 - HTTP Cache-Control Extensions for Stale Content — originally authored by our Fastly colleague, Mark Nottingham. Let's dig into what each of the directives does.

Stale while revalidate: eliminate origin latency

stale-while-revalidate tells caches that they may continue to serve a response after it becomes stale, for up to the specified number of seconds, while fetching a new one asynchronously in the background, so that no request has to wait. Here's an example:

Cache-Control: max-age=300, stale-while-revalidate=60

A cache receiving an upstream response with this header will store and reuse that response for up to five minutes (300 seconds) as normal, without having to re-check (revalidate) with the origin server. But once that time expires, the cache would normally have to send a request to origin and start queuing up any other inbound requests until the response is received.

However, in this case the stale-while-revalidate directive allows the cache to continue to serve the same content for up to another 60 seconds, provided it uses that time to try to get a new copy from the origin. As soon as a new response is available, it will replace the stale content and its own cache freshness rules will take effect. However, if the 60-second revalidation period expires and it hasn't been possible to get updated content, the cache will no longer be allowed to use the stale version.
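
To make the timeline concrete, here is a minimal Python sketch of the decision a cache makes as a response ages. It is a model of the rules described above, not Fastly's implementation, and the function names parse_cache_control and swr_action are made up for this illustration.

```python
def parse_cache_control(header: str) -> dict[str, int]:
    """Return the numeric directives found in a Cache-Control header value."""
    directives = {}
    for part in header.split(","):
        name, _, value = part.strip().partition("=")
        if value.isdigit():
            directives[name.lower()] = int(value)
    return directives


def swr_action(age: int, header: str) -> str:
    """Say what a cache may do with a stored response that is `age` seconds old."""
    d = parse_cache_control(header)
    max_age = d.get("max-age", 0)
    swr = d.get("stale-while-revalidate", 0)
    if age < max_age:
        return "serve fresh"
    if age < max_age + swr:
        # Stale, but inside the background-refresh window.
        return "serve stale, revalidate in background"
    # Too old: the request has to wait for the origin.
    return "block on origin fetch"


HEADER = "max-age=300, stale-while-revalidate=60"
for age in (120, 330, 400):
    print(f"{age:>3}s old: {swr_action(age, HEADER)}")
```

With the example header, a two-minute-old response is fresh, one that is 330 seconds old is served stale while a refresh runs, and one that is 400 seconds old forces the request to wait.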

We’ve seen this be really useful for news and publishing websites, where a 60-second background-refresh window means that any page that is viewed on average more than once a minute will always be served from cache, and yet will still be up-to-date. Another example is a weather application fetching weather conditions from an API where 30 seconds of stale content won’t do much harm.

Stale if error: survive origin failure

stale-if-error tells a caching proxy that if an error is encountered while trying to talk to an origin server, a stale response may be used instead of outputting an error. This helps us provide a more seamless user experience even during periods of server instability.

Cache-Control: max-age=300, stale-if-error=86400

In the above example a cache would, as before, store and serve the fresh content for five minutes, but this time, once the five minutes have expired, the next request for this content will block on a fetch to origin. Unlike stale-while-revalidate, stale-if-error doesn't allow any asynchronous revalidation, but it does allow the stale version to be used if that fetch fails, whether because the origin doesn't respond properly (a timeout, connection error, or malformed response) or, as RFC 5861 defines an error, because the result would be a 500, 502, 503, or 504 response. Once the stale period expires (here 86400 seconds, or one day, after the content became stale), the content can no longer be used as a backup, and if an error response is returned from the origin, the cache must return the error, even if the stale version is still in storage.
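
The same behaviour can be sketched in a few lines of Python. This is only a model of the rule as described here: the function name sie_response and its return labels are invented for the example, and the origin fetch is reduced to a single boolean.

```python
def sie_response(age: int, max_age: int, stale_if_error: int, origin_ok: bool) -> str:
    """Model what a user receives from a cache that honours stale-if-error."""
    if age < max_age:
        return "cached copy (fresh)"
    # The copy is stale, so the request blocks on a fetch to origin.
    if origin_ok:
        return "new copy from origin"
    if age < max_age + stale_if_error:
        # The fetch failed, but we are still inside the stale-if-error window.
        return "cached copy (stale)"
    return "error"


# Cache-Control: max-age=300, stale-if-error=86400
for age, origin_ok in ((100, False), (400, True), (400, False), (90000, False)):
    print(age, origin_ok, "->", sie_response(age, 300, 86400, origin_ok))
```

Note that the stale copy is only ever used after a failed fetch, which is why the user still pays the cost of waiting for the origin.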

Browser caching (because stale isn't just a CDN thing)

stale-* directives apply to all caching HTTP clients, not just CDNs and other non-browser clients. As of 2019 some of the major browser implementations have started to support the stale-while-revalidate directive in the browser HTTP cache (see the MDN compatibility table for a full list of currently supported browsers).

While stale-serving in browsers is also useful, if you are trying to affect only the behavior of your CDN, consider using the Surrogate-Control cache header. This is exactly the same as Cache-Control but overrides it if both are present, and is removed by the CDN, so you can control the stale logic for CDNs independently of browsers.

Surrogate-Control: max-age=300, stale-while-revalidate=60, stale-if-error=86400
Cache-Control: max-age=60

In the example above, if the user requested the page and then requested it again more than two minutes later, the browser would block on a network fetch (because its cached copy, fresh for only 60 seconds, is stale and has no stale-* directives). The CDN, by contrast, would return stale content immediately while kicking off an asynchronous revalidation in the background, provided its own copy had expired within the last 60 seconds.
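
As a rough illustration of how the two headers divide the work, the Python sketch below picks the policy each cache would follow. It assumes the behaviour described above (the CDN prefers Surrogate-Control and strips it before responding); the helper names parse_directives, policy_for, and headers_sent_downstream are hypothetical.

```python
def parse_directives(value: str) -> dict[str, int]:
    """Return the numeric directives in a Cache-Control style header value."""
    directives = {}
    for part in value.split(","):
        name, _, number = part.strip().partition("=")
        if number.isdigit():
            directives[name.lower()] = int(number)
    return directives


def policy_for(headers: dict[str, str], cache: str) -> dict[str, int]:
    """Pick the header a cache obeys; `cache` is 'cdn' or 'browser'."""
    if cache == "cdn" and "Surrogate-Control" in headers:
        return parse_directives(headers["Surrogate-Control"])
    return parse_directives(headers.get("Cache-Control", ""))


def headers_sent_downstream(headers: dict[str, str]) -> dict[str, str]:
    """The CDN removes Surrogate-Control before the response reaches the browser."""
    return {k: v for k, v in headers.items() if k != "Surrogate-Control"}


ORIGIN_HEADERS = {
    "Surrogate-Control": "max-age=300, stale-while-revalidate=60, stale-if-error=86400",
    "Cache-Control": "max-age=60",
}
cdn_policy = policy_for(ORIGIN_HEADERS, "cdn")
browser_policy = policy_for(headers_sent_downstream(ORIGIN_HEADERS), "browser")
print("CDN:", cdn_policy)
print("Browser:", browser_policy)
```

The browser ends up with a plain 60-second TTL, while the CDN keeps the longer TTL and both stale windows.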

Serving stale with Fastly

So far, everything we've covered is simply part of the HTTP Caching specification, and not specific to any particular cache implementation. And indeed, setting the relevant directives in your Cache-Control header is all you need to do to have Fastly serve stale. But let's consider how we can do even better with a bit of additional logic as part of your Fastly configuration.

Let's start by looking at the four possible freshness states that a piece of content can be in when we receive a request for it from one of your website's users:

Fresh: we have a copy of the content, and it's within its TTL
SWR: we have a stale version of the content, and we're within a stale-while-revalidate period
SIE: we have a stale version, and it doesn't qualify for SWR, but we're within a stale-if-error period
None: we don't have the content, or if we do, we're not allowed to use it under any circumstances

In addition, there are four possible states that your origin server can be in:

Healthy: origin is up and working
Erroring: origin is returning syntactically valid HTTP responses in the 5xx range (e.g. 503 Service Unavailable)
Down: origin is unreachable or unable to negotiate a TCP connection
Sick: Fastly has marked this origin as unusable because we've been consistently unable to fetch a healthcheck endpoint

This gives us 16 possibilities, which we can visualise as a grid to show where good and bad things happen:

Origin \ Content	Fresh	SWR	SIE	None
Healthy	😀	😀	😴	😴
Erroring	😀	😀	😡	😡
Down	😀	😀	😡	😡
Sick	😀	😀	😀	😡

Here we have only three possible outcomes for the user: they see the content they want delivered from cache (😀), they get the content but only after waiting on a synchronous origin fetch (😴), or they see an error (😡), which could be either an error generated by Fastly trying to connect to your origin or whatever your origin server returned.

There are some situations here that we could improve by adjusting the default Fastly configuration to be more aggressive about using stale content:

By default, Fastly only uses stale content in an “error” scenario if the origin server has been marked as “sick” by an origin health check. This should happen fairly quickly if an origin goes offline, but if you don't have a healthcheck, or the healthcheck is working but the particular request is failing, you will still serve the error to the user. We can detect these situations and revert to the stale version.
If we have to deliver an error response to the end user, by default we just serve whatever error your origin produced, or if necessary, a generic error from Varnish — the caching software Fastly runs. We can detect these situations and return custom branded content.

With these extra rules in place you can achieve this:

Origin \ Content	Fresh	SWR	SIE	None
Healthy	😀	😀	😴	😴
Erroring	😀	😀	😴	😐
Down	😀	😀	😴	😐
Sick	😀	😀	😀	😐

You're now delivering the content the user wanted in more cases (after a failed origin fetch where necessary, hence 😴), and when you're not, you're still delivering predictable, professional, and helpful error content (😐) instead of a raw error.
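
One way to check the two grids against each other is to write each as a small rule and compare them cell by cell. The Python sketch below does that. The rules are read directly off the grids in this post, with 😐 taken to mean the custom error page; the function names are invented for the example and nothing here reflects how Fastly evaluates requests internally.

```python
CONTENT_STATES = ("Fresh", "SWR", "SIE", "None")
ORIGIN_STATES = ("Healthy", "Erroring", "Down", "Sick")
EMOJI = {"cache": "😀", "blocked": "😴", "error": "😡", "custom error": "😐"}


def default_outcome(content: str, origin: str) -> str:
    """Outcome for the user under the default behaviour."""
    if content in ("Fresh", "SWR"):
        return "cache"
    if origin == "Healthy":
        return "blocked"  # content arrives after a synchronous fetch
    if content == "SIE" and origin == "Sick":
        return "cache"  # stale is only used once the origin is marked sick
    return "error"


def improved_outcome(content: str, origin: str) -> str:
    """Outcome once stale fallback and custom error pages are added."""
    if content in ("Fresh", "SWR"):
        return "cache"
    if origin == "Healthy":
        return "blocked"
    if content == "SIE":
        # Stale copy is used; unless the origin is sick we first wait for the failed fetch.
        return "cache" if origin == "Sick" else "blocked"
    return "custom error"


def grid(rule) -> dict:
    return {(o, c): rule(c, o) for o in ORIGIN_STATES for c in CONTENT_STATES}


default_grid = grid(default_outcome)
improved_grid = grid(improved_outcome)
changed = sorted(cell for cell in default_grid if default_grid[cell] != improved_grid[cell])

for origin in ORIGIN_STATES:
    row = " ".join(EMOJI[improved_grid[(origin, c)]] for c in CONTENT_STATES)
    print(origin.ljust(9), row)
print(len(changed), "cells differ from the default behaviour")
```

Every cell that changes is one that previously showed a raw error to the user.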

With the Fastly Fiddle playground, we can test and demo the code we need to make these configuration changes and create a bullet-proof serving stale solution on the Fastly edge. Let’s pick this apart to understand what's going on.

Fastly's request processing starts in the vcl_recv subroutine, where we need to make sure we're handling an end-user request, and not one passed up from another layer of Fastly. But the important logic for serving stale happens when we receive a response from your origin server, in the vcl_fetch subroutine.

If at this point the origin gave us a response we don't like (a beresp.status of 500 or above) and a stale object is available in cache, we can choose to serve it immediately via the deliver_stale return statement. If not, we can trigger an intentional error, to avoid serving the error message from the origin server directly to the end user.

If we do like the response from the origin, then this is also where we can configure the stale-while-revalidate and stale-if-error periods, if they aren't already configured in the Cache-Control header we received on the response.

So there are three possible outcomes here: a good response which will now enter cache and be served to the user, a fallback to a stale response, or an intentional error. The intentional error we trigger here will take us to the vcl_error subroutine. We'll also end up in vcl_error directly (without running vcl_fetch) if Fastly got a network error instead of a response.

We end up in the vcl_error subroutine if we trigger it explicitly from vcl_fetch, and also if a network connection problem prevents us from getting any response from origin at all. In that second case, Fastly generates a 500-series error internally and passes it to vcl_error.

We already checked in vcl_fetch for a stale object, but if we ended up here because of a network problem, we will not have run the vcl_fetch code, so it's worth checking again if there's a stale copy and returning it. Finally, if there is no stale object, we should deliver a synthetic error with a helpful message for the user.
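
The control flow across the two subroutines is easier to follow when it's written out in one place. The Python sketch below is a model of that flow only: it is not VCL and not the code from the Fiddle, and the function names on_fetch, on_error, and handle are invented here. It also assumes that any status of 500 or above counts as a response we don't like.

```python
from typing import Optional


def on_fetch(status: int, stale_exists: bool) -> str:
    """Models the vcl_fetch decision; runs only if the origin sent a response."""
    if status >= 500:
        # Prefer a stale copy; otherwise raise an intentional error.
        return "deliver_stale" if stale_exists else "error"
    return "deliver"


def on_error(stale_exists: bool) -> str:
    """Models the vcl_error decision: check for stale again, else synthesize."""
    return "deliver_stale" if stale_exists else "synthetic_error"


def handle(origin_status: Optional[int], stale_exists: bool) -> str:
    """origin_status=None models a network failure, where no response exists."""
    if origin_status is None:
        # The fetch step never runs, so the error step is the only chance to use stale.
        return on_error(stale_exists)
    action = on_fetch(origin_status, stale_exists)
    return on_error(stale_exists) if action == "error" else action


for status, stale in ((200, False), (503, True), (503, False), (None, True), (None, False)):
    print(status, stale, "->", handle(status, stale))
```

The repeated stale check in on_error is what covers the network-failure path, since that path skips on_fetch entirely.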

Regardless of whether we ended up with fresh, cached, stale, or error content, all roads eventually lead to the vcl_deliver subroutine. Here we mop up an annoying edge case that doesn’t require much code but is an artefact of some complexity in the way Fastly spreads cached content out across a data center. It's likely that the fetch happened on a different Fastly cache server than the vcl_deliver subroutine is now running on, so we'll do one last check to see if the local cache on the deliver server has a stale version of this object. If so, we'll use it, but if not, we're ready to deliver the best response we can.

Best practices

We recommend the following best practices when implementing stale logic with Fastly:

Short stale-while-revalidate, long stale-if-error. If your origin is working, you don't want to subject users to content that is out of date. But if your origin is down, you're probably much more willing to serve something old if the alternative is an error page.
Use shielding to increase the cache hit ratio and increase the probability of having stale objects to serve. In principle, “shielding” is the practice of placing one layer of Fastly behind another and focusing requests on a single location. This is a great way to increase reliability but will also increase the complexity of the stale VCL logic.
Always use soft purges, which mark objects as stale instead of deleting them, to ensure that stale versions of objects aren’t also evicted.

Stale content = happy users

Hopefully you now have a better understanding of what the stale-* family of cache-control directives is, how they can help you stay more resilient in the face of application and network instability, and how you can use their capabilities at the edge to consistently deliver fast and delightful user experiences. We’re always interested in hearing about your experiences so please reach out to us and tell us how you went from 😡 to 😀.