Fine-grained cache-name control with service workers

Written by Adrian Holovaty on December 27, 2015

Service workers are awesome. They’re the most exciting development in web programming since Ajax. I’ve been excitedly working on adding a service worker to Soundslice, but I ran into a few bumps in the road. Here’s the biggest one so far, along with how I solved it.

(This post assumes basic knowledge of service workers. For a decent primer, see here.)

Background

Soundslice renders music notation entirely in client-side JavaScript. Each “detail page” (example) is made of a JavaScript/HTML/CSS shell plus some JSON files that contain the page’s music-notation data.

The notation JSON files are stored in Amazon S3 and served by the CloudFront CDN, and I use expiring/one-time URLs for security.

For now, my goal with the Soundslice service worker is to cache static assets, plus the notation data — resulting in a nice, simple performance win. (I’m looking forward to making a service-worker-powered offline version of Soundslice, but that’ll require a slightly bigger refactoring of various bits of the site. I’ll be sure to write it up here when that happens.)

The problem: caching expiring URLs

A service worker’s cache deals directly with HTTP request and response objects, and the cache is keyed on the entire request — the URL, HTTP method and various headers. It’s kind of opaque/heavy: when you put a request/response in the cache, the only way to get it out is to pass the same request as the cache key.

This posed a problem for Soundslice. Each time you load a Soundslice detail page, the JSON data has a different (expiring) URL, even though it’s the same data. The URLs look like this (with line breaks added for readability):

https://d1vuq0zzaa789.cloudfront.net/json/auld-lang-syne/
data.json?Expires=1451224734&Signature=h0rlgDkZxDABihILdz
eIVyaKdikgwrrszLpuebhU~0v6YN-mDHUr22oDDLYG40-KpVaSuiNWnYT
6cbTmnTzfLPDd-~Ihw1Zr5ANr2cDjaLglAhuprE2jpigeqRJpYX4Bcc8h
gCThT1JEm0cfdwx6yPHeTrqQHNlxo4Pf7~xRtDYYgNwfj9mgDFH13MAfn
h1JsgtF19RHyXWzdCRqY~VnxJ5-4dIL54TyuztpzsocMnIbyGPzlM-Vd2
ZHfjUEWgpzX1b07i2LqarxK-p5D7~Nd7dvz0vIs1b8BrgUR1umYMPxK3s
XOszjZMSL6aJTPZEhdOmwi-bBUEgr0wNOemhMHQ__&Key-Pair-Id=APA
KJNLCKIXICPJV62XA

The relevant part is /json/auld-lang-syne/data.json. The rest of the URL is just ugly authentication stuff. If I use these requests with the service worker cache, I’ll never get a cache hit; the URLs are always different! How can I use only the relevant subset — /json/auld-lang-syne/data.json — as the service worker’s cache key?

The non-solutions

Fortunately, the service worker spec has an ignoreSearch option, which I imagine was created precisely for this purpose. You pass ignoreSearch to cache.match(), and it’ll ignore the query string when looking for a match. Unfortunately, this isn’t yet implemented in Chrome. (A developer claimed the ticket just a few days ago. Ah, life on the bleeding edge!)
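
For reference, in a browser that does support the option, the lookup could be sketched roughly like this. The function name matchIgnoringQuery, the cache name and the storage parameter (which defaults to the global caches object and is there mainly so the function can be exercised outside a service worker) are assumptions for illustration.

```javascript
// Look up a cached response while ignoring the query string of the request URL.
// This relies on the `ignoreSearch` option of Cache.match(); it only has an
// effect in browsers that implement that option.
async function matchIgnoringQuery(request, cacheName, storage = caches) {
  const cache = await storage.open(cacheName);
  // Resolves to a Response, or to undefined on a cache miss.
  return cache.match(request, { ignoreSearch: true });
}
```

With this in place, two requests for data.json that differ only in their Expires/Signature parameters would resolve to the same cached entry.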

Since ignoreSearch is off the table for the near-to-mid future, what else can we do?

One solution would be to change my content-protection setup to use signed cookies instead of signed URLs. That way, the JSON URL would always be this:

https://d1vuq0zzaa789.cloudfront.net/json/auld-lang-syne/data.json

...and the authentication stuff would be handled by a cookie. Though it’s a temptingly elegant solution, I opted against it because I don’t want to require cookies just to view pages on our site.

The hacky solution

With those options exhausted, I was just about to give up on service worker caching for this part of Soundslice. Then I stumbled upon a comment by Jeff Posnick, deep in a GitHub ticket page:

“The workaround I'm using is to create individual Cache objects, each with one entry and with a name that includes the fingerprint, and it's kind of ugly.”

Aha! The service worker API lets you have multiple caches, each with its own name. Most of the service worker examples I’ve seen use the cache names for versioning purposes (so you can easily delete stale cache data). But there’s nothing stopping you from using the cache names to represent the filenames. As Jeff notes, yes, it’s ugly...but it solves the problem!

So the solution is to create a separate cache for each file — e.g., /json/auld-lang-syne/data.json — and stash the request in there. It doesn’t matter that the request’s URL has the huge query string; we’ll no longer be using that as the cache key.
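
To make the idea concrete, here is a minimal sketch of the one-cache-per-key technique. It is an illustration under assumptions, not a canonical implementation: the helper names putWithKey and matchWithKey, the 'keyed:v1:' name prefix and the storage parameter (defaulting to the global caches object) are all hypothetical.

```javascript
// Hypothetical naming scheme: one Cache per logical file, named prefix + key.
const KEYED_CACHE_PREFIX = 'keyed:v1:';

// Store a response under an arbitrary key, e.g. '/json/auld-lang-syne/data.json'.
// The request's own URL (signed query string included) is not used for lookup.
async function putWithKey(key, request, response, storage = caches) {
  const cache = await storage.open(KEYED_CACHE_PREFIX + key);
  // Keep a single entry per cache: drop anything stored under an older URL.
  for (const oldRequest of await cache.keys()) {
    await cache.delete(oldRequest);
  }
  await cache.put(request, response);
}

// Return the response stored under `key`, or undefined if there is none.
async function matchWithKey(key, storage = caches) {
  const name = KEYED_CACHE_PREFIX + key;
  // Avoid creating an empty cache as a side effect of a lookup.
  if (!(await storage.has(name))) return undefined;
  const cache = await storage.open(name);
  const [storedRequest] = await cache.keys();
  return storedRequest ? cache.match(storedRequest) : undefined;
}

// Possible use inside a fetch handler, with `pathKey` set to the URL's pathname:
//   const cached = await matchWithKey(pathKey);
//   if (cached) return cached;
//   const response = await fetch(request);
//   await putWithKey(pathKey, request, response.clone());
//   return response;
```

The lookup never compares URLs at all; it just opens the cache named after the key and returns whatever single entry lives there.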

Here’s the code I’m using now. It lets you specify an arbitrary cache key for your requests and responses.

Example usage:

I’m placing the above code in the public domain.

A few gotchas. With this approach, you’ll need to roll your own cache-expiration logic. You could perhaps keep track of all keys you’ve stored in the cache, along with a version number, to make cache management possible. Also, if you’re using this approach in tandem with the “proper” way of doing service worker caching, you should take care to avoid potential cache-name clashes.
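
One way to handle expiration, sketched under assumptions: instead of tracking keys separately, put a version number in every per-file cache name and let caches.keys() do the bookkeeping. The 'keyed:<version>:<key>' naming scheme, the function name purgeStaleKeyedCaches and the storage parameter are hypothetical. The shared prefix also helps avoid clashes with conventionally named caches.

```javascript
// Hypothetical naming scheme: 'keyed:<version>:<key>',
// e.g. 'keyed:v2:/json/auld-lang-syne/data.json'.
const KEYED_NAMESPACE = 'keyed:';

// Delete every per-file cache that does not belong to `currentVersion`.
// Caches outside the 'keyed:' namespace are left untouched.
// Resolves to the list of cache names that were deleted.
async function purgeStaleKeyedCaches(currentVersion, storage = caches) {
  const keepPrefix = KEYED_NAMESPACE + currentVersion + ':';
  const names = await storage.keys();
  const stale = names.filter(
    (name) => name.startsWith(KEYED_NAMESPACE) && !name.startsWith(keepPrefix)
  );
  await Promise.all(stale.map((name) => storage.delete(name)));
  return stale;
}

// Possible use in the service worker's activate handler:
//   self.addEventListener('activate', (event) => {
//     event.waitUntil(purgeStaleKeyedCaches('v2'));
//   });
```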

A more elegant solution

After I posted this blog entry, service worker co-creator Alex Russell told me about a cleaner solution. You can control the cache key by creating a Request object manually, passing whatever URL you want.

So the solution would look like this:
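
As a rough sketch of that idea (the names cacheKeyFor and fetchWithStableKey, the cache name, and the storage and fetchFn parameters are assumptions for illustration, defaulting to the global caches and fetch):

```javascript
// Build a stable cache key by dropping the query string (the signed part)
// and any fragment from the URL.
function cacheKeyFor(url) {
  const u = new URL(url);
  u.search = '';
  u.hash = '';
  return u.href;
}

// Serve from the cache when possible; otherwise fetch with the real (signed)
// request and store the response under a manually created Request whose URL
// is the stable key.
async function fetchWithStableKey(request, cacheName, storage = caches, fetchFn = fetch) {
  const cache = await storage.open(cacheName);
  const keyRequest = new Request(cacheKeyFor(request.url));

  const cached = await cache.match(keyRequest);
  if (cached) return cached;

  const response = await fetchFn(request);
  // Only cache successful responses; clone because a body can be read once.
  if (response.ok) {
    await cache.put(keyRequest, response.clone());
  }
  return response;
}

// Possible use in the service worker's fetch handler:
//   self.addEventListener('fetch', (event) => {
//     if (new URL(event.request.url).pathname.startsWith('/json/')) {
//       event.respondWith(fetchWithStableKey(event.request, 'notation-v1'));
//     }
//   });
```

The network request still uses the full signed URL; only the cache key is the stripped-down one, so every page load maps to the same entry in a single ordinary cache.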

Much better. Thanks, Alex!

Hope this helps somebody. And even if it’s not relevant to you, I hope this writeup gets you thinking about new things you can do with service workers!