Django tip: Caching and two-phased template rendering

Written by Adrian Holovaty on May 18, 2009

We've launched user accounts at EveryBlock, and we faced the interesting problem of needing to cache entire pages except for the "You're logged in as [username]" bit at the top of the page. For example, the Chicago homepage takes a nontrivial amount of time to generate and doesn't change often -- which means we want to cache it -- but at the same time, we need to display the dynamic bit in the upper right:

Screenshot of EveryBlock Chicago homepage

One solution would be to pull in the username info dynamically via Ajax. This way, you could cache the entire page and rely on the client to pull in the username bits. The downsides are that it relies on JavaScript and it requires two hits to the application for each page view.

Another solution would be to use Django's low-level cache API to cache the results of the queries directly in our view function. The downsides are that it's kind of messy to manage all of that caching, plus each page view still incurs the overhead of template rendering (which isn't horrible, but it's unnecessary overhead).
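For comparison, here is a minimal sketch of what that per-query caching tends to look like. To keep it self-contained it uses a tiny in-memory stand-in (`SimpleCache`, a made-up class) in place of Django's cache object, and `expensive_query` and `homepage_data` are hypothetical names, not EveryBlock code. The point is only to show the get-or-compute boilerplate that every cached query needs:

```python
import time


class SimpleCache:
    """Tiny in-memory stand-in for a cache backend with get/set and timeouts."""

    def __init__(self):
        self._store = {}

    def get(self, key, default=None):
        entry = self._store.get(key)
        if entry is None:
            return default
        value, expires_at = entry
        if time.monotonic() >= expires_at:
            # The entry has expired, so drop it and report a miss.
            del self._store[key]
            return default
        return value

    def set(self, key, value, timeout=300):
        self._store[key] = (value, time.monotonic() + timeout)


cache = SimpleCache()
query_log = []  # records each time the "database" is actually hit


def expensive_query(city):
    """Hypothetical slow query; here it only records that it ran."""
    query_log.append(city)
    return ["news item for %s" % city]


def homepage_data(city):
    """Get-or-compute: every cached query needs its own key and timeout."""
    key = "homepage-data:%s" % city
    data = cache.get(key)
    if data is None:
        data = expensive_query(city)
        cache.set(key, data, timeout=600)
    return data
```

Multiply that by every query on the page and the bookkeeping adds up, and the template still has to be rendered on each request.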

The solution we ended up using is two-phased template rendering. Credit for this concept goes to my friend Honza Kral, who suggested the idea to me during PyCon earlier this year.

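The way it works is to split the page rendering into two steps:

1. At cache-setting time, render everything except the "You're logged in as" bit, which stays as unrendered Django template code, and cache the result as a Django template.
2. At page-view time, render that cached template with the current user's info.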

It's a clever solution because you end up defining what doesn't get cached instead of what does get cached. It's a sideways way of looking at the problem -- sort of like how Django's template inheritance system defines which parts of the page change instead of defining server-side includes of the common bits.
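To make the idea concrete, here is a toy illustration that doesn't use Django at all. It assumes a made-up mini template language that only understands `{{ name }}` variables and `{% raw %}` blocks, and the function names (`first_phase`, `second_phase`) are mine. It is a sketch of the concept, not the EveryBlock implementation:

```python
import re

RAW_BLOCK = re.compile(r"\{% raw %\}(.*?)\{% endraw %\}", re.DOTALL)
VARIABLE = re.compile(r"\{\{\s*(\w+)\s*\}\}")


def render_variables(source, context):
    """Replace each {{ name }} with its value from the context."""
    return VARIABLE.sub(lambda match: str(context[match.group(1)]), source)


def first_phase(source, context):
    """Render everything except {% raw %} blocks, which are kept verbatim."""
    pieces = []
    position = 0
    for match in RAW_BLOCK.finditer(source):
        # Render the text before the raw block with the page-level context.
        pieces.append(render_variables(source[position:match.start()], context))
        # Keep the raw block's contents as unrendered template code.
        pieces.append(match.group(1))
        position = match.end()
    pieces.append(render_variables(source[position:], context))
    return "".join(pieces)


def second_phase(cached_source, context):
    """Render the cached, halfway-rendered page with per-user data."""
    return render_variables(cached_source, context)


page = "<h1>{{ city }}</h1>{% raw %}<p>Logged in as {{ email }}</p>{% endraw %}"
cached = first_phase(page, {"city": "Chicago"})
final = second_phase(cached, {"email": "user@example.com"})
```

Here `cached` is the part that would go into the cache once, and `second_phase` is the only work done on each page view.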

In order to make this work, we had to write two pieces of infrastructure: a template tag and a middleware class that does the cache-checking and rendering. The template tag looks like this:

# Copyright 2009, EveryBlock
# This code is released under the GPL.

from django import template
register = template.Library()

def raw(parser, token):
    # Whatever is between {% raw %} and {% endraw %} will be preserved as
    # raw, unrendered template code.
    text = []
    parse_until = 'endraw'
    tag_mapping = {
        template.TOKEN_TEXT: ('', ''),
        template.TOKEN_VAR: ('{{', '}}'),
        template.TOKEN_BLOCK: ('{%', '%}'),
        template.TOKEN_COMMENT: ('{#', '#}'),
    }
    # By the time this template tag is called, the template system has already
    # lexed the template into tokens. Here, we loop over the tokens until
    # {% endraw %} and parse them to TextNodes. We have to add the start and
    # end bits (e.g. "{{" for variables) because those have already been
    # stripped off in a previous part of the template-parsing process.
    while parser.tokens:
        token = parser.next_token()
        if token.token_type == template.TOKEN_BLOCK and token.contents == parse_until:
            return template.TextNode(u''.join(text))
        start, end = tag_mapping[token.token_type]
        text.append(u'%s%s%s' % (start, token.contents, end))
    parser.unclosed_block_tag(parse_until)
raw = register.tag(raw)

This template tag merely treats everything between {% raw %} and {% endraw %} as unrendered template code.
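The re-wrapping step in that loop can be tried outside Django. In the sketch below, tokens are modeled as plain (type, contents) tuples and the `TOKEN_*` constants are local stand-ins rather than Django's own, so treat it as an illustration of the logic only:

```python
# Local stand-ins for the lexer's token types (not Django's constants).
TOKEN_TEXT, TOKEN_VAR, TOKEN_BLOCK, TOKEN_COMMENT = range(4)

DELIMITERS = {
    TOKEN_TEXT: ("", ""),
    TOKEN_VAR: ("{{", "}}"),
    TOKEN_BLOCK: ("{%", "%}"),
    TOKEN_COMMENT: ("{#", "#}"),
}


def rebuild_source(tokens, end_tag="endraw"):
    """Glue lexed tokens back into template source, stopping at end_tag."""
    pieces = []
    for token_type, contents in tokens:
        if token_type == TOKEN_BLOCK and contents == end_tag:
            return "".join(pieces)
        start, end = DELIMITERS[token_type]
        pieces.append("%s%s%s" % (start, contents, end))
    raise ValueError("Unclosed tag: expected {%% %s %%}" % end_tag)


tokens = [
    (TOKEN_BLOCK, "if USER"),
    (TOKEN_TEXT, "<p>Logged in as "),
    (TOKEN_VAR, "USER.email"),
    (TOKEN_TEXT, "</p>"),
    (TOKEN_BLOCK, "endif"),
    (TOKEN_BLOCK, "endraw"),
]
rebuilt = rebuild_source(tokens)
```

Because the delimiters are glued straight onto the token contents, the rebuilt code comes out without the inner spaces (`{{USER.email}}` rather than `{{ USER.email }}`), which makes no difference when it is rendered later.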

Then, in our base EveryBlock template, we wrap the appropriate bit of code in the {% raw %} tag, like this:

{% raw %}
	{% if USER %}
		<p>Logged in as {{ USER.email }}</p>
	{% else %}
		<p>Sign in / register.</p>
	{% endif %}
{% endraw %}

The final part is to write some middleware that renders every text/html response through the template system:

# Copyright 2009, EveryBlock
# This code is released under the GPL.

from django.core.cache import cache
from django.template import Template
from django.template.context import RequestContext
import urllib

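class CachedTemplateMiddleware(object):
    def process_view(self, request, view_func, view_args, view_kwargs):
        response = None
        # Phase two starts here: for normal GET requests, look for a cached,
        # halfway-rendered response, keyed on the URL-quoted request path.
        if request.method == 'GET' and 'magicflag' not in request.GET:
            cache_key = urllib.quote(request.path)
            response = cache.get(cache_key, None)

        # Cache miss, non-GET request or magicflag request: call the view.
        if response is None:
            response = view_func(request, *view_args, **view_kwargs)

        # Unless magicflag is set, treat the HTML body as template code and
        # render it again, this time with the request-specific context.
        if 'magicflag' not in request.GET and response['content-type'].startswith('text/html'):
            t = Template(response.content)
            response.content = t.render(RequestContext(request))

        return response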

One thing to note here is that there's a backdoor for an external process (say, our script that resets the cache) to retrieve the halfway-rendered template code for any page -- magicflag in the query string. (We actually use something different on EveryBlock; I've changed this example.) So that means the only thing the cache-resetting script has to do is make a request to the appropriate page, with that query string, and save the result in the cache. Pretty slick.
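A cache-resetting script along those lines might look roughly like the sketch below. This is not our actual script: `fetch_half_rendered`, `reset_cache`, the `base_url` argument and the `cache_set` callable are all hypothetical names. It uses Python 3's `urllib.parse.quote` to build the same kind of key the middleware looks up. Note that the middleware above reads `response['content-type']` from whatever comes out of the cache, so a real script would need to store a response object rather than a bare string; the sketch treats the fetched value as opaque.

```python
from urllib.parse import quote, urlencode
from urllib.request import urlopen


def fetch_half_rendered(base_url, path, flag="magicflag"):
    """Request a page with the backdoor flag so it comes back half-rendered."""
    url = "%s%s?%s" % (base_url.rstrip("/"), quote(path), urlencode({flag: "1"}))
    with urlopen(url) as response:
        return response.read().decode("utf-8")


def reset_cache(cache_set, paths, fetch):
    """Re-fetch each path and store the result under the URL-quoted path."""
    for path in paths:
        cache_set(quote(path), fetch(path))
```

In use, `fetch` could be something like `lambda path: fetch_half_rendered("http://localhost:8000", path)` (a placeholder address) and `cache_set` the cache backend's set method.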

There's also a potential gotcha/limitation here: anything between {% raw %} and {% endraw %} only has access to a template context with the default RequestContext stuff -- which, in our case, is the user-specific stuff. Variables that the view passed to the original template aren't available in the second render.

Thanks again to Honza for telling me about this concept. It's a great idea, and it's serving us well.

Comments

Posted by Andrew Ingram on May 18, 2009 at 2:28 p.m.:

Very, very clever. I look forward to using it myself.

Posted by David on May 18, 2009 at 2:35 p.m.:

Maybe I'm misreading something -- where is the cache getting set?

Posted by Adrian Holovaty on May 18, 2009 at 2:37 p.m.:

David: I didn't include the code that sets the cache, but it's simple -- just request the URLs with the magicflag and save the responses in the cache.

Posted by Honza Král on May 18, 2009 at 4:15 p.m.:

I am glad that you found the idea helpful. Your solution is much more generic than ours: we let the individual template tags handle the double render and have the middleware always present.

sample tag:
http://github.com/ella/ella/blob/e2853bc264f4cbf3e3f61536ba423fae55c2a5d7/ella/core/templatetags/hits.py#L76

middleware:
http://github.com/ella/ella/blob/e2853bc264f4cbf3e3f61536ba423fae55c2a5d7/ella/core/middleware.py#L16

Posted by Mark on May 18, 2009 at 5:27 p.m.:

Any reason for this code being released as GPL?

Posted by Patrick on May 18, 2009 at 7:55 p.m.:

I'm sure Adrian has his reasons, but if you can't use Adrian's GPLed code, you might want to take a look at the code in Honza's project (links in his comment on @4:15pm). The license for the CMS which uses it (Ella) appears to be standard 3 clause BSD.

Posted by Jeff Waugh on May 19, 2009 at 2:51 a.m.:

This would be a good opportunity to use ESI. :-)

Posted by Kr0n on May 19, 2009 at 3:13 a.m.:

Nice!

Wouldn't it be the same to use SSI with something like nginx, and at the same time avoid a Django hit? It'd be going outside the stack somehow, but...

Posted by Simon Willison on May 19, 2009 at 3:46 a.m.:

This kind of optimisation is one of the reasons I'm so keen on signed cookies as an alternative to sessions. If all you need to customise is the "logged in as..." box on a page, having the username stored in a signed cookie means you don't have to hit the database (or an external session store) /at all/ for the duration of the request - just pull out the cached copy, check the signature on the cookie, extract the username and render it out on to the page. And since the computation is done entirely by the app server it scales horizontally.
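A rough sketch of the signed-cookie idea, using only the standard library's hmac module. The function names, the "username:signature" cookie format and the placeholder secret are assumptions for illustration, not an existing Django API:

```python
import hashlib
import hmac

SECRET_KEY = b"replace-with-a-real-secret"  # placeholder, keep the real one private


def _signature(username, secret):
    return hmac.new(secret, username.encode("utf-8"), hashlib.sha256).hexdigest()


def sign_username(username, secret=SECRET_KEY):
    """Build a cookie value of the form 'username:signature'."""
    return "%s:%s" % (username, _signature(username, secret))


def read_signed_username(cookie_value, secret=SECRET_KEY):
    """Return the username if the signature checks out, otherwise None."""
    username, separator, signature = cookie_value.rpartition(":")
    if not separator:
        return None
    expected = _signature(username, secret)
    # Constant-time comparison, done on bytes so odd input can't raise.
    if hmac.compare_digest(signature.encode("utf-8"), expected.encode("utf-8")):
        return username
    return None
```

Checking the cookie is pure computation on the app server, so no database or session lookup is needed to fill in the "logged in as" box.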

Posted by Sergey Shepelev on May 19, 2009 at 8:40 a.m.:

If "hello {username}" is the only dynamic part, then you don't need user accounts in the first place. Registration and login just to see my name on a website is awful.

P.S.: This comment form is bad too. It thinks that anything inside angle brackets is HTML. BUT, when I use the proper HTML entity &lt;, it doesn't show an angle bracket! What was on your mind -- don't accept angle brackets AND escape ampersands?

Posted by Davide Della Casa on May 19, 2009 at 8:57 a.m.:

Why not just use a cookie with the username and let the browser fetch it and render it in the page?

Posted by Adrian Holovaty on May 19, 2009 at 10:42 a.m.:

Mark: This is licensed as GPL because we're required to release EveryBlock's source code as GPL. The project is funded by a grant, and that's the license that we were asked to use.

Sergey: If I'm logged into Google and I view the Google homepage, I see my e-mail address at the top right, but it doesn't customize the page. I would argue that if a user is logged in, the developer has an obligation to let the user know that -- regardless of whether the particular page actually changes based on the user. (And in EveryBlock's case, *of course* we're customizing pages for users -- just not the homepage, at this time.)

Simon and Davide: With a cookie, you'd either have to parse it in JavaScript (which is non-ideal because it requires JavaScript) or do it in the application, in which case this two-phased template rendering would still help you, because you've still got to figure out a way to cache the heavy stuff and let the application do the username bit dynamically. The question of whether to store the username in a cookie vs. a session is tangential to this caching approach, isn't it?

Posted by Tom W. Most on May 19, 2009 at 12:45 p.m.:

Perhaps I'm missing something, but doesn't this leave you vulnerable to injection of Django template code? I don't see any method being used to escape the content outside of the {% raw %} tag from being interpreted as template code, or do you somehow guarantee that your data never contains "{{", "{%" or "{#"?

Posted by Adrian Holovaty on May 19, 2009 at 2:46 p.m.:

Tom: That's a great point, and I hadn't thought of it. Thanks for bringing it up!

A better version of this would take care of escaping everything *outside* the {% raw %} block between the first and second renders, to avoid a template-injection vulnerability.
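One possible way to do that kind of escaping, as a sketch only (not necessarily the fix EveryBlock ends up using): replace curly braces in untrusted values with HTML character references before the first render's output is cached, so the second render never sees "{{", "{%" or "{#" in user-supplied data. The function name is made up, and the approach assumes the value ends up in ordinary HTML text, not inside a script block.

```python
# Character references that browsers display as literal braces.
BRACE_ENTITIES = {"{": "&#123;", "}": "&#125;"}


def neutralize_template_syntax(value):
    """Replace curly braces so the value can't form template tags later."""
    return "".join(BRACE_ENTITIES.get(char, char) for char in value)


user_supplied = "Nice place {{ USER.email }} {% if 1 %}x{% endif %}"
safe = neutralize_template_syntax(user_supplied)
```

The browser still displays the braces, but the template lexer no longer finds any tag delimiters in the escaped text.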

Posted by Tim o'reilly on May 20, 2009 at 1 a.m.:

Is the code coming out by the end of June?

Posted by Andrew Ingram on May 20, 2009 at 6:18 a.m.:

I'm trying it in conjunction with the @cache_page decorator for a customer shopping cart that appears in the header of every page.

It seems to work, but I'm worried I might be missing a security implication.

Posted by Simon Willison on May 20, 2009 at 5:29 p.m.:

Adrian - yes, that's what I was getting at - storing the username in a signed cookie is a great complement to this kind of two-phased rendering as it allows you to avoid having to even hit the database or lookup their session - you pull from cache, extract the username from the cookie, render the two together and you're done.

Posted by James Abley on May 21, 2009 at 5:57 p.m.:

I'm having a dense moment. Surely that's just basic multi-pass compiler [1] design? Does that not get widely used when implementing template languages? What am I missing?

[1] http://en.wikipedia.org/wiki/Multi-pass_compiler

Posted by coulix on May 22, 2009 at 9:27 a.m.:

How would you make it accept {% trans "foo" %}, and how could I add some context like mail_count, which I currently get by calling a custom tag?

Posted by Dan on May 23, 2009 at 3:45 a.m.:

I did something similar, but using the {% templatetag %} for rendering the template code. A {% raw %} tag would be much more useful, and would be a nice addition to the Django trunk.

Posted by Simon Law on May 29, 2009 at 10:11 a.m.:

Tom is right. Adrian, you’ll want to hack the {% autoescape "on" %} tag so that all {% templatetag %} characters are properly escaped in your variables after the first pass.

Posted by Adrian Holovaty on May 29, 2009 at 11:10 a.m.:

Simon: Thanks for the suggestion. I've already solved it another way.

Posted by Andy Baker on May 30, 2009 at 4:33 a.m.:

Adrian - How far does template fragment caching take you before running out of steam? I was going to go down that road before I read this post.

Posted by ian on June 14, 2009 at 6:28 a.m.:

Another way of doing this is to have an alternate template tag indicator for each phase.

{% .. %} for the initial run, and {$ .. $} for the 2nd run. We did this kind of thing in 2000 with SSI @cnet ;-0

Posted by ESI on June 16, 2009 at 3:33 p.m.:

We're planning on doing something similar using ESI (edge side includes): some smart reverse HTTP proxies support ESI, which lets you replace some parts of a cached page with the response of another HTTP GET.

So, we cache the output HTML at the Varnish level (faster than doing it in RoR or even a Metal), get the session data via a 2nd HTTP request and call a JS function that applies user customizations (we do a little bit more than displaying the username).

We are doing it with a JS request instead of ESI because Varnish won't gunzip the HTML returned by Apache to check for the esi:include tags, but that's going to be supported soon.

The only downside is that the app is not 100% usable without JavaScript...

Also, Mnot created some JavaScript functions to replace some parts of an HTML document with data from other HTTP requests: http://www.mnot.net/javascript/hinclude/

Posted by Johan Bergström on June 27, 2009 at 1:56 a.m.:

My stab would be to split your page apart with SSI/ESI (although Varnish only) and cache parts of your page by routing them differently into Django.

ESI: Why toss javascript into the equation? Most people put something in front of varnish (until varnish handles it on its own) that takes care of deflate.

Posted by ZK@Web Marketing Blog on June 29, 2009 at 6:17 a.m.:

I did use them as an inspiration for a presentation of the ORM part of Django at the French Perl Workshop in Paris this weekend.
My slides (in french) are here : http://o.mengue.free.fr/blog/2006/11/...

Comments have been turned off for this entry.