Using automated screenshots to test <canvas> and user interfaces

Written by Adrian Holovaty on March 9, 2015

If you build web sites that use HTML5 <canvas> or have complex user interfaces, you can make your job easier by adding automated tests. Here’s how I use Python’s little-known Needle library to test various front-end bits of Soundslice.

Soundslice is a music-learning web app that renders sheet music and tablature in client-side JavaScript. (Here’s a demo, and here’s an overview.) Two aspects of the site’s front end are particularly hairy and hard to test using conventional methods:

  1. The music notation, which is drawn using <canvas>. Rendering sheet music — the historic term is “engraving” — is a complex process with hundreds of rules and special cases. (See chapter 2.3 in this classic dissertation for an overview.) While some of the calculations can be tested by examining JavaScript data structures, the real meat of it is in the visual display.
  2. The user interface, which is highly interactive and dynamic. Soundslice has around a dozen user-togglable interface settings and different behavior depending on device and screen size. For example, a user can enable an animated fretboard, a settings pane, a track-specific settings pane and an audio-source selector — all at the same time, or in different combinations. These elements all interact with each other differently depending on screen size. And there’s an embeddable version, which does the same stuff but with slight interface changes.

When I first started building the notation engine, I did all testing manually, by eyeballing. I kept a library of several dozen notation files, each testing a corner case, and my testing process was cumbersome and error-prone. I found myself introducing bugs and inadvertently undoing previous work — all in all, a bad scene.

Problem was, I had no idea how to automatically test a <canvas> element. It’s an opaque thing, just a collection of pixels. Browsers provide an API to retrieve a <canvas>’s pixel data, but that’s too low-level for making useful tests.

That’s when Julien Phalip told me about Needle. It’s a Python library that takes screenshots of web pages and compares them to previously determined (“baseline”) screenshots, alerting you if anything has changed. It’s perfect for an opaque thing like <canvas> that can’t otherwise easily be tested.

Here’s how I’m using it. I wrote a bunch of Python unit tests that load various Soundslice pages and interact with them until they’re in the proper state for screenshots. (This part of Needle uses Selenium, which lets you simulate mouse clicks, JavaScript calls, etc., from Python.) Then, a simple call to the assertScreenshot() method does the screenshot comparison.

Here’s a full example test:

from needle.cases import NeedleTestCase
from selenium.webdriver.common.by import By
from selenium.webdriver.support import expected_conditions as EC
from selenium.webdriver.support.ui import WebDriverWait

class SoundsliceNotationTests(NeedleTestCase):
    def test_slurs1(self):
        # Load notation test URL (placeholder URL here).
        self.driver.get('https://www.soundslice.com/...')

        # Wait until <div id="loaderscreen"> is hidden.
        # That's how we know the notation is fully loaded.
        WebDriverWait(self.driver, 10).until(EC.invisibility_of_element_located((By.ID, 'loaderscreen')))

        # Grab a screenshot of the element with ID "sheetmusic"
        # and call it "slurs1_screenshot". If in baseline-saving
        # mode, this will create slurs1_screenshot.png. Otherwise,
        # this will assert that the screenshots are identical.
        self.assertScreenshot('#sheetmusic', 'slurs1_screenshot')

The first time I ever ran it, I used the --with-save-baseline option to generate baseline screenshots. From then on, each time I run the tests, Needle regenerates fresh screenshots and compares them to the baseline.
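Conceptually, the two modes work like this. (A toy sketch for illustration only, not Needle’s actual code — the function name and byte-for-byte comparison are my simplification.)

```python
import os

def assert_screenshot(fresh_png, baseline_path, save_baseline=False):
    # Toy sketch of the baseline workflow (not Needle's actual code).
    # In baseline-saving mode, write the fresh screenshot to disk as
    # the new baseline. Otherwise, compare the fresh screenshot
    # byte-for-byte against the stored baseline and fail on any change.
    if save_baseline:
        with open(baseline_path, 'wb') as f:
            f.write(fresh_png)
        return
    with open(baseline_path, 'rb') as f:
        baseline = f.read()
    if baseline != fresh_png:
        raise AssertionError('Screenshot differs from baseline %s' % baseline_path)
```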

If a screenshot fails the assertion, Needle gives you both screenshots, so you can eyeball the differences — but it can also generate a visual “diff” image if you install PerceptualDiff (recommended!). Here’s an example diff image that Needle generated after I tweaked positioning of treble clefs:

[Screenshot of visual diff]

Most of the image is black, which means those pixels didn’t change. The blue pixels changed between screenshots, and it clearly shows the treble clef moved. (The diffs are not always so obvious, but they tend to be readable.)
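The idea behind those diff images can be sketched in a few lines. (This is an illustration of the concept, not PerceptualDiff itself; I’m representing an image as a 2D list of RGB tuples for simplicity.)

```python
def diff_image(before, after, highlight=(0, 0, 255)):
    # Toy illustration of a visual diff (not PerceptualDiff itself).
    # Both images are same-sized 2D lists of RGB tuples. Unchanged
    # pixels come out black; changed pixels get the highlight color
    # (blue, as in the diffs Needle produces).
    return [
        [(0, 0, 0) if p1 == p2 else highlight
         for p1, p2 in zip(row1, row2)]
        for row1, row2 in zip(before, after)
    ]
```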

As a bonus, my Needle baseline screenshots serve as visual documentation of notation changes. Whenever I fix a notation bug, I regenerate the appropriate baseline screenshot and check it into revision control. Then I can use BitBucket’s image diffs to visually remind myself how the notation rendering changed in a given commit.

I was so pleased with this system for testing <canvas> that I recently started using it for the general Soundslice player UI, too. I set up Needle tests that take screenshots of every combination of UI elements. When the tests run, it looks like this:

(Note the UI elements toggling and the various screen widths. In the 54 seconds of testing captured in this video, Needle took about 40 screenshots.)
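Enumerating every combination of toggles is the kind of thing itertools makes trivial. (A sketch of the approach — the toggle names here are hypothetical, not Soundslice’s real setting names.)

```python
from itertools import product

# Hypothetical toggle names for illustration; the real Soundslice
# settings differ.
TOGGLES = ['fretboard', 'settings_pane', 'track_settings', 'audio_selector']

def toggle_combinations():
    # Yield every on/off combination of the UI toggles -- 2**4 = 16
    # dicts here, each one driving a separate screenshot test.
    for states in product([False, True], repeat=len(TOGGLES)):
        yield dict(zip(TOGGLES, states))
```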

These tests have prevented us from deploying a handful of CSS regressions to production. I highly recommend this testing technique for any non-trivial UI.

The downside of screenshot testing is that it’s brittle. Each browser renders <canvas> differently — in fact, sites exploit this to fingerprint users — so I take care to always run the tests in the same browser. (Needle, via Selenium, lets you choose the browser to test with.) If even a single pixel changes between screenshots, the test fails. Fortunately, this is OK for my particular use case, because being pixel-perfect is a goal.
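To make the “single pixel” point concrete, here’s what strict comparison amounts to. (Again a toy sketch over 2D lists of RGB tuples, not Needle’s implementation.)

```python
def assert_pixel_perfect(baseline, fresh):
    # Toy sketch of strict comparison: a single differing pixel
    # anywhere fails the test, reporting where the first mismatch
    # occurred.
    for y, (row_b, row_f) in enumerate(zip(baseline, fresh)):
        for x, (pb, pf) in enumerate(zip(row_b, row_f)):
            if pb != pf:
                raise AssertionError(
                    'pixel (%d, %d) changed: %r -> %r' % (x, y, pb, pf))
```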

Another downside is that a screenshot test is pretty far “up the stack.” I wouldn’t suggest using screenshot tests if a simple test of JavaScript data structures would suffice. I use Jasmine for those sorts of tests.

Ultimately, I’m grateful to have found Needle, and I shudder to think of developing Soundslice without it as a safety net. Go forth and take automated screenshot tests!
