Async therefore I am

Originally published here.

At Quartic Technologies we spend a lot of our time building micro-services. Our flagship product, GroundControl, consists of about 6 different services (with another 5 or 6 satellites for integration with external databases and APIs). We’ve more-or-less standardised on the Dropwizard framework, which has overall been a good choice — it’s fast, pretty low on boilerplate, and opinionated enough about healthchecks, metrics, configuration and such that we’ve so far avoided an endless spiral of bikeshedding on those topics.

One thing that has left me a little envious when prospecting alternatives such as Ratpack is the promise of asynchronous IO (or “async” as it’s often abbreviated). As it turns out, servicing requests asynchronously with Dropwizard is not only eminently possible but actually downright straightforward.

Making Dropwizard async

Curiously, despite the popularity of asynchronous web frameworks, there seems to be pretty scant mention of how to get Dropwizard to service requests asynchronously. A quick Google for “dropwizard async” provided my first clue: https://jersey.java.net/documentation/latest/async.html. Getting started is surprisingly easy:

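import javax.ws.rs.GET;
import javax.ws.rs.Path;
import javax.ws.rs.container.AsyncResponse;
import javax.ws.rs.container.Suspended;
// Assumes Guava's Uninterruptibles is on the classpath.
import static com.google.common.util.concurrent.Uninterruptibles.sleepUninterruptibly;
import static java.util.concurrent.TimeUnit.MILLISECONDS;

@Path("/resource")
public class HelloResource {
    @GET
    public void asyncHelloWorld(@Suspended final AsyncResponse asyncResponse) {
        // A raw thread keeps the example short; real code would use a shared executor.
        new Thread(() -> {
            sleepUninterruptibly(1000, MILLISECONDS);
            asyncResponse.resume("Hello world!");
        }).start();
    }
}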

As the example shows, an async resource method receives an AsyncResponse parameter annotated with @Suspended. Your code can now schedule a long-running computation or IO operation and hand control back to Dropwizard immediately, freeing the request thread up for other things. Once you have a result, calling resume on the AsyncResponse object sends it back to the client.

A real world example

An important part of GroundControl is the geospatial backend (internally we call it Weyl after the awesome physicist). One of this service’s jobs is to render vector tiles from our datastore and serve them up to the user. This involves a fairly costly sequence of events:

  • Receive a request for a particular map layer at a particular tile coordinate and zoom level.
  • Calculate the rectangle enclosing this tile.
  • Query our geospatial index for features from the layer within the rectangle.
  • Process and transform the features for display (this can involve a coordinate transformation, geometry simplification and preprocessing of attribute data to make the client’s life easier).
  • Render to the protobuf-based vector tile format.
  • Return the result to the user.
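
As a rough illustration of the second step, here is a minimal sketch of computing a tile’s enclosing rectangle. It assumes the standard XYZ (“slippy map”) Web Mercator addressing scheme, which is an assumption on my part rather than a description of Weyl’s internals; the TileBounds class and its method names are hypothetical.

```java
// Sketch only: assumes standard XYZ ("slippy map") Web Mercator tile addressing.
// TileBounds is a hypothetical name, not part of any library.
final class TileBounds {
    final double west;
    final double south;
    final double east;
    final double north; // all four edges in degrees

    private TileBounds(double west, double south, double east, double north) {
        this.west = west;
        this.south = south;
        this.east = east;
        this.north = north;
    }

    /** Bounding box, in degrees, of tile (x, y) at the given zoom level. */
    static TileBounds of(int zoom, int x, int y) {
        double n = Math.pow(2, zoom); // number of tiles along each axis
        double west = x / n * 360.0 - 180.0;
        double east = (x + 1) / n * 360.0 - 180.0;
        double north = latitudeOfRowTop(y, n);
        double south = latitudeOfRowTop(y + 1, n);
        return new TileBounds(west, south, east, north);
    }

    // Inverse Web Mercator projection: latitude of the top edge of tile row y.
    private static double latitudeOfRowTop(int y, double n) {
        return Math.toDegrees(Math.atan(Math.sinh(Math.PI * (1 - 2 * y / n))));
    }

    public static void main(String[] args) {
        TileBounds b = TileBounds.of(1, 0, 0);
        System.out.printf("west=%.4f south=%.4f east=%.4f north=%.4f%n",
                b.west, b.south, b.east, b.north);
    }
}
```

The resulting rectangle is what would be handed to the spatial index query in the next step.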

When the datasets become large, this pipeline can take several seconds to complete. Worse still, our client-side mapping layer (based on mapbox-gl) will often fire off a ton of requests simultaneously. Since this pipeline is compute bound, there is no point trying to process more tiles in parallel than we have cores. Moreover, we’d like to avoid filling up Dropwizard’s thread pool with a load of requests all pointlessly fighting for limited CPU time.

To maintain responsiveness under load, we want to offload these computations to a fixed-size thread pool, handing the request thread back to Dropwizard to use for other things.
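
A minimal sketch of that idea, using only java.util.concurrent, might look like the following. TileRenderPool and its method names are hypothetical and not our production code; the two callbacks stand in for what would typically be asyncResponse::resume in a resource method.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.function.Consumer;
import java.util.function.Supplier;

// Sketch only: TileRenderPool is a hypothetical name used for illustration.
final class TileRenderPool {
    // Fixed-size pool: the pipeline is compute bound, so threads beyond the
    // core count would only compete for CPU time.
    private final ExecutorService workers;

    TileRenderPool(int threads) {
        this.workers = Executors.newFixedThreadPool(threads);
    }

    /**
     * Queues the render on the pool and returns immediately, so the calling
     * (request) thread is free again. Exactly one callback fires per task.
     */
    CompletableFuture<Void> submit(Supplier<byte[]> render,
                                   Consumer<byte[]> onSuccess,
                                   Consumer<Throwable> onFailure) {
        return CompletableFuture.supplyAsync(render, workers)
                .thenAccept(onSuccess)
                // The throwable here is usually a CompletionException
                // wrapping whatever the render threw.
                .exceptionally(t -> {
                    onFailure.accept(t);
                    return null;
                });
    }

    void shutdown() {
        workers.shutdown();
    }

    public static void main(String[] args) {
        TileRenderPool pool =
                new TileRenderPool(Runtime.getRuntime().availableProcessors());
        pool.submit(() -> new byte[] {1, 2, 3}, // stand-in for a real render
                    tile -> System.out.println("rendered " + tile.length + " bytes"),
                    error -> System.err.println("render failed: " + error))
            .join(); // block here only so the demo finishes before shutdown
        pool.shutdown();
    }
}
```

Because the pool is bounded, excess requests wait in the executor’s queue instead of each occupying a thread.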

Threads. Awesome when they come in pretty colours like this. Sometimes less awesome.

It was a pretty simple job to switch our resource methods to use the @Suspended magic. With a few changes to the tile rendering pipeline to make it use a small, fixed-size thread pool we were handling tile requests fully asynchronously!

One piece worth noting here is that this change didn’t come without cost — in particular it meant rewriting some of our resource tests to deal with the AsyncResponse object.

It’s all about the thread count

To check that things were working as intended, I spun up the Locust load-testing tool and simulated 500 users all requesting vector tiles from our backend. This VisualVM graph was particularly satisfying!

Before (synchronous): the thread count climbs as Locust “hatches” simulated users. After (asynchronous): Dropwizard happily queues up requests without spinning up a million threads.

Conclusion

Asynchronous web frameworks offer some pretty compelling performance improvements at the cost of a little code complexity and a slightly more difficult testing story. We’ve found this trade-off to be worthwhile for certain workloads, and Dropwizard lets us mix and match synchronous and asynchronous resource methods as we like.

About me

I’m Alex, and I work at Quartic Technologies. We help companies use data to improve their field operations.