Great lengths for immediate feedback: reporting on App Engine

Can you imagine unlocking a new achievement or shooting the hat off your enemy in a Red Dead Redemption duel and not getting immediate feedback? The game would straight-up be considered broken, no questions asked.

Yet plenty of game-like systems on the web are not immediate. Most badge systems make you wait a while before awarding the achievement you’ve just unlocked. Just drove a whole crapload of traffic to your web site? Many of the biggest analytics tools will make you wait a couple of hours before you know how much you’re getting or where it’s coming from.

There are good technical reasons for this. Great, often-better-than-the-alternative technical reasons. It’s still broken in the eyes of most users who’ve grown up on Xbox Live or Farmville.

This is why we’re not ashamed of putting some serious technical effort behind immediate feedback for Khan Academy users. I went into a lot of detail about our efforts to award Khan Academy badges in real-time. We recently made a similar decision for student profiles.


An anonymous but real KA user over the last month. It’d be pretty depressing if we didn’t show the most recent (and best) day of this student’s KA career just because our system can’t handle real-time aggregation of large amounts of data.

Technical Content Starts Here

With student profiles, each user or coach can mess around with all types of informative charts about students’ activity over the past few days, weeks, or month. This is no big deal until we try to summarize monthly activity for our diligent users who’ve worked on literally thousands and thousands of problems and watched thousands and thousands of clips of video in those months. App Engine is great at storing this data and getting it out piece-by-piece. Asking for it all at once is a no-no.

Anybody who’s worked with App Engine (or any NoSQL datastore) knows that trying to summarize this amount of data on the fly to create interesting aggregate reports is a recipe for are-you-kidding-me-why-didn’t-we-use-SQL-again-this-is-insane misery. Creating really detailed reports that can be sliced’n’diced across all sorts of different metrics just isn’t simple to do if you’re dealing with a lot of data and responsiveness is important.

Solution 0: Just ask App Engine for all the data you need, calculate your aggregates in code, and send ‘em to the user whenever requested. Seriously, this isn’t going to work unless you’re on a tiny app, so I’m crossing it out. The datastore will not reliably give you that much data unless you really don’t care about speed and can live with fairly frequent timeouts.

Solution 1: The standard “do as much work on write so reads are fast” solution is fine (translation: “know all aggregate metrics you need to create reports for, and every time they might be modified, recalculate and cache ‘em”), unless you’re unwilling to slow down all your writes. We’re unwilling.
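Here is a minimal sketch of what Solution 1 looks like, just to make the tradeoff concrete. It is not our actual code: the in-memory `problem_log` and `daily_totals` are hypothetical stand-ins for the datastore, and `record_problem` and `problems_on` are made-up names.

```python
from collections import defaultdict
from datetime import date

# Hypothetical in-memory stand-ins for the datastore (not App Engine API calls).
problem_log = []                 # raw, per-problem records
daily_totals = defaultdict(int)  # (user_id, day) -> number of problems done


def record_problem(user_id, day):
    """Store the raw record and update the aggregate in the same write path."""
    problem_log.append((user_id, day))
    # This extra update is the cost of Solution 1: every write does more work.
    daily_totals[(user_id, day)] += 1


def problems_on(user_id, day):
    """Reads are cheap: a single lookup, no scan over the raw records."""
    return daily_totals.get((user_id, day), 0)


record_problem("student-1", date(2011, 1, 30))
record_problem("student-1", date(2011, 1, 30))
record_problem("student-1", date(2011, 1, 31))
print(problems_on("student-1", date(2011, 1, 30)))
```

Every new metric you want to report on means another update inside `record_problem`, which is exactly how the writes get slow.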

Solution 2: If you still want fast writes, you probably want to run a scheduled task across your data that quietly sifts in the background, summarizing interesting aggregates and writing ‘em somewhere else that can be quickly reported whenever the user pulls up a chart.
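As a rough illustration of Solution 2, the sketch below collapses raw events into per-day counts in one pass. It assumes plain Python lists and dicts instead of the datastore, and `summarize_daily` is a hypothetical name. In a real app this would run as a scheduled background job and write its output somewhere quickly queryable.

```python
from collections import Counter
from datetime import datetime


def summarize_daily(raw_events):
    """Collapse raw (user_id, timestamp) events into per-day counts."""
    summary = Counter()
    for user_id, timestamp in raw_events:
        summary[(user_id, timestamp.date())] += 1
    return dict(summary)


# Made-up sample data: two problems on Jan 30, one on Jan 31.
raw_events = [
    ("student-1", datetime(2011, 1, 30, 9, 15)),
    ("student-1", datetime(2011, 1, 30, 17, 40)),
    ("student-1", datetime(2011, 1, 31, 8, 5)),
]
daily_summary = summarize_daily(raw_events)
print(daily_summary)
```

Writes stay fast because nothing here runs in the write path. The price is that `daily_summary` is only as fresh as the last run of the job.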

Solution 2 is great, and with App Engine’s mapreduce implementation it makes a lot of sense. It just doesn’t happen in real-time.

We want to show users interesting views of their past 30 days of activity. And if they just finished an exercise 10 minutes ago, that data better be included, regardless of our Solution 2 background task’s frequency.

We ended up with something like a hybrid Solution 2.5: it uses the slowly summarized and quickly queryable data from the previous month, grabs all the most recent, real-time, unsummarized data from the past day or two, and secretly smushes ‘em together (fixing up bunches of little time zone problems as we go along).
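The smushing step might look something like the sketch below. This is an illustration under stated assumptions, not our production code: `activity_report` and its parameters are hypothetical, the summaries are assumed to be keyed by the user's local date, and the time zone is a fixed UTC offset that ignores daylight saving.

```python
from collections import Counter
from datetime import date, datetime, timedelta, timezone


def activity_report(daily_summary, recent_events, summarized_through, user_tz):
    """Merge precomputed per-day counts with raw events not yet summarized.

    daily_summary: {local_date: count}, produced by the background job.
    recent_events: timezone-aware UTC datetimes from the past day or two.
    summarized_through: last local date the background job has covered.
    user_tz: the user's time zone, used to bucket raw events by local day.
    """
    report = Counter(daily_summary)
    for timestamp in recent_events:
        local_day = timestamp.astimezone(user_tz).date()
        # Skip days the summary already covers so nothing is counted twice.
        if local_day > summarized_through:
            report[local_day] += 1
    return dict(report)


pacific = timezone(timedelta(hours=-8))  # assumed fixed offset, no DST handling
summary = {date(2011, 1, 30): 12, date(2011, 1, 31): 40}
recent = [
    datetime(2011, 2, 1, 6, 30, tzinfo=timezone.utc),  # Jan 31 locally, already summarized
    datetime(2011, 2, 1, 20, 0, tzinfo=timezone.utc),  # Feb 1 locally
    datetime(2011, 2, 2, 7, 15, tzinfo=timezone.utc),  # still Feb 1 locally
]
report = activity_report(summary, recent, date(2011, 1, 31), pacific)
print(report)
```

The third event shows the kind of little time zone problem that crops up: it is already February 2 in UTC, but for this user it still belongs on February 1.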

This immediate-feedback mindset costs us a lot of developer time, but it gives our users tools for self analysis over month-long time frames without sacrificing the ability to always see the most recent efforts in real-time.

2/1/11, 11:38pm