A/Bingo split testing now on App Engine, built for Khan Academy

Continuing my trend of straight-up copying the work of the smartest people I know, I recently decided to tackle Khan Academy's A/B testing problem (we didn't have any A/B testing) by bringing Patrick McKenzie's A/Bingo into App Engine land.

So here you go: GAE/Bingo is released and should get anyone on App Engine up and A/B testing in minutes. It’s currently in production on Khan Academy and performing well with hundreds of requests per second.

Cool! But what’s A/Bingo?

A/Bingo is an A/B testing framework for Ruby on Rails. It’s specifically designed to make the creation of split test experiments as quick and painless as possible.

GAE/Bingo is a re-imagining of this framework’s core design principles inside of Google App Engine. GAE/Bingo was specifically built for use at Khan Academy, which means:

  • GAE/Bingo is highly optimized for App Engine. Drop it right into any Python App Engine app and you’ll be all set. It’s running in production for us with minimal impact on page load times.
  • GAE/Bingo works hard to persist experiment data for long periods of time without sacrificing performance.
    • Unlike most A/B experiments out there, we may keep an eye on the educational results of an experiment for months and months after a student first starts down their A/B path.
    • Heck, we may even need to correlate our A/B student splits with offline data, like end-of-the-year test scores.
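One common way to keep per-request overhead low while still persisting counts for the long haul is a write-behind buffer: increments land in a fast, volatile layer and get folded into durable storage in batches. The sketch below shows that general pattern with plain Python dicts standing in for memcache and the datastore. `BufferedCounters` and its methods are hypothetical names, and this is not necessarily how GAE/Bingo itself does it.

```python
from collections import Counter


class BufferedCounters:
    """Write-behind counters: cheap buffered increments, periodic durable flushes.

    Hypothetical sketch: `durable` is a plain dict standing in for a persistent
    store such as the App Engine datastore; nothing here is GAE/Bingo's API.
    """

    def __init__(self, durable, flush_every=100):
        self.durable = durable          # stand-in for persistent storage
        self.pending = Counter()        # stand-in for a fast, volatile cache (e.g. memcache)
        self.flush_every = flush_every  # flush after this many buffered increments
        self._buffered = 0

    def incr(self, key, amount=1):
        # Hot path: only touches the in-memory buffer.
        self.pending[key] += amount
        self._buffered += 1
        if self._buffered >= self.flush_every:
            self.flush()

    def flush(self):
        # Cold path: fold every buffered delta into durable storage at once.
        for key, delta in self.pending.items():
            self.durable[key] = self.durable.get(key, 0) + delta
        self.pending.clear()
        self._buffered = 0

    def total(self, key):
        # Durable count plus anything not yet flushed.
        return self.durable.get(key, 0) + self.pending.get(key, 0)


store = {}
counters = BufferedCounters(store, flush_every=3)
for _ in range(4):
    counters.incr("cute_logo_animal:chimpanzee:participants")
```

The trade-off is durability: anything still sitting in the volatile buffer is lost if that layer is evicted or crashes before a flush, so a real deployment has to choose how often to flush based on how much loss it can tolerate.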

Why do this?

We’ve got a million and one ideas to try out at Khan Academy. What tweak to our game mechanics will best motivate students to challenge themselves? What message makes it most likely for a student to sit back and watch a video when they really need to take time, slow down, and re-learn a core concept?

An A/B testing framework gives us the tools necessary to start answering these questions with experiments and hard(er) data. With ~1.5MM practice exercises answered per school day by Khan Academy students, we have a treasure trove of student activity from which to learn.

We also wanted to spread the love. Patrick helped out the Rails community by open sourcing A/Bingo, and we wanted to do the same for App Engine. I also couldn’t find any Python split testing framework that satisfied our needs and stayed true to the design principles of A/Bingo.

Plus, why not take advantage of the fact that App Engine’s vertical stack empowers framework creators to go pretty far when it comes to creating a drop-in, It Just Works experience? We hope GAE/Bingo accomplishes this and helps out some others in the community.

How’s it work?

Start an A/B test in one line:

# Returns "chimpanzee" to half your users and "zorilla" to the other half
animal = ab_test("cute_logo_animal", ["chimpanzee", "zorilla"])

…and when something good happens, score a conversion in one line:

# Scores a conversion for the "cute_logo_animal" experiment
bingo("cute_logo_animal")

These two lines will automatically take care of experiment creation, user tracking, consistent A/B results for each individual user, and statistical analysis. You can do a lot more, of course, when it comes to specifying alternatives and tracking conversions — check out the docs. There are some pretty simple (optional) hooks that make it very easy to get consistent A/B results even when your users transition from anonymous to logged-in.
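The "consistent A/B results for each individual user" part can be illustrated by hashing a stable user identity together with the experiment name, so the same user always lands in the same bucket without any assignment having to be looked up first. The sketch below is a toy, in-memory stand-in: `pick_alternative`, `ToyExperiments`, and their signatures are hypothetical and are not GAE/Bingo's real `ab_test`/`bingo` implementation.

```python
import hashlib
from collections import defaultdict


def pick_alternative(identity, experiment, alternatives):
    """Deterministically map (identity, experiment) to one alternative.

    Hypothetical helper: hashing means the same user always gets the same
    answer, and different experiments split the same user independently.
    """
    digest = hashlib.md5(f"{experiment}:{identity}".encode("utf-8")).hexdigest()
    return alternatives[int(digest, 16) % len(alternatives)]


class ToyExperiments:
    """In-memory toy that mimics the shape of an ab_test/bingo pair."""

    def __init__(self):
        self.alternatives = {}             # experiment -> list of alternatives
        self.assigned = defaultdict(dict)  # experiment -> {identity: alternative}
        self.converted = defaultdict(set)  # experiment -> identities that converted

    def ab_test(self, identity, experiment, alternatives):
        # The first call creates the experiment; later calls reuse its alternatives.
        alts = self.alternatives.setdefault(experiment, list(alternatives))
        choice = pick_alternative(identity, experiment, alts)
        self.assigned[experiment][identity] = choice  # record participation
        return choice

    def bingo(self, identity, experiment):
        # Only count users who actually saw the experiment, each at most once.
        if identity in self.assigned[experiment]:
            self.converted[experiment].add(identity)

    def results(self, experiment):
        # alternative -> (participants, conversions)
        counts = {alt: [0, 0] for alt in self.alternatives[experiment]}
        for identity, alt in self.assigned[experiment].items():
            counts[alt][0] += 1
            if identity in self.converted[experiment]:
                counts[alt][1] += 1
        return {alt: tuple(c) for alt, c in counts.items()}


exp = ToyExperiments()
first = exp.ab_test("user-42", "cute_logo_animal", ["chimpanzee", "zorilla"])
again = exp.ab_test("user-42", "cute_logo_animal", ["chimpanzee", "zorilla"])
assert first == again  # same user, same answer on every request
exp.bingo("user-42", "cute_logo_animal")
exp.bingo("user-7", "cute_logo_animal")  # never saw the test, so not counted
```

Because the bucket depends only on the identity string, consistency across an anonymous-to-logged-in transition comes down to continuing to use the same identity after login, for example by carrying the anonymous ID over onto the user's account. This toy does not model that step; GAE/Bingo's own hooks for it are described in its docs.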

Trivial example: an A/B test showing that messaging a student with “You’re ready to move on!” is statistically more likely to encourage a student to move on to new content than “Nice work!”

Once at least one user triggers the ab_test and bingo calls above, you’ll get statistical analysis and be able to control your experiment from the dashboard.
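Claims like "statistically more likely" rest on a significance test over each alternative's participant and conversion counts. The sketch below uses a two-sided two-proportion z-test with a pooled standard error, one standard choice for comparing two conversion rates; treat it as an illustration, not as the exact statistic GAE/Bingo's dashboard computes. The function name is hypothetical and the counts in the usage lines are made up, not Khan Academy data.

```python
import math


def two_proportion_z_test(conv_a, n_a, conv_b, n_b):
    """Two-sided two-proportion z-test with a pooled standard error.

    Returns (z, p_value). Positive z means alternative A converts better.
    """
    p_a = conv_a / n_a
    p_b = conv_b / n_b
    pooled = (conv_a + conv_b) / (n_a + n_b)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    if se == 0:
        # Everyone or no one converted in both groups: no evidence of a difference.
        return 0.0, 1.0
    z = (p_a - p_b) / se
    # Two-sided p-value: 2 * (1 - Phi(|z|)) == erfc(|z| / sqrt(2)).
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return z, p_value


# Hypothetical counts: "You're ready to move on!" vs. "Nice work!"
z, p = two_proportion_z_test(conv_a=300, n_a=1000, conv_b=250, n_b=1000)
```

With these made-up counts, z comes out around 2.5 and the two-sided p-value around 0.012, which would clear the usual 0.05 threshold.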

Can I use it? Can I tweak it?

Please do. Let me know how it goes via Twitter or email. Patch up all the inevitable bugs and fill in all the major holes left by our desire to ship v1. We’ll be continuing to improve the framework, and all help is welcome.

Enormous thanks go out to Patrick McKenzie for his framework’s inspiration and the encouragement to follow his lead. I’ll be blogging more in the future about how we’re using GAE/Bingo and how we keep track of hundreds of requests per second with persistent storage and minimal impact on page load times.

9/13/11, 12:32pm