The phrase “the App Engine way” is muttered around our office and wielded in our code reviews. As in, “Yeah, I know it’s counterintuitive, but that’s the App Engine way.”
It boils down to two principles about structuring data:

1. Denormalize your data.
2. Do lots of work on write to make reads fast.
These aren’t obvious. It won’t be obvious to folks from SQL land that App Engine’s performance is tied so tightly to these ideas. It won’t be obvious that, since `JOIN`s don’t exist, it’s really easy to write a bunch of code that 1) loads a `Monkey` entity, 2) looks at `Monkey.zoo_id`, then 3) queries for the monkey’s `Zoo`…and that that’s bad.
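That fetch-then-fetch chain can be sketched in plain Python. The dicts here stand in for datastore lookups, and `Monkey`, `Zoo`, and the field names are invented for illustration; on App Engine each lookup would be a network round trip:

```python
from dataclasses import dataclass

@dataclass
class Zoo:
    id: int
    name: str
    city: str

@dataclass
class Monkey:
    id: int
    name: str
    zoo_id: int  # a pointer, not the data itself

# Stand-ins for datastore entities; each lookup below would be a round trip.
ZOOS = {1: Zoo(1, "Bronx Zoo", "New York")}
MONKEYS = {7: Monkey(7, "Bobo", zoo_id=1)}

def render_monkey_page(monkey_id: int) -> str:
    monkey = MONKEYS[monkey_id]   # round trip #1: load the Monkey
    zoo = ZOOS[monkey.zoo_id]     # round trip #2: the JOIN done by hand
    return f"{monkey.name} lives at {zoo.name} in {zoo.city}"
```

The second fetch can't even start until the first one finishes, which is exactly the serial latency "the App Engine way" is trying to avoid.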
So we use “the App Engine way” to remind ourselves of these fuzzy guidelines and spread ‘em to newcomers.
**Denormalize your data** — as in your `Notification` models should have all the data necessary to render themselves. They shouldn’t just point to the thing that triggered a notification. As in whenever you’re sitting around wishing you could just `JOIN Zoo ON Zoo.id = Monkey.zoo_id`, you should actually be storing all of `Zoo`’s properties on each and every `Monkey`. Got billions of `Monkey`s? Who cares, copy the `Zoo` data a billion times. “Storage is cheap” and all that.
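A minimal sketch of that denormalized shape, again in plain Python with invented field names (on App Engine these would be entity properties):

```python
from dataclasses import dataclass

@dataclass
class Zoo:
    id: int
    name: str
    city: str

@dataclass
class Monkey:
    id: int
    name: str
    zoo_id: int
    # Denormalized copies of the Zoo's properties, written alongside the Monkey.
    zoo_name: str
    zoo_city: str

def make_monkey(monkey_id: int, name: str, zoo: Zoo) -> Monkey:
    # Copy the Zoo's data onto the Monkey at write time.
    return Monkey(monkey_id, name, zoo.id, zoo_name=zoo.name, zoo_city=zoo.city)

def render_monkey_page(monkey: Monkey) -> str:
    # A single entity fetch renders the whole page; no second query for the Zoo.
    return f"{monkey.name} lives at {monkey.zoo_name} in {monkey.zoo_city}"
```

The trade is deliberate: a little extra storage and write-time bookkeeping buys a read path that never chases pointers.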
**Do lots of work on write to make reads fast** — as in your `Post` models should have their `monkey_count` stored as a property on the `Post` itself. As in it should be updated every time a new monkey is brought into this wonderful world. As in any time you’re sitting around googling for App Engine’s version of `AVG()`, realize you should just be updating these aggregates every time the underlying data is modified. As in whenever you write to a `Zoo` entity, kick off a task to update all the denormalized copies on your `Monkey`s. Who cares if writes are a bit slower? This makes reads blazing fast.
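The write-time bookkeeping might look something like this sketch. It's plain Python standing in for datastore operations; on App Engine the counter bump would run in a transaction and the fan-out loop would run in a task queue, and all the model and field names are made up:

```python
from dataclasses import dataclass

@dataclass
class Post:
    id: int
    monkey_count: int = 0  # aggregate maintained on write, never computed on read

@dataclass
class Zoo:
    id: int
    name: str

@dataclass
class Monkey:
    id: int
    post_id: int
    zoo_id: int
    zoo_name: str = ""  # denormalized copy of Zoo.name

def add_monkey(post: Post, monkey: Monkey) -> None:
    # Work on write: bump the stored aggregate instead of COUNT()-ing on read.
    # (On App Engine, do this in a transaction so concurrent writes don't race.)
    post.monkey_count += 1

def rename_zoo(zoo: Zoo, new_name: str, monkeys: list[Monkey]) -> None:
    # Work on write: fan the change out to every denormalized copy.
    # (On App Engine this loop would be kicked off as a background task.)
    zoo.name = new_name
    for m in monkeys:
        if m.zoo_id == zoo.id:
            m.zoo_name = new_name
```

Reads then touch exactly the entities they render, no matter how many monkeys exist.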
A simple rule of thumb for knowing when you’re done:
If you really care about performance, you should be able to create all of the data queries for an entire web request just from incoming GET/POST data and the currently logged-in user’s properties — no query-to-get-the-data-for-another-query allowed.
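As a sketch of what passing that test looks like, here's a handler helper that computes every datastore key purely from request inputs and the logged-in user (the key format and names are invented for illustration):

```python
def keys_for_request(get_params: dict, current_user_id: int) -> list[str]:
    # Every key is computable before any fetch happens:
    # no query whose input depends on the result of another query.
    post_id = get_params["post_id"]
    return [
        f"Post:{post_id}",
        f"UserSettings:{current_user_id}",
    ]
```

Because all the keys are known up front, the fetches can be issued in a single batch instead of a serial chain.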
This is the price of entry for access to one big, enormous, whopper of a benefit: you get a datastore with consistent read/write performance no matter how big your data set becomes. Have a query that returns 10 items from a set of 10,000? Relax knowing that after your company blows up in 6 months, you’ll get the exact same speed when pulling 10 items from a set of 10,000,000,000.