I’m gonna don my rose-colored glasses and take a look at Google’s newly announced Compute Engine. As the majority of Khan Academy runs on App Engine, I’m very excited about a specific possibility here.
I know Compute Engine’s listed use cases are currently aimed at backend, data-analysis-type stuff, but I’m gonna guess it’s only a matter of time before Google adds the tools needed to serve user-facing web apps.
From http://cloud.google.com/products/compute-engine.html —
I doubt someone tripped and typed “(Initial)” accidentally.
They’ve already got App Engine pointed squarely at the application problem, so it makes sense to roll out Compute Engine initially without worrying about that use case. However, to really compete with EC2, Google will have to take this next step. App Engine is a great product, but it’ll never offer the flexibility of direct access to a virtual machine, and that’s exactly the need that leads many shops building user-facing apps to choose AWS. Google engineers are smarter than me, and they know this. Unless I’m way off base, it’s only a matter of time before Compute Engine steps in.
Here’s where things get interesting. Google’s datastore (the one already used by App Engine, the one touted as the world’s largest public NoSQL datastore) is unique. Unless I’m wrong (and god, I’d love to be wrong), no other datastore comes close to its combination of scalability and querying power with so few management concerns. To put it more precisely: what other datastore, database, NoSQL store, SQL server, tape drive, or box of chalk can you drop into an application that’ll permanently persist terabytes of data at thousands of writes per second without forcing you to think long and hard about sharding that data across multiple machines? You might answer Amazon’s DynamoDB, which is getting pretty cool and can handle all the writes, but the queries and indexes Google’s datastore supports are much, much more powerful. I’m sure Amazon will keep adding features to DynamoDB, but right now Google’s queries win easily.
Our App Engine datastore currently uses 7 terabytes of space. Before writing this post, the only other time I’d needed to check that number was when I was building a trivia game out of random Khan Academy facts.
If it’s not obvious by now, my pipe dream is for Google to combine Compute Engine’s flexible virtual machines with App Engine’s ridiculously impressive datastore. It’d need tight, automatic integration, much like the way the current datastore only requires App Engine developers to include a Python library or two. It’d also need latency comparable to App Engine’s current datastore latency, which presumably means a guarantee that your Compute Engine virtual machines have fast connections to all the right datastores.
I might be talking about science fiction. For all I know, some core issue makes this kind of integration a complete nonstarter. But if it ever exists, Amazon will have to play datastore catch-up, and many shops will have to take a good hard look at Compute Engine when preparing to build an app that needs flexible virtual servers and a powerful, scalable datastore with insultingly low maintenance requirements. We certainly would.