Sometimes it feels like we’re playing a rousing of guess’n’check when adjusting our App Engine performance settings. The request scheduler is a bit of a black box, so it can be hard to know which knobs we should twist and how far.
But there are two really straightforward signs any App Engine dev should watch out for. Both are a big red flag signaling a configuration not optimized for performance:
You can identify whether or not your requests are wasting time in the pending queue by looking at your request logs and spotting high
pending_ms values. Here’s an example:
220.127.116.11 - - [16/Jan/2013:16:03:12 -0800]
"POST /api/v1/user/exercises/distributive_property/problems/1/attempt HTTP/1.1"
200 2558 - "Mozilla/5.0 (Windows NT 5.1; rv:17.0) Gecko/20100101 Firefox/17.0"
"www.khanacademy.org" ms=877 cpu_ms=649 cpm_usd=0.000000 pending_ms=195
If you see lots of that, it’s probably a sign that you don’t have enough instances running to serve your traffic. The symptom is that requests are getting queued up behind other requests, waiting for other requests to finish before even starting to run your code. Hence the
The causes are trickier to spot. If you think you’ve already optimized your number of idle instances and pending latency settings you may want to focus on making your loading requests faster. If App Engine thinks it takes a really long time for a new instance to load your app’s code on the first request, it may be less willing to give you that new instance when a request comes in. Have fun tapping your toe while
pending_ms ticks up. You should also be aware of any requests that are abnormally slow and consider moving them to backends or specific task queues. Otherwise App Engine might queue up some normal request like a user loading your homepage (“I swear, it’s usually so fast!”) right behind some long-running bruiser of an API call. And all of a sudden your user is waiting an extra 500 millis just to get out of the pending queue before your homepage can even start to be served.
Loading requests can be an equally bad sign. You want your instances to stay up and stay healthy for as long as possible. They should be serving thousands and thousands of requests before needing to be recycled. Otherwise your users just sit around on their butts waiting for App Engine instances to reload all that beautiful code you’ve spent so much time writing.
These requests are spotted via
loading_request=1 in your request logs. They’re going to be slow, so you don’t want to see ‘em much.
If you’re seeing an abnormally high amount of
loading_request=1's given how much traffic you serve, chances are that you've also been noticing bigger problems. This could be a sign that your requests are timing out or simply crashing when a new instance starts up. This'll cause App Engine to kill the instance being loaded, which causes another loading request to be fired for the next request, which…well, you get the point. You might also have memory leaks that are causing instances to get shut down after serving just a few requests. Bottom line is something is causing your instances to thrash off and on too often.
As of today, you can spot both of these problems when browsing around your site if you’re using the App Engine Mini Profiler. If you’re wondering why a request is slow, open up the profiler and take a gander:
You should, of course, still keep track of aggregate
loading_request appearances via your request logs. The mini profiler just makes it easy for you to take a glance at the current request’s health.