TIL - What Response Time Metrics Really Mean

#software #monitoring #performance

I always thought high percentiles didn't really matter. They only affect a small number of users, right? I read them as the worst case: something unlikely to touch most users.

This week I read DDIA (Designing Data-Intensive Applications) and came across the part describing how Amazon states the response time requirements for its internal services at p99.9. That means the requirement is driven by the slowest 1 in 1,000 requests :). But the reason is something I never thought of: the customers behind the slowest requests are often the ones with the most data on their accounts, which makes them some of Amazon's most valuable customers.

This matches my experience working on an ads platform. Some processes were slow, and it was almost always for the same small group of users with many ads, a group I assume also contributed a large share of ads revenue. I wonder whether, had we designed the system around those high-percentile users, we could have made the platform better for all users and best for our most important sellers.

Response time isn't the same as latency

I couldn't have explained why, but I always had a feeling that response time and latency are different things. Here is how the terms break down:

Service time: how long the server actually spends processing the request.
Latency: time the request spends waiting (queued, in transit, blocked).
Response time: what the caller sees, i.e. service time plus network delays, queueing delays, and everything else.

Response time is from the caller's perspective. Service time is from the callee's. They're almost never equal. I think this is important because most of us only track one side.
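To make the difference concrete, here's a minimal sketch of a single worker with a FIFO queue. Everything in it is an assumption for illustration: the `simulate_fifo` helper, the arrival times, and the fixed 100 ms service time are made up, and network delay is ignored entirely.

```python
def simulate_fifo(arrivals_ms, service_ms):
    """Single worker, FIFO queue. Returns (wait, response) per request, in ms."""
    free_at = 0.0  # moment the worker finishes its current request
    results = []
    for arrival, service in zip(arrivals_ms, service_ms):
        start = max(arrival, free_at)  # the request waits if the worker is busy
        wait = start - arrival         # latency: time spent queued
        free_at = start + service
        results.append((wait, wait + service))  # response = wait + service
    return results


# Hypothetical burst: three requests arrive almost together,
# and each one needs exactly 100 ms of actual work.
arrivals = [0, 10, 20]
services = [100, 100, 100]

for wait, response in simulate_fifo(arrivals, services):
    print(f"service=100ms wait={wait:.0f}ms response={response:.0f}ms")
```

The callee would report 100 ms of service time for all three requests, while the callers see 100 ms, 190 ms, and 280 ms. Same work, very different experience.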

Average hides the shape

Response times aren't a single number, they're a distribution. Most requests are fast, a few are very slow, and the average sits somewhere awkward between them.

The average doesn't tell us how many users actually experienced a delay. An average of 200ms can mean everyone gets ~200ms, or that most get 50ms while a few get 2 seconds. The number alone can't tell us which one we have.

That's why averages aren't enough. We need a metric that respects the shape.
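A quick way to see this, using the 200ms example with made-up numbers. The two workloads and the `slow_share` helper are hypothetical, chosen only so that both averages come out the same.

```python
from statistics import mean, median

# Two made-up workloads with the same average response time (in ms).
uniform = [200] * 1300               # everyone gets 200 ms
skewed = [50] * 1200 + [2000] * 100  # most get 50 ms, about 1 in 13 gets 2 s


def slow_share(samples_ms, threshold_ms=1000):
    """Fraction of requests slower than the threshold."""
    return sum(s > threshold_ms for s in samples_ms) / len(samples_ms)


for name, samples in [("uniform", uniform), ("skewed", skewed)]:
    print(name, "mean:", mean(samples), "median:", median(samples),
          "slower than 1s:", f"{slow_share(samples):.1%}")
```

Both workloads average 200 ms, but in the skewed one the median is 50 ms and roughly 7.7% of requests take 2 seconds. The average can't distinguish the two.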

Percentiles, properly

A percentile answers the question: "within what response time did X% of requests finish?"

p50: half were faster.
p95: 95% were faster, 5% were slower.
p99: 99% were faster, 1% were slower.
p99.9: 99.9% were faster, 0.1% were slower.

If p99 = 500ms, it means 1 out of every 100 requests took longer than 500ms. That's the part I used to dismiss as noise.
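Here's a small sketch of how these numbers can be computed from raw samples. It uses the nearest-rank definition, which is only one of several common percentile definitions (monitoring tools often interpolate or estimate from buckets instead), and the sample data is invented.

```python
import math


def percentile(samples, p):
    """Nearest-rank percentile: the smallest sample with at least p% of all samples at or below it."""
    if not samples:
        raise ValueError("no samples")
    ordered = sorted(samples)
    # round() guards against float noise before taking the ceiling.
    rank = max(1, math.ceil(round(p * len(ordered) / 100, 9)))
    return ordered[rank - 1]


# Hypothetical response times (ms) for 1,000 requests.
samples = [40] * 950 + [120] * 40 + [800] * 9 + [3000]

for p in (50, 95, 99, 99.9):
    print(f"p{p}: {percentile(samples, p)} ms")
```

With this data, p50 and p95 are both 40 ms, p99 is 120 ms, and p99.9 is 800 ms. The 10 requests beyond p99 (nine at 800 ms and one at 3 seconds) are exactly the 1% that is easy to write off as noise.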

Which percentile to chase

Once we accept that the tail matters, the next question is: how far into it do we go?

Honestly, I don't know the answer. Maybe the choice isn't really technical but a business one: which users have we decided to serve well? p99 means we're serving 99% of requests well. p99.9 means we're including heavy users (the ones who, going back to the Amazon insight, probably matter most).

A few things I wish I'd known earlier

Measure at both caller and callee. The callee might report p99 = 50ms while the caller sees p99 = 300ms for the same calls. The 250ms gap is in the network, the connection pool, queueing, or the caller's own thread pool. If we only look at one side, we miss it.
Timeouts decouple the metrics. If the caller times out at 200ms and the callee takes 500ms, the callee's dashboard shows a successful 500ms response to a request the caller already gave up on, while the caller records a failure at 200ms. Both metrics are technically correct, but each is misleading on its own.
Don't average percentiles across servers. This is my second confession. For years, when our dashboard showed p99 from multiple servers, I'd take the average of those numbers and call it our "global p99." That's mathematically meaningless. The average of ten p99s is not the p99 of the combined population. The right way is to merge the underlying histograms first, then compute the percentile.
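To convince myself about that last point, here's a minimal sketch comparing the two approaches. The bucket bounds in `BUCKETS_MS`, the two servers, and the helper names are all hypothetical, and a bucketed histogram can only report the upper bound of the bucket a percentile falls into, so the result is as coarse as the buckets are.

```python
import math
from collections import Counter

BUCKETS_MS = [50, 100, 250, 500, 1000, 2500, 5000]  # hypothetical bucket upper bounds


def to_histogram(samples_ms):
    """Count samples per bucket, keyed by the bucket's upper bound."""
    hist = Counter()
    for s in samples_ms:
        bound = next((b for b in BUCKETS_MS if s <= b), math.inf)
        hist[bound] += 1
    return hist


def histogram_percentile(hist, p):
    """Upper bound of the bucket that contains the p-th percentile (nearest rank)."""
    total = sum(hist.values())
    rank = max(1, math.ceil(round(p * total / 100, 9)))
    seen = 0
    for bound in sorted(hist):
        seen += hist[bound]
        if seen >= rank:
            return bound
    raise ValueError("empty histogram")


# Made-up traffic: a busy healthy server and a quieter struggling one.
server_a = to_histogram([40] * 9000)
server_b = to_histogram([40] * 800 + [2000] * 200)

averaged_p99 = (histogram_percentile(server_a, 99) + histogram_percentile(server_b, 99)) / 2
merged_p99 = histogram_percentile(server_a + server_b, 99)  # Counter addition sums bucket counts

print("average of per-server p99s:", averaged_p99, "ms")
print("p99 of merged histogram:   ", merged_p99, "ms")
```

Here the averaged figure is 1275 ms, a number that describes no real request, while the merged histogram puts the global p99 in the 2500 ms bucket. The average also ignores that the two servers handled very different volumes of traffic.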

Takeaway

A metric isn't just a number. It's a statement about which users we've decided to serve well.

An average says "I care about the typical user." p99 says "I care about almost everyone." p99.9 says "I care about the heavy users too, the ones who probably matter most to the business."

For years, I was implicitly choosing the first one without realizing I was choosing anything.