Cutting P95 latency by 60% on a Next.js + Postgres stack

P50 latency lies. Everyone’s app is fast for the median user with a warm cache and a clean session. P95 is where your business actually lives: the user with the slow connection, the dataset that grew, the route that does three sequential queries nobody profiled. Here’s how I took a real Next.js + Postgres app’s P95 from 2.6 seconds to 1.0 second over six weeks.

Where the wins actually hide

Counter-intuitively, the front-end work was 15% of the gain. The other 85% lived in three places: database query plans on cold-cached rows, sequential awaits where parallel ones would do, and middleware that ran on every request “just in case.” Real performance work is mostly auditing what’s actually happening, not adding cleverness.
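To make the "sequential awaits" point concrete, here is a minimal sketch. The three loaders (loadUser, loadOrders, loadFlags) are hypothetical stand-ins for independent queries, simulated with timers; they are not from the app described here.

```typescript
// Hypothetical helper: resolves with `value` after `ms` milliseconds,
// standing in for a real database or API call.
const delay = <T>(ms: number, value: T): Promise<T> =>
  new Promise((resolve) => setTimeout(() => resolve(value), ms));

// Hypothetical loaders; none depends on another's result.
const loadUser = (id: string) => delay(30, { id, name: "Ada" });
const loadOrders = (id: string) => delay(30, [{ orderId: 1, userId: id }]);
const loadFlags = (id: string) => delay(30, { userId: id, beta: true });

// Before: each await blocks the next one, so latency is roughly the sum.
async function loadDashboardSequential(id: string) {
  const user = await loadUser(id);
  const orders = await loadOrders(id);
  const flags = await loadFlags(id);
  return { user, orders, flags };
}

// After: all three start immediately, so latency is roughly the slowest one.
async function loadDashboardParallel(id: string) {
  const [user, orders, flags] = await Promise.all([
    loadUser(id),
    loadOrders(id),
    loadFlags(id),
  ]);
  return { user, orders, flags };
}
```

This only applies when the calls are truly independent. If one query needs another's result, that pair stays sequential, and Promise.all rejects as soon as any one of its inputs rejects.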

The single most useful tool was Server-Timing headers wired up to client-side logging. Every response now reports its db, cache, render, and total time, and the slow ones get sampled to a dashboard. Without that signal, we were guessing. With it, every PR could show its impact.
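One way to produce that header is a small per-request collector. This is a sketch under assumptions: the RequestTimings class and the metric names are illustrative, not the app's actual code. The output follows the standard Server-Timing syntax of name;dur=milliseconds entries separated by commas.

```typescript
// Collects named durations for one request and serializes them
// as a Server-Timing header value. (Illustrative class, not a library API.)
class RequestTimings {
  private readonly durations = new Map<string, number>();

  // Runs an async step and records its elapsed milliseconds under `name`,
  // even if the step throws.
  async measure<T>(name: string, step: () => Promise<T>): Promise<T> {
    const start = performance.now();
    try {
      return await step();
    } finally {
      this.add(name, performance.now() - start);
    }
  }

  // Adds to any duration already recorded under the same name,
  // so several queries can roll up into a single "db" entry.
  add(name: string, ms: number): void {
    this.durations.set(name, (this.durations.get(name) ?? 0) + ms);
  }

  // Produces a value shaped like "db;dur=41.2, cache;dur=1.3".
  toHeader(): string {
    return Array.from(this.durations.entries())
      .map(([name, ms]) => `${name};dur=${ms.toFixed(1)}`)
      .join(", ");
  }
}
```

In a route handler this would typically end with something like response.headers.set("Server-Timing", timings.toHeader()). Browsers expose the parsed entries through the serverTiming property on resource and navigation timing entries, which is what client-side logging can read.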

Database is usually the long pole

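On this app, three queries accounted for 60% of P95. Two of them were missing indexes, which EXPLAIN ANALYZE made obvious in five minutes. The third was a SELECT * FROM events WHERE user_id = ? ORDER BY created_at DESC LIMIT 50 that, for power users with millions of events, scanned a partition Postgres couldn’t prune. The fix was a partial index on (user_id, created_at DESC) filtered to the last 90 days, which serves the hot path and leaves cold history alone. Two caveats: Postgres requires an index predicate to be immutable, so “the last 90 days” has to be a fixed cutoff date that gets rolled forward by recreating the index, not now() - interval '90 days'; and the query needs a matching created_at filter, or the planner won’t consider the partial index.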
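A sketch of how that index and its matching query could be generated together. Assumptions: the index name, the hotEventsPlan helper, and created_at being a timestamptz column are mine; the table, columns, and query shape come from the text above. The cutoff is a literal because Postgres does not accept now() in an index predicate, and the query repeats the same literal so the planner can match it against the index.

```typescript
interface HotEventsPlan {
  indexName: string;
  createIndexSql: string;
  querySql: string;
}

// Builds the index DDL and the matching query from one cutoff date.
// The cutoff is truncated to midnight UTC so the index and the query
// always share exactly the same literal.
function hotEventsPlan(cutoff: Date): HotEventsPlan {
  const day = cutoff.toISOString().slice(0, 10); // YYYY-MM-DD (UTC)
  const literal = `'${day}T00:00:00Z'`; // built from a Date, never from user input
  const indexName = `events_user_recent_${day.replace(/-/g, "")}`;

  // CONCURRENTLY avoids blocking writes, but cannot run inside a transaction block.
  const createIndexSql =
    `CREATE INDEX CONCURRENTLY ${indexName} ` +
    `ON events (user_id, created_at DESC) ` +
    `WHERE created_at >= ${literal}`;

  // The created_at filter implies the index predicate, so the partial index is usable.
  const querySql =
    `SELECT * FROM events ` +
    `WHERE user_id = $1 AND created_at >= ${literal} ` +
    `ORDER BY created_at DESC LIMIT 50`;

  return { indexName, createIndexSql, querySql };
}
```

Rolling the window forward would mean creating the index for the new cutoff, switching the query to it, and then dropping the old index. Requests that need history older than the cutoff would go through a separate, slower path.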

A pattern: when a query is slow, look at the query plan, not the code. The fastest improvement is almost never “rewrite the JOIN” — it’s “add the index” or “rewrite the predicate so an existing index can be used.” Once you see this enough times, you stop guessing.
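Reading plans can be partly automated. The sketch below walks the JSON that EXPLAIN (ANALYZE, FORMAT JSON) returns and flags sequential scans that throw away many rows, which is a common sign of a missing or unusable index. The PlanNode type covers only the fields used here, and the function name and threshold are illustrative assumptions.

```typescript
// Subset of the per-node fields in EXPLAIN (ANALYZE, FORMAT JSON) output.
interface PlanNode {
  "Node Type": string;
  "Relation Name"?: string;
  "Actual Rows"?: number;
  "Rows Removed by Filter"?: number;
  Plans?: PlanNode[];
}

interface SeqScanFinding {
  relation: string;
  rowsReturned: number;
  rowsDiscarded: number;
}

// Walks the plan tree and reports Seq Scan nodes whose filter discarded
// at least `minDiscarded` rows. The threshold is an arbitrary default.
function findWastefulSeqScans(root: PlanNode, minDiscarded = 10000): SeqScanFinding[] {
  const findings: SeqScanFinding[] = [];
  const stack: PlanNode[] = [root];
  while (stack.length > 0) {
    const node = stack.pop();
    if (!node) break;
    const discarded = node["Rows Removed by Filter"] ?? 0;
    if (node["Node Type"] === "Seq Scan" && discarded >= minDiscarded) {
      findings.push({
        relation: node["Relation Name"] ?? "(unknown)",
        rowsReturned: node["Actual Rows"] ?? 0,
        rowsDiscarded: discarded,
      });
    }
    for (const child of node.Plans ?? []) stack.push(child);
  }
  return findings;
}
```

The JSON output is an array with one element whose Plan key holds the root node, so a caller would pass result[0].Plan. A check like this is a triage aid, not a replacement for reading the plan.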

Edge caching without the gotchas

Next.js’s caching layers are powerful and confusing. The trap is enabling everything, then realizing your app serves stale data after a write. The discipline that worked:

Cache only routes that are deterministic functions of their URL (mostly read-only routes).
Set explicit cache tags, and revalidate by tag on the matching writes.
Routes that read user-specific data shouldn’t be cached at the edge at all — cache them per-user inside Redis with explicit invalidation.
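The per-user rule can be sketched as a read-through cache keyed by user, with an explicit invalidation call on the write path. Everything named here is an assumption for illustration: the KeyValueStore interface, the key format, and the in-memory store that stands in for Redis so the example runs on its own. For the tagged, non-user-specific routes, Next.js provides tag-based revalidation through next/cache, which is not shown here.

```typescript
// Minimal store contract. A thin wrapper around a Redis client could implement it.
interface KeyValueStore {
  get(key: string): Promise<string | null>;
  set(key: string, value: string, ttlSeconds: number): Promise<void>;
  del(key: string): Promise<void>;
}

// In-memory stand-in for Redis, for demonstration only.
class InMemoryStore implements KeyValueStore {
  private readonly entries = new Map<string, { value: string; expiresAt: number }>();

  async get(key: string): Promise<string | null> {
    const entry = this.entries.get(key);
    if (!entry || entry.expiresAt <= Date.now()) {
      this.entries.delete(key);
      return null;
    }
    return entry.value;
  }

  async set(key: string, value: string, ttlSeconds: number): Promise<void> {
    this.entries.set(key, { value, expiresAt: Date.now() + ttlSeconds * 1000 });
  }

  async del(key: string): Promise<void> {
    this.entries.delete(key);
  }
}

// The user id is part of every key, so one user's data is never served to another.
const userKey = (userId: string, resource: string) => `user:${userId}:${resource}`;

// Read-through: return the cached value if present, otherwise load and store it.
async function cachedForUser<T>(
  store: KeyValueStore,
  userId: string,
  resource: string,
  ttlSeconds: number,
  load: () => Promise<T>,
): Promise<T> {
  const key = userKey(userId, resource);
  const hit = await store.get(key);
  if (hit !== null) return JSON.parse(hit) as T;
  const fresh = await load();
  await store.set(key, JSON.stringify(fresh), ttlSeconds);
  return fresh;
}

// Call this from the write path that changes the underlying data.
async function invalidateForUser(store: KeyValueStore, userId: string, resource: string): Promise<void> {
  await store.del(userKey(userId, resource));
}
```

The TTL is a safety net rather than the main mechanism. Correctness comes from calling invalidateForUser on every write that touches the cached resource.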

None of this is novel. What’s novel is doing it deliberately, with measurement, instead of sprinkling cache headers and hoping. The 60% gain wasn’t one trick — it was a dozen small audits, each documented and verified.

If you want me to run this audit on your stack, I usually find the first 30% in week one. For a lower-level take on why the database tends to be the long pole, see “designing APIs that survive a redesign”; most slow APIs share a few common shapes.