Engineering insights from the DevStack team.


How We Reduced API Latency by 60%

David Kim · March 12, 2026 · 5 min read

[Image: code visualization representing API optimization]

When we started seeing p99 response times creep above 800ms on our core API endpoints, we knew it was time for a fundamental rethink of our architecture. After three months of careful profiling, experimentation, and incremental rollout, we managed to bring that number down to 320ms — a 60% reduction.

The Problem: Death by a Thousand Queries

Our original architecture was straightforward: every API request hit our primary PostgreSQL database, ran a handful of queries, serialized the result, and returned it. Simple and easy to reason about. But as our customer base grew from 500 to 2,500 companies, the cracks started showing.

The biggest bottleneck wasn't any single slow query — it was the sheer number of queries per request. A typical dashboard fetch would trigger 12–15 separate database calls, each individually fast but collectively adding up to hundreds of milliseconds.
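The fix for this pattern is usually to collapse per-item lookups into one batched query. The post doesn't show the actual schema, so here is a minimal sketch with a hypothetical `widgets` table and SQLite standing in for PostgreSQL:

```python
import sqlite3

# Hypothetical schema for illustration; the real dashboard queries are
# more involved, but the batching pattern is the same.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE widgets (id INTEGER PRIMARY KEY, name TEXT)")
conn.executemany("INSERT INTO widgets VALUES (?, ?)",
                 [(i, f"widget-{i}") for i in range(1, 6)])

ids = [1, 2, 3, 4, 5]

# Before: one round trip per id — N individually fast queries that
# collectively dominate the request's latency budget.
names_n_plus_1 = [
    conn.execute("SELECT name FROM widgets WHERE id = ?", (i,)).fetchone()[0]
    for i in ids
]

# After: a single batched query with IN (...), one round trip total.
placeholders = ",".join("?" * len(ids))
rows = conn.execute(
    f"SELECT id, name FROM widgets WHERE id IN ({placeholders})", ids
).fetchall()
names_batched = [name for _id, name in sorted(rows)]

assert names_n_plus_1 == names_batched
```

The savings come from round trips, not query cost: fifteen 20ms round trips is 300ms, while one batched query often stays in the tens of milliseconds.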

"The fastest database query is the one you never make." — Our VP Engineering's new favorite saying.

The Solution: Three-Layered Caching

We implemented a three-layered caching strategy that dramatically reduced database load while keeping data fresh enough for real-time analytics. The first layer handles request-level caching, the second manages shared application caches with Redis, and the third uses edge caching for static and semi-static responses.
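The lookup order across the first two layers can be sketched as follows. All names here are hypothetical, and a plain dict with TTLs stands in for Redis so the example stays self-contained; the third (edge) layer lives in the CDN configuration rather than application code, so it isn't shown:

```python
import time

request_cache = {}    # layer 1: scoped to a single request
shared_cache = {}     # layer 2: stand-in for Redis, entries carry a TTL
SHARED_TTL_SECONDS = 30

def fetch_from_db(key):
    # Placeholder for the real PostgreSQL query.
    return f"value-for-{key}"

def get(key):
    # Layer 1: per-request memoization — repeated reads within one
    # request never leave the process.
    if key in request_cache:
        return request_cache[key]
    # Layer 2: shared cache with a short TTL, so analytics data stays
    # fresh enough while most requests skip the database entirely.
    entry = shared_cache.get(key)
    if entry is not None:
        value, expires_at = entry
        if time.time() < expires_at:
            request_cache[key] = value
            return value
    # Miss on both layers: query the database and populate both caches.
    value = fetch_from_db(key)
    shared_cache[key] = (value, time.time() + SHARED_TTL_SECONDS)
    request_cache[key] = value
    return value
```

The key design choice is that each layer trades freshness for distance from the database: the request cache is always consistent within a request, the shared layer is bounded by its TTL, and the edge layer only serves responses that are safe to be stale.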

Results and Lessons Learned

After rolling this out gradually over six weeks, we saw p99 latency drop from 820ms to 320ms, database CPU utilization decrease by 45%, and API throughput increase by 3x without adding any new hardware. The key lesson: measure first, optimize second. Every change we made was driven by profiling data, not guesswork.