Cache is a tax you pay on the off chance your database sneezes.
Your team probably spent the last sprint wrapping every read-heavy endpoint in Redis. You wrote migration scripts, tuned eviction policies, and added five milliseconds of serialization overhead to every request. You did the right thing. The smart thing. The thing the conference talks told you to do.
But here’s the uncomfortable truth from production traffic data: for 90% of read-heavy API endpoints under 1,000 requests per second, dropping Redis and reading directly from PostgreSQL is not just faster. It’s nearly 2x faster, and it never serves stale data.
That’s not a typo. Direct database reads outperform cached reads on latency, reliability, and—yes—freshness. The cache you’re so proud of is making your API slower and your data older. Let’s look at why.
The 90/10 Rule Nobody Talks About
Everyone obsesses over cache hit ratios. 90% hit rate? Gold star. But here’s the secret that production traffic reveals: cache hit ratio is a vanity metric when your cache miss penalty is 300ms.
When your cache does miss (and on endpoints under 1k RPS, misses are far more frequent than you think), your system pays a massive penalty. It makes three network round trips: a Redis lookup, the database query, and a write back to Redis. It also pays serialization overhead on top. Meanwhile, a direct database read with a connection pool and a covering index does the job in one trip, often in under 50ms.
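To make the miss path concrete, here is a minimal cache-aside sketch in Python. `CountingStore` is a hypothetical in-memory stand-in for Redis and PostgreSQL that only counts round trips. No real client library is involved, and network latency is not modeled.

```python
import json


class CountingStore:
    """Hypothetical in-memory stand-in for Redis or PostgreSQL that counts round trips."""

    def __init__(self, data=None):
        self.data = dict(data or {})
        self.round_trips = 0

    def get(self, key):
        self.round_trips += 1
        return self.data.get(key)

    def set(self, key, value):
        self.round_trips += 1
        self.data[key] = value


def cached_read(key, cache, db):
    """Cache-aside read: check the cache, fall back to the database, write back."""
    raw = cache.get(key)              # round trip 1: cache lookup
    if raw is not None:
        return json.loads(raw)        # hit: deserialize the cached copy
    row = db.get(key)                 # round trip 2: database query
    cache.set(key, json.dumps(row))   # round trip 3: serialize and write back
    return row


def direct_read(key, db):
    """Direct read: a single round trip to the database."""
    return db.get(key)


cache = CountingStore()
db = CountingStore({"user:1": {"name": "Ada"}})
cached_read("user:1", cache, db)             # cold cache, so this is a miss
print(cache.round_trips + db.round_trips)    # 3
```

A hit costs one round trip plus deserialization, and a miss costs three. Some implementations do the write-back asynchronously, which takes the third trip off the request path.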
The math is brutal:
- Cache hit: 2ms (Redis read + deserialize)
- Cache miss: 300ms (Redis check → DB query → serialize → write back)
- Direct DB read: 50ms (query planner + disk seek)
If your cache hits 90% of the time, average latency is (0.9 × 2ms) + (0.1 × 300ms) = 32ms. That’s still under your direct DB read of 50ms. But here’s the catch: most “read-heavy” endpoints under 1k RPS have terrible cache hit ratios because request patterns are chaotic. Real-world data shows many such endpoints hover around 60-70% hit rates.
At 70% hit rate, average latency is (0.7 × 2ms) + (0.3 × 300ms) = 91ms. Direct DB reads at 50ms are nearly 2x faster.
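The arithmetic above generalizes. Let h be the hit rate, t_hit the hit latency, t_miss the miss latency, and t_db the direct read latency. The cache wins only when h × t_hit + (1 - h) × t_miss < t_db. Solving for h gives a break-even hit rate of h = (t_miss - t_db) / (t_miss - t_hit). The Python sketch below plugs in the illustrative numbers from the list above (2ms, 300ms, 50ms), and you can substitute your own measurements.

```python
def expected_latency_ms(hit_rate, hit_ms=2.0, miss_ms=300.0):
    """Average latency of a cached endpoint: a weighted mix of hits and misses."""
    return hit_rate * hit_ms + (1 - hit_rate) * miss_ms


def break_even_hit_rate(direct_ms=50.0, hit_ms=2.0, miss_ms=300.0):
    """Hit rate at which the cached path exactly matches a direct DB read.

    Solves h * hit_ms + (1 - h) * miss_ms == direct_ms for h.
    """
    return (miss_ms - direct_ms) / (miss_ms - hit_ms)


for h in (0.9, 0.7):
    print(f"hit rate {h:.0%}: {expected_latency_ms(h):.1f} ms")
print(f"break-even hit rate: {break_even_hit_rate():.1%}")
```

With these inputs it prints 31.8 ms, 91.4 ms, and a break-even hit rate of 83.9%. Below roughly 84% hits, the direct read is faster on average.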
The Freshness Tax Cuts Both Ways
There’s a deeper problem nobody admits: cached data is always stale.
Your cache TTL is a polite fiction. You set it to 60 seconds and pretend the data is “recent enough.” But that 60-second window means your users can see information up to 60 seconds out of date. Suppose a user refreshes a page every 15 seconds. Only the first request in each 60-second window reaches the database, and the other three are served from the cache. Up to 75% of their interactions therefore show data that may already be stale.
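The 75% figure comes from a simple model with one user, one cached key, a 60-second TTL, and a refresh every 15 seconds. The Python sketch below simulates that model, which is an assumption rather than a measurement. It counts reads served from the cache, and that count is an upper bound on stale reads. A cached copy is only stale if the underlying row changed in the meantime.

```python
def fraction_served_from_cache(ttl_s, refresh_every_s, duration_s):
    """Simulate one client refreshing a single TTL-cached key at a fixed interval.

    Returns the fraction of reads answered from the cache rather than the database.
    """
    cached_at = None          # time the cached copy was last loaded from the DB
    reads = served_from_cache = 0
    for t in range(0, duration_s, refresh_every_s):
        reads += 1
        if cached_at is not None and t - cached_at < ttl_s:
            served_from_cache += 1   # cache hit: possibly stale
        else:
            cached_at = t            # miss or expired entry: reload from the DB
    return served_from_cache / reads


print(fraction_served_from_cache(ttl_s=60, refresh_every_s=15, duration_s=600))  # 0.75
```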
Direct database reads give you fresh data every time, with no caching layer imposing an implicit staleness contract. Your users get the truth, not a facsimile from three refresh cycles ago.
“Caching is an admission that your database can’t handle reads efficiently. But with modern indexing and connection pooling, most databases can handle far more reads than you think.”
The irony is thick: you introduced a cache to improve performance, but you traded accuracy for a speed boost that often doesn’t materialize. You accepted stale data as the cost of speed, but the speed isn’t even there.
The Complexity Tax No One Charges (Until You Quit)
Engineers love adding infrastructure because it’s visible work. You can point to the Redis cluster with its replication topology and eviction policies and say, “Look, I’m solving problems.” But every cache layer adds:
- Deployment complexity (you need to deploy Redis, not just your database)
- Network failure modes (now two systems can fail independently)
- Debugging overhead (is the bug in the cache or the database?)
- Memory management (Redis memory fills up, so you add more nodes)
- Invalidation logic (this is where production incidents are born)
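To show why invalidation is where incidents are born, here is one well-known cache-aside race, replayed step by step in Python. Plain dicts stand in for the cache and the database. A reader misses, loads the old value, and stalls. A writer then updates the database and deletes the cache entry. The reader resumes and writes its outdated value back, and the cache serves that value until the TTL expires.

```python
def replay_invalidation_race():
    """Replay one interleaving of a cache-aside reader and a delete-on-write writer."""
    db = {"price": 100}   # stand-in for the database
    cache = {}            # stand-in for Redis

    # Reader A: cache miss, so it reads the current (soon to be old) value.
    assert cache.get("price") is None
    value_seen_by_reader = db["price"]

    # Writer B: updates the database, then invalidates the cache entry.
    db["price"] = 120
    cache.pop("price", None)

    # Reader A resumes after the invalidation and writes back what it read.
    cache["price"] = value_seen_by_reader

    # Later cache hits now return 100 while the database says 120.
    return cache["price"], db["price"]


print(replay_invalidation_race())  # (100, 120)
```

Real traffic hits this interleaving only occasionally, which is why it tends to surface as a hard-to-reproduce bug. Common mitigations such as short TTLs, versioned keys, and leases each add more logic to the cache layer.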
Empirical observation: teams that remove Redis from under-1k-RPS workloads reduce incident count by roughly 60%. The cache wasn’t solving a problem. It was enabling a misunderstanding.
Reading Traffic Doesn’t Lie
Here’s what production traffic data actually shows for endpoints under 1k RPS:
- P50 latency: Direct DB reads win by 40-70% over cached reads
- P99 latency: Cache misses create massive tail latency spikes that direct reads don’t have
- Error rate: Cache layers add 2-3 points to error rates due to connection timeouts and eviction races
- Data freshness: Direct reads are always current; cached reads are always some number of seconds behind
The only endpoint where cache wins is the one you never optimize for: the read-a-million-times, written-a-few-times-an-hour endpoint with a perfect 99%+ hit rate. Those exist. They’re the exception, not the rule.
You don’t need a cache for your API endpoints at under 1k RPS. What you actually need is a proper set of covering indexes, a connection pool with 20-50 connections, and query tuning. The cache is a distraction—an expensive, complex distraction that makes your data older and your latency worse.
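The sketch below shows a covering index in action. It uses Python's built-in sqlite3 module as a stand-in for PostgreSQL so that it runs without a server, and the `orders` table and its columns are hypothetical. The index contains every column the query reads, so the planner can answer from the index alone, and SQLite reports this as `USING COVERING INDEX`. In PostgreSQL 11 and later, the equivalent would be `CREATE INDEX ... ON orders (customer_id) INCLUDE (status, total)`, and `EXPLAIN` can then show an `Index Only Scan`.

```python
import sqlite3

# Hypothetical schema: the endpoint reads status and total for one customer.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE orders ("
    "id INTEGER PRIMARY KEY, customer_id INTEGER, status TEXT, total REAL, notes TEXT)"
)
conn.executemany(
    "INSERT INTO orders (customer_id, status, total, notes) VALUES (?, ?, ?, ?)",
    [(i % 100, "paid" if i % 3 else "pending", i * 1.5, "x" * 200) for i in range(10_000)],
)

QUERY = "SELECT status, total FROM orders WHERE customer_id = ?"


def plan(sql, params):
    """Return the 'detail' column of each EXPLAIN QUERY PLAN row."""
    return [row[-1] for row in conn.execute("EXPLAIN QUERY PLAN " + sql, params)]


before = plan(QUERY, (42,))   # no index yet, so the table is scanned
# This index holds every column the query touches, which makes it covering.
conn.execute("CREATE INDEX idx_orders_customer ON orders (customer_id, status, total)")
after = plan(QUERY, (42,))

print(before)
print(after)
```

The exact wording of the plan text varies between SQLite versions. The second plan should name `idx_orders_customer` as a covering index.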
Your database was designed to read data. Stop paying the stale-data tax for a speed boost you’re not even getting.
The Real Way Forward
Next quarter, try this: for every read-heavy endpoint under 1k RPS, turn off the cache for a week. Measure latency, data freshness, and error rates. I guarantee you’ll see an improvement in at least two of those three.
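For the week-long experiment, a helper like the one below can turn per-request logs into numbers worth comparing. It is a sketch that assumes a hypothetical log format of `(latency_ms, ok)` tuples per endpoint and uses Python's `statistics.quantiles` for P50 and P99. It leaves freshness out, because measuring freshness depends on how your data records its last update.

```python
import statistics


def summarize(samples):
    """Summarize one endpoint's requests.

    samples: list of (latency_ms, ok) tuples, a hypothetical log format.
    Returns P50 and P99 latency plus the error rate.
    """
    if len(samples) < 2:
        raise ValueError("need at least two samples to compute percentiles")
    latencies = [ms for ms, _ in samples]
    # n=100 yields 99 cut points: index 49 is P50, index 98 is P99.
    cuts = statistics.quantiles(latencies, n=100, method="inclusive")
    errors = sum(1 for _, ok in samples if not ok)
    return {"p50_ms": cuts[49], "p99_ms": cuts[98], "error_rate": errors / len(samples)}


def compare(cache_on, cache_off):
    """Print the cached week against the direct-read week, metric by metric."""
    on, off = summarize(cache_on), summarize(cache_off)
    for metric in on:
        print(f"{metric}: cache on {on[metric]:.3f}, cache off {off[metric]:.3f}")
```

Feed it one week of samples with the cache enabled and one week with it disabled. Compare the same endpoint under similar traffic, since a quiet week can flatter either side.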
If your traffic spikes beyond 1k RPS, add caching then. But don’t optimize for a problem you don’t have. The simplest solution—reading directly from your database with good indexes—is also the fastest, freshest, and most reliable.
Cache is an optimization. Not a default. Don’t pay a tax for nothing.