How I Slashed API Latency by 40% Using Redis Caching: Beyond the GET/SET Basics
When building software that scales, developers quickly find that the database is often the bottleneck. In a recent project, complex dashboard queries involving nested user metadata and catalog joins were taking anywhere from 600ms to 1.2 seconds to resolve. For a modern web app, that latency is unacceptable. By introducing a production-grade Redis caching layer, we cut overall API latency by over 40% while shielding our primary databases from redundant load.
The Latency Bottleneck: Anatomy of a Slow Request
The API endpoint in question served the primary user dashboard. To render the screen, the backend had to pull data across three separate collections: user profiles, active subscriptions, and catalog metrics. As the database grew, index lookups slowed down and the join operations became computationally expensive. Instead of running these queries on every refresh, the solution was clear: cache the computed query result.
Implementing Cache-Aside (Lazy Loading)
We implemented the classic Cache-Aside pattern. When a request hits the server:
1. Check if the computed result exists in Redis (Cache Hit). If yes, return it instantly.
2. If it's a Cache Miss, query the database, write the result back to Redis with a Time-To-Live (TTL) configuration, and return the response.
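import Redis from "ioredis";

const redis = new Redis(process.env.REDIS_URL || "redis://localhost:6379");

// Cache-aside helper: return the cached value for `key`, or compute it with
// `fetchFn`, store it with a TTL, and return it.
export async function getOrSetCache<T>(
  key: string,
  fetchFn: () => Promise<T>,
  expirySeconds: number = 3600
): Promise<T> {
  // redis.get resolves to null on a cache miss.
  const cachedData = await redis.get(key);
  if (cachedData !== null) {
    return JSON.parse(cachedData) as T;
  }

  // Cache miss: run the expensive query, then store the serialized result.
  const freshData = await fetchFn();
  // "EX" sets the expiry in seconds.
  await redis.set(key, JSON.stringify(freshData), "EX", expirySeconds);
  return freshData;
}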
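To show how a wrapper like this might be applied to the dashboard endpoint, here is a minimal sketch. The names `CacheWrapper`, `Dashboard`, `dashboardCacheKey`, and `loadDashboard`, the key format, and the 300-second TTL are all hypothetical and not taken from the original project. The wrapper is passed in as a parameter so the snippet stands on its own.

```typescript
// Signature of a cache-aside wrapper such as getOrSetCache.
type CacheWrapper = <T>(
  key: string,
  fetchFn: () => Promise<T>,
  expirySeconds?: number
) => Promise<T>;

// Hypothetical shape of the aggregated dashboard payload.
interface Dashboard {
  profile: { id: string; name: string };
  subscriptions: string[];
  catalogMetrics: Record<string, number>;
}

// Namespaced, versioned key (assumed scheme): changing "v1" means old
// entries are no longer read after a payload shape change.
function dashboardCacheKey(userId: string): string {
  return `dashboard:v1:${userId}`;
}

// Runs the expensive aggregate query only when the wrapper reports a miss.
async function loadDashboard(
  userId: string,
  cache: CacheWrapper,
  queryDashboard: (userId: string) => Promise<Dashboard>
): Promise<Dashboard> {
  return cache(dashboardCacheKey(userId), () => queryDashboard(userId), 300);
}
```

In the real service the call would look something like `loadDashboard(userId, getOrSetCache, queryDashboard)`, where `queryDashboard` stands in for whatever function performs the three-collection lookup.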
Advanced Caching Challenges We Solved
Writing a cache wrapper is simple, but running it in production under heavy traffic exposes failure modes that the basic version does not handle:
1. Avoiding Cache Stampedes (Thundering Herd)
If a popular cache key expires while 1,000 requests arrive at nearly the same moment, every one of them sees a cache miss and runs the same database query at once, which can overwhelm the database. We solved this with locking (a distributed lock such as Redlock across servers, or a local mutex within a single process) so that only the first request queries the DB while the others wait for the cache to be repopulated.
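As an illustration of the local-mutex variant only, here is a minimal sketch that coalesces concurrent fetches for the same key inside one Node.js process. It is not the Redlock approach and not necessarily what the project shipped. The names `inFlight` and `singleFlight` are hypothetical.

```typescript
// Tracks the in-flight fetch per key so concurrent callers share one promise.
const inFlight = new Map<string, Promise<unknown>>();

async function singleFlight<T>(
  key: string,
  fetchFn: () => Promise<T>
): Promise<T> {
  const pending = inFlight.get(key);
  if (pending) {
    // Another caller is already fetching this key: wait for its result.
    return pending as Promise<T>;
  }

  // First caller: start the fetch and register it. The entry is removed once
  // the fetch settles (success or failure) so later calls can fetch again.
  const promise = fetchFn().finally(() => {
    inFlight.delete(key);
  });
  inFlight.set(key, promise);
  return promise;
}
```

One way to combine this with the cache wrapper is to wrap the database call, for example `getOrSetCache(key, () => singleFlight(key, queryDb))`, where `queryDb` is a placeholder. This only deduplicates work within a single process; with several servers, each one can still issue its own query, which is the gap a distributed lock is meant to close.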
2. Active Invalidation vs. Passive TTL
Relying solely on TTL means users might see an outdated profile until the key expires, which can be minutes. We implemented active cache invalidation: whenever a user updates their profile, we commit the database transaction and a background worker invalidates the cached key, so the next read repopulates the cache with fresh data.
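A minimal sketch of the write path is shown here, with the invalidation done inline rather than in a background worker to keep the example short. `CacheDeleter`, `profileCacheKey`, `updateProfileAndInvalidate`, and the `saveProfile` callback are hypothetical names. The interface is modeled on the `del` command that ioredis exposes, which resolves to the number of keys removed.

```typescript
// Minimal slice of a Redis client needed for invalidation.
interface CacheDeleter {
  del(key: string): Promise<number>;
}

// Hypothetical key scheme for cached profiles.
function profileCacheKey(userId: string): string {
  return `profile:${userId}`;
}

async function updateProfileAndInvalidate(
  userId: string,
  saveProfile: () => Promise<void>, // assumed to commit the DB transaction
  cache: CacheDeleter
): Promise<void> {
  // Commit first. If the key were deleted before the commit, a concurrent
  // reader could re-cache the old row and keep it until the TTL expires.
  await saveProfile();
  // Delete rather than overwrite: the next read repopulates via cache-aside.
  await cache.del(profileCacheKey(userId));
}
```

If `saveProfile` throws, the cache entry is left alone, since the stored data still matches the database. Keeping a TTL on the key as well gives a fallback for the case where the delete itself fails.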
Results & Takeaways
Adding this layer reduced database load by 65% and cut average response times on cached endpoints from 420ms to 80ms, a decrease of roughly 80%. Across the whole app, average latency fell by 40%, and the server's maximum throughput capacity increased. If you're building systems that scale, caching is your most powerful tool, provided it is designed with robust invalidation policies.
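For readers who want to check how the per-endpoint and app-wide figures relate, here is a small worked calculation. The first function uses only the reported 420ms and 80ms. The second rests on an assumption that is not stated in the results: that uncached endpoints were unchanged and that cached endpoints made up a given share of total request time before the change. Both function names are hypothetical.

```typescript
// Percentage drop between two latency measurements.
function percentReduction(beforeMs: number, afterMs: number): number {
  return ((beforeMs - afterMs) / beforeMs) * 100;
}

// Assumption: `cachedShare` is the fraction (0 to 1) of total request time
// that cached endpoints accounted for before caching, and all other
// endpoints kept the same latency.
function overallReduction(cachedShare: number, cachedReductionPercent: number): number {
  return cachedShare * cachedReductionPercent;
}

// (420 - 80) / 420 is about 0.81, so cached endpoints improved by about 81%.
const cachedEndpointReduction = percentReduction(420, 80);

// Under the assumption above, a share of about one half would be in line
// with an app-wide reduction of about 40%.
const estimatedOverall = overallReduction(0.5, cachedEndpointReduction);
```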
Thanks for reading.