Caching Strategies: Serving Data at the Speed of Memory
Implement caching that reduces latency from hundreds of milliseconds to single-digit milliseconds. Covers cache placement, invalidation strategies, cache-aside vs write-through patterns, distributed caching with Redis, CDN caching, and the pitfalls that turn your cache into a source of stale data and subtle bugs.
Caching is the most powerful performance optimization available to you — and the most dangerous. A well-placed cache reduces database queries by 90%, cuts response times from 200ms to 5ms, and handles traffic spikes that would otherwise overwhelm your backend. A poorly implemented cache serves stale data, creates consistency bugs that are nearly impossible to reproduce, and adds a new failure mode to every request path.
The hard part of caching is not storing data. It is invalidating it.
Cache Placement
Where to cache (from closest to the user to closest to the data):
User            CDN          App Server         Cache         Database
  │              │                │               │               │
  │── Request ──→│                │               │               │
  │              │ CDN hit?       │               │               │
  │              │  Yes → 200     │               │               │
  │              │  No ──────────→│── Lookup ────→│               │
  │              │                │ Cache hit?    │               │
  │              │                │  Yes → data   │               │
  │              │                │  No ── Query ────────────────→│
  │              │                │←──────────── Result ─────────│
  │              │                │── Store ─────→│               │
  │←── Response ─│←─── Return ────│               │               │
Latency at each layer:
CDN cache hit: 1-20ms (edge, near user)
App cache hit: 1-5ms (Redis/Memcached, in datacenter)
Database query: 10-500ms (disk I/O, query execution)
| Cache Location | Latency | Best For |
|---|---|---|
| Browser cache | 0ms | Static assets, user-specific data |
| CDN | 1-20ms | Static content, public API responses |
| Application cache (Redis) | 1-5ms | Database query results, computed values |
| Database cache | 5-20ms | Query plan cache, buffer pool |
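The browser and CDN rows are controlled through HTTP response headers rather than application code. Here is a minimal sketch of typical Cache-Control values as a plain dict; the `cache_headers` helper and the specific max-age numbers are illustrative, not tied to any framework.

# Browser/CDN caching is driven by Cache-Control on the response.
def cache_headers(kind: str) -> dict:
    if kind == "static_asset":
        # Fingerprinted assets: safe to cache "forever" everywhere
        return {"Cache-Control": "public, max-age=31536000, immutable"}
    if kind == "public_api":
        # Shared caches (CDN) may hold for 60s; browsers revalidate
        return {"Cache-Control": "public, s-maxage=60, max-age=0"}
    # User-specific data: browser-only (private), short-lived
    return {"Cache-Control": "private, max-age=300"}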
Caching Patterns
Cache-Aside (Lazy Loading)
# Application manages cache reads and writes.
# Assumes `redis` is a configured redis-py client, `db` a thin database
# wrapper, and `User` a model with from_json/to_json helpers.
def get_user(user_id: str) -> User:
    # 1. Check cache first
    cached = redis.get(f"user:{user_id}")
    if cached:
        return User.from_json(cached)

    # 2. Cache miss: query the database
    user = db.query("SELECT * FROM users WHERE id = %s", user_id)

    # 3. Store in cache for next time
    redis.setex(f"user:{user_id}", 3600, user.to_json())  # TTL: 1 hour
    return user

def update_user(user_id: str, data: dict):
    # Update the database first...
    db.execute("UPDATE users SET ... WHERE id = %s", user_id)
    # ...then invalidate the cache (delete, not update)
    redis.delete(f"user:{user_id}")
    # The next read repopulates the cache with fresh data
Write-Through
# Cache is updated synchronously with the database on every write.
def update_user(user_id: str, data: dict):
    # Update the database
    db.execute("UPDATE users SET ... WHERE id = %s", user_id)

    # Update the cache immediately with the fresh row
    user = db.query("SELECT * FROM users WHERE id = %s", user_id)
    redis.setex(f"user:{user_id}", 3600, user.to_json())

# Cache stays consistent with the database, at the cost of higher
# write latency (cache + DB on every write).
Pattern Comparison
| Pattern | Read Speed | Write Speed | Consistency | Complexity |
|---|---|---|---|---|
| Cache-aside | Fast (after first miss) | Fast (invalidate only) | Eventual | Low |
| Write-through | Fast | Slower (cache + DB) | Strong | Medium |
| Write-behind | Fast | Fast (async write) | Eventual | High |
| Read-through | Fast | N/A | Eventual | Medium |
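Write-behind appears in the comparison but not in the code above. Here is a minimal sketch, assuming the same `redis` and `db` clients as earlier; the in-process queue and the `update_user_write_behind` name are illustrative. A production write-behind layer needs a durable queue, because anything buffered in memory is lost on crash.

import json
import queue
import threading

write_queue: queue.Queue = queue.Queue()

def update_user_write_behind(user_id: str, data: dict):
    # 1. Update the cache immediately; the write is acknowledged here
    redis.setex(f"user:{user_id}", 3600, json.dumps(data))
    # 2. Enqueue the database write for the background flusher
    write_queue.put((user_id, data))

def flush_worker():
    # Drains queued writes to the database asynchronously
    while True:
        user_id, data = write_queue.get()
        db.execute("UPDATE users SET ... WHERE id = %s", user_id)
        write_queue.task_done()

threading.Thread(target=flush_worker, daemon=True).start()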
Cache Invalidation
The two hard problems in computer science:
1. Cache invalidation
2. Naming things
3. Off-by-one errors
Invalidation strategies:
TTL (Time-to-Live):
"Data is valid for 1 hour, then stale"
Simple, but data can be stale for up to TTL duration
Event-driven:
"When data changes, delete the cache entry"
Accurate, but requires events for every write path
Versioned keys:
"Cache key includes version: user:123:v5"
New version = new key = automatic invalidation (sketched after this list)
Purge on deploy:
"Flush relevant caches when deploying new code"
Useful for config/schema cache, not for data
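The versioned-key strategy fits in a few lines: reads resolve the current version first, and writes just bump the version, so stale entries are never read again and age out via TTL. A minimal sketch, assuming a redis-py client created with decode_responses=True; `load_profile_from_db` and `save_profile_to_db` are hypothetical stand-ins for your real data access.

# Versioned keys: a write bumps the version counter, which changes the
# cache key; old entries are never read again and expire via TTL.
# Assumes decode_responses=True so Redis returns str, not bytes.
def get_profile(user_id: str) -> str:
    version = redis.get(f"user:{user_id}:version") or "0"
    key = f"user:{user_id}:v{version}"
    cached = redis.get(key)
    if cached:
        return cached
    profile = load_profile_from_db(user_id)  # hypothetical loader
    redis.setex(key, 3600, profile)
    return profile

def update_profile(user_id: str, profile: str):
    save_profile_to_db(user_id, profile)  # hypothetical writer
    redis.incr(f"user:{user_id}:version")  # old cache key is now dead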
Redis Best Practices
# Key naming convention
# {entity}:{id}:{optional_field}
"user:12345" # Full user object
"user:12345:preferences" # Just preferences
"product:sku_abc:price" # Product price
"session:token_xyz" # Session data
"rate_limit:user:12345" # Rate limiting counter
# TTL guidelines:
# User profile: 1 hour (changes infrequently)
# Product listing: 15 min (prices may change)
# Session data: 24 hours (or until logout)
# Rate limit: 1 minute (sliding window)
# Feature flags: 5 minutes (allow quick changes)
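The `rate_limit:` key family above maps to the classic INCR + EXPIRE counter. A minimal sketch of the fixed-window variant (a true sliding window would typically use a sorted set instead); `allow_request` and the limit of 100 are illustrative.

# Fixed-window rate limiting on the rate_limit:* keys listed above.
# INCR creates the key at 1 if absent; the TTL is set only on the
# first hit so the window has a fixed end.
def allow_request(user_id: str, limit: int = 100) -> bool:
    key = f"rate_limit:user:{user_id}"
    count = redis.incr(key)
    if count == 1:
        redis.expire(key, 60)  # 1-minute window, per the TTL guidelines
    return count <= limit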
Cache Stampede Prevention
# Problem: a hot key expires, and 100 concurrent requests all miss and
# hit the database simultaneously (the "stampede" or dog-pile effect).
# Solution: lock-based refresh so only one request rebuilds the entry.
import time

def get_user_safe(user_id: str) -> User:
    cached = redis.get(f"user:{user_id}")
    if cached:
        return User.from_json(cached)

    # Try to acquire the lock (only one request refreshes the cache)
    lock = redis.set(f"lock:user:{user_id}", "1", nx=True, ex=10)
    if lock:
        try:
            # Won the lock: refresh the cache
            user = db.query("SELECT * FROM users WHERE id = %s", user_id)
            redis.setex(f"user:{user_id}", 3600, user.to_json())
            return user
        finally:
            # Release even if the refresh fails; ex=10 is the backstop
            redis.delete(f"lock:user:{user_id}")
    else:
        # Another request is refreshing: wait briefly and retry
        time.sleep(0.1)
        return get_user_safe(user_id)
Anti-Patterns
| Anti-Pattern | Problem | Fix |
|---|---|---|
| Cache everything | Memory bloat, stale data | Cache hot data only (80/20 rule) |
| No TTL | Data never expires, grows forever | Always set TTL, even if long |
| Update cache on write | Race conditions between write + cache | Delete cache on write (cache-aside) |
| Cache without monitoring | Silent failures, stale data | Track hit rate, miss rate, evictions |
| Caching error responses | Error cached → repeated failures | Never cache 5xx or error states |
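The monitoring row is straightforward to act on, because Redis already counts hits and misses server-side. A minimal sketch reading them via redis-py's INFO; the 90% threshold matches the checklist target below, and the `alert` hook is a placeholder.

# Hit rate straight from Redis' own counters (INFO "stats" section,
# which also exposes evicted_keys for the eviction metric).
def cache_hit_rate() -> float:
    stats = redis.info("stats")
    hits = stats["keyspace_hits"]
    misses = stats["keyspace_misses"]
    total = hits + misses
    return hits / total if total else 0.0

if cache_hit_rate() < 0.90:
    alert("cache hit rate below 90% target")  # placeholder hook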
Implementation Checklist
- Identify the top 5 slowest queries and cache their results
- Use cache-aside pattern as default (read cache, miss → query → store)
- Invalidate by deleting cache keys on write, not by updating
- Set TTL on every cache entry — never cache without expiration
- Use consistent key naming: {entity}:{id}:{field}
- Prevent cache stampede with distributed locks on refresh
- Monitor cache hit rate (target > 90% for hot data)
- Never cache error responses or null results
- Use CDN caching for static assets and public API responses
- Load test with cold cache to ensure the application handles cache misses gracefully