Caching Strategies: Serving Data at the Speed of Memory
Implement caching that reduces latency from hundreds of milliseconds to single-digit milliseconds. Covers cache placement, invalidation strategies, cache-aside vs write-through patterns, distributed caching with Redis, CDN caching, and the pitfalls that turn your cache into a source of stale data and subtle bugs.
Caching is the most powerful performance optimization available to you — and the most dangerous. A well-placed cache reduces database queries by 90%, cuts response times from 200ms to 5ms, and handles traffic spikes that would otherwise overwhelm your backend. A poorly implemented cache serves stale data, creates consistency bugs that are nearly impossible to reproduce, and adds a new failure mode to every request path.
The hard part of caching is not storing data. It is invalidating it.
Cache Placement
Where to cache (from closest to the user to closest to the data):
User            CDN          App Server         Cache         Database
  │              │                │               │               │
  │── Request ──→│                │               │               │
  │              │ CDN hit?       │               │               │
  │              │  Yes → 200     │               │               │
  │              │  No ──────────→│── Lookup ────→│               │
  │              │                │ Cache hit?    │               │
  │              │                │  Yes → data   │               │
  │              │                │  No ── Query ────────────────→│
  │              │                │←──────────── Result ─────────│
  │              │                │── Store ─────→│               │
  │←── Response ─│←─── Return ────│               │               │
Latency at each layer:
CDN cache hit: 1-20ms (edge, near user)
App cache hit: 1-5ms (Redis/Memcached, in datacenter)
Database query: 10-500ms (disk I/O, query execution)
| Cache Location | Latency | Best For |
|---|---|---|
| Browser cache | 0ms | Static assets, user-specific data |
| CDN | 1-20ms | Static content, public API responses |
| Application cache (Redis) | 1-5ms | Database query results, computed values |
| Database cache | 5-20ms | Query plan cache, buffer pool |
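The browser and CDN rows are controlled through HTTP response headers rather than application code. Here is a minimal sketch of typical Cache-Control values as a plain dict; the `cache_headers` helper and the specific max-age numbers are illustrative, not tied to any framework.

# Browser/CDN caching is driven by Cache-Control on the response.
def cache_headers(kind: str) -> dict:
    if kind == "static_asset":
        # Fingerprinted assets: safe to cache "forever" everywhere
        return {"Cache-Control": "public, max-age=31536000, immutable"}
    if kind == "public_api":
        # Shared caches (CDN) may hold for 60s; browsers revalidate
        return {"Cache-Control": "public, s-maxage=60, max-age=0"}
    # User-specific data: browser-only (private), short-lived
    return {"Cache-Control": "private, max-age=300"}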
Caching Patterns
Cache-Aside (Lazy Loading)
# Application manages cache reads and writes.
# Assumes `redis` is a configured redis-py client, `db` a thin database
# wrapper, and `User` a model with from_json/to_json helpers.
def get_user(user_id: str) -> User:
    # 1. Check cache first
    cached = redis.get(f"user:{user_id}")
    if cached:
        return User.from_json(cached)

    # 2. Cache miss: query the database
    user = db.query("SELECT * FROM users WHERE id = %s", user_id)

    # 3. Store in cache for next time
    redis.setex(f"user:{user_id}", 3600, user.to_json())  # TTL: 1 hour
    return user

def update_user(user_id: str, data: dict):
    # Update the database first...
    db.execute("UPDATE users SET ... WHERE id = %s", user_id)
    # ...then invalidate the cache (delete, not update)
    redis.delete(f"user:{user_id}")
    # The next read repopulates the cache with fresh data
Write-Through
# Cache is updated synchronously with the database on every write.
def update_user(user_id: str, data: dict):
    # Update the database
    db.execute("UPDATE users SET ... WHERE id = %s", user_id)

    # Update the cache immediately with the fresh row
    user = db.query("SELECT * FROM users WHERE id = %s", user_id)
    redis.setex(f"user:{user_id}", 3600, user.to_json())

# Cache stays consistent with the database, at the cost of higher
# write latency (cache + DB on every write).
Pattern Comparison
| Pattern | Read Speed | Write Speed | Consistency | Complexity |
|---|---|---|---|---|
| Cache-aside | Fast (after first miss) | Fast (invalidate only) | Eventual | Low |
| Write-through | Fast | Slower (cache + DB) | Strong | Medium |
| Write-behind | Fast | Fast (async write) | Eventual | High |
| Read-through | Fast | N/A | Eventual | Medium |
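Write-behind appears in the comparison but not in the code above. Here is a minimal sketch, assuming the same `redis` and `db` clients as earlier; the in-process queue and the `update_user_write_behind` name are illustrative. A production write-behind layer needs a durable queue, because anything buffered in memory is lost on crash.

import json
import queue
import threading

write_queue: queue.Queue = queue.Queue()

def update_user_write_behind(user_id: str, data: dict):
    # 1. Update the cache immediately; the write is acknowledged here
    redis.setex(f"user:{user_id}", 3600, json.dumps(data))
    # 2. Enqueue the database write for the background flusher
    write_queue.put((user_id, data))

def flush_worker():
    # Drains queued writes to the database asynchronously
    while True:
        user_id, data = write_queue.get()
        db.execute("UPDATE users SET ... WHERE id = %s", user_id)
        write_queue.task_done()

threading.Thread(target=flush_worker, daemon=True).start()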
Cache Invalidation
The two hard problems in computer science:
1. Cache invalidation
2. Naming things
3. Off-by-one errors
Invalidation strategies:
TTL (Time-to-Live):
"Data is valid for 1 hour, then stale"
Simple, but data can be stale for up to TTL duration
Event-driven:
"When data changes, delete the cache entry"
Accurate, but requires events for every write path
Versioned keys:
"Cache key includes version: user:123:v5"
New version = new key = automatic invalidation (sketched after this list)
Purge on deploy:
"Flush relevant caches when deploying new code"
Useful for config/schema cache, not for data
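The versioned-key strategy fits in a few lines: reads resolve the current version first, and writes just bump the version, so stale entries are never read again and age out via TTL. A minimal sketch, assuming a redis-py client created with decode_responses=True; `load_profile_from_db` and `save_profile_to_db` are hypothetical stand-ins for your real data access.

# Versioned keys: a write bumps the version counter, which changes the
# cache key; old entries are never read again and expire via TTL.
# Assumes decode_responses=True so Redis returns str, not bytes.
def get_profile(user_id: str) -> str:
    version = redis.get(f"user:{user_id}:version") or "0"
    key = f"user:{user_id}:v{version}"
    cached = redis.get(key)
    if cached:
        return cached
    profile = load_profile_from_db(user_id)  # hypothetical loader
    redis.setex(key, 3600, profile)
    return profile

def update_profile(user_id: str, profile: str):
    save_profile_to_db(user_id, profile)  # hypothetical writer
    redis.incr(f"user:{user_id}:version")  # old cache key is now dead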
Redis Best Practices
# Key naming convention
# {entity}:{id}:{optional_field}
"user:12345" # Full user object
"user:12345:preferences" # Just preferences
"product:sku_abc:price" # Product price
"session:token_xyz" # Session data
"rate_limit:user:12345" # Rate limiting counter
# TTL guidelines:
# User profile: 1 hour (changes infrequently)
# Product listing: 15 min (prices may change)
# Session data: 24 hours (or until logout)
# Rate limit: 1 minute (sliding window)
# Feature flags: 5 minutes (allow quick changes)
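The `rate_limit:` key family above maps to the classic INCR + EXPIRE counter. A minimal sketch of the fixed-window variant (a true sliding window would typically use a sorted set instead); `allow_request` and the limit of 100 are illustrative.

# Fixed-window rate limiting on the rate_limit:* keys listed above.
# INCR creates the key at 1 if absent; the TTL is set only on the
# first hit so the window has a fixed end.
def allow_request(user_id: str, limit: int = 100) -> bool:
    key = f"rate_limit:user:{user_id}"
    count = redis.incr(key)
    if count == 1:
        redis.expire(key, 60)  # 1-minute window, per the TTL guidelines
    return count <= limit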
Cache Stampede Prevention
# Problem: a hot key expires, and 100 concurrent requests all miss and
# hit the database simultaneously (the "stampede" or dog-pile effect).
# Solution: lock-based refresh so only one request rebuilds the entry.
import time

def get_user_safe(user_id: str) -> User:
    cached = redis.get(f"user:{user_id}")
    if cached:
        return User.from_json(cached)

    # Try to acquire the lock (only one request refreshes the cache)
    lock = redis.set(f"lock:user:{user_id}", "1", nx=True, ex=10)
    if lock:
        try:
            # Won the lock: refresh the cache
            user = db.query("SELECT * FROM users WHERE id = %s", user_id)
            redis.setex(f"user:{user_id}", 3600, user.to_json())
            return user
        finally:
            # Release even if the refresh fails; ex=10 is the backstop
            redis.delete(f"lock:user:{user_id}")
    else:
        # Another request is refreshing: wait briefly and retry
        time.sleep(0.1)
        return get_user_safe(user_id)
Anti-Patterns
| Anti-Pattern | Problem | Fix |
|---|---|---|
| Cache everything | Memory bloat, stale data | Cache hot data only (80/20 rule) |
| No TTL | Data never expires, grows forever | Always set TTL, even if long |
| Update cache on write | Race conditions between write + cache | Delete cache on write (cache-aside) |
| Cache without monitoring | Silent failures, stale data | Track hit rate, miss rate, evictions |
| Caching error responses | Error cached → repeated failures | Never cache 5xx or error states |
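The monitoring row is straightforward to act on, because Redis already counts hits and misses server-side. A minimal sketch reading them via redis-py's INFO; the 90% threshold matches the checklist target below, and the `alert` hook is a placeholder.

# Hit rate straight from Redis' own counters (INFO "stats" section,
# which also exposes evicted_keys for the eviction metric).
def cache_hit_rate() -> float:
    stats = redis.info("stats")
    hits = stats["keyspace_hits"]
    misses = stats["keyspace_misses"]
    total = hits + misses
    return hits / total if total else 0.0

if cache_hit_rate() < 0.90:
    alert("cache hit rate below 90% target")  # placeholder hook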
Implementation Checklist
- Identify the top 5 slowest queries and cache their results
- Use cache-aside pattern as default (read cache, miss → query → store)
- Invalidate by deleting cache keys on write, not by updating
- Set TTL on every cache entry — never cache without expiration
- Use consistent key naming: {entity}:{id}:{field}
- Prevent cache stampede with distributed locks on refresh
- Monitor cache hit rate (target > 90% for hot data)
- Never cache error responses or null results
- Use CDN caching for static assets and public API responses
- Load test with cold cache to ensure the application handles cache misses gracefully