Rate Limiting and Throttling Patterns
Implement rate limiting that protects your APIs from abuse without degrading legitimate traffic. Covers the token bucket, sliding window, and fixed window algorithms, distributed rate limiting, tiered client quotas, and the response headers that let clients back off gracefully.
Rate limiting protects your system from being overwhelmed — whether by a misbehaving client, a DDoS attack, or your own frontend making too many requests. Without rate limiting, a single aggressive client can consume resources meant for thousands of users.
Algorithms
Token Bucket
A bucket fills with tokens at a steady rate. Each request consumes one token. When the bucket is empty, requests are rejected:
import time

class TokenBucket:
    def __init__(self, rate, capacity):
        self.rate = rate          # tokens added per second
        self.capacity = capacity  # max burst size
        self.tokens = capacity    # start full so an initial burst is allowed
        self.last_refill = time.time()

    def allow(self):
        now = time.time()
        elapsed = now - self.last_refill
        # Refill in proportion to elapsed time, never beyond capacity
        self.tokens = min(self.capacity, self.tokens + elapsed * self.rate)
        self.last_refill = now
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False
Best for: APIs that allow short bursts but enforce sustained rate limits.
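For example, a bucket with rate=5 and capacity=10 absorbs a burst of ten requests, then settles to about five per second. A quick illustration using the class above (time is already imported there):

    bucket = TokenBucket(rate=5, capacity=10)

    # The first 10 back-to-back calls drain the full bucket; the rest fail
    print(sum(bucket.allow() for _ in range(15)))  # ~10

    # After a second of refill, roughly rate (5) more requests pass
    time.sleep(1)
    print(sum(bucket.allow() for _ in range(15)))  # ~5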
Sliding Window
Count requests in a rolling time window:
import time
import uuid
import redis

r = redis.Redis()

class SlidingWindow:
    def __init__(self, limit, window_seconds):
        self.limit = limit
        self.window = window_seconds

    def allow(self, client_id):
        now = time.time()
        key = f"rate:{client_id}"
        # Drop entries that have aged out of the window
        r.zremrangebyscore(key, 0, now - self.window)
        # Count requests still inside the window
        count = r.zcard(key)
        if count < self.limit:
            # Unique member so simultaneous requests don't collapse
            # into a single sorted-set entry
            r.zadd(key, {f"{now}:{uuid.uuid4().hex}": now})
            r.expire(key, self.window)
            return True
        return False
Best for: Strict per-window limits without burst allowance.
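One caveat: the zcard check and the zadd happen in separate round trips, so two servers can both pass the check before either records its request. A common remedy, sketched here rather than taken from the original, is to run the whole sequence as a Lua script, which Redis executes atomically:

    import time
    import uuid

    # Assumes the same redis client `r` as above
    SLIDING_WINDOW_LUA = """
    redis.call('ZREMRANGEBYSCORE', KEYS[1], 0, ARGV[1] - ARGV[2])
    local count = redis.call('ZCARD', KEYS[1])
    if count < tonumber(ARGV[3]) then
        -- unique member avoids collisions when timestamps coincide
        redis.call('ZADD', KEYS[1], ARGV[1], ARGV[1] .. ':' .. ARGV[4])
        redis.call('EXPIRE', KEYS[1], ARGV[2])
        return 1
    end
    return 0
    """
    allow_atomic = r.register_script(SLIDING_WINDOW_LUA)

    def allow(client_id, limit, window_seconds):
        return allow_atomic(
            keys=[f"rate:{client_id}"],
            args=[time.time(), int(window_seconds), limit, uuid.uuid4().hex],
        ) == 1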
Fixed Window
Count requests in fixed time buckets (e.g., per minute):
import time
import redis

r = redis.Redis()
LIMIT = 100  # requests per minute

def allow(client_id):
    window = int(time.time() / 60)  # current minute bucket
    key = f"rate:{client_id}:{window}"
    count = r.incr(key)
    if count == 1:
        # Set the TTL once, when the key is created; resetting it on
        # every hit keeps dead keys alive longer and costs a round trip
        r.expire(key, 60)
    return count <= LIMIT
Best for: Simple, memory-cheap implementations where the boundary effect is acceptable: a client can send up to 2x the limit in a short span straddling two adjacent windows.
Distributed Rate Limiting
In-memory rate limiting breaks down once requests are load-balanced across multiple API servers: each instance sees only a fraction of a client's traffic, so the effective limit becomes limit x number of servers. Two common solutions:
Redis-Based (Centralized)
import time
import redis

r = redis.Redis()

# All API servers check the same Redis counter
def rate_limit(client_id, limit=100, window=60):
    key = f"rate:{client_id}:{int(time.time() / window)}"
    current = r.incr(key)
    if current == 1:
        r.expire(key, window)  # TTL set once, when the window key appears
    return current <= limit
Local + Sync (Approximate)
Each server maintains local counters, periodically syncing with a central store. Allows some over-limit requests but avoids Redis latency on every request.
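A minimal sketch of the idea, assuming a single shared counter for brevity (a real implementation would key by client): requests are admitted against the last synced global total plus the local tally, and a background thread flushes local counts every sync_interval:

    import threading
    import time
    import redis

    r = redis.Redis()

    class LocalSyncLimiter:
        def __init__(self, limit, window=60, sync_interval=1.0):
            self.limit = limit
            self.window = window
            self.local = 0    # hits not yet pushed to Redis
            self.synced = 0   # last cluster-wide total read from Redis
            self.lock = threading.Lock()
            threading.Thread(target=self._sync_loop,
                             args=(sync_interval,), daemon=True).start()

        def allow(self):
            with self.lock:
                if self.synced + self.local >= self.limit:
                    return False
                self.local += 1
                return True

        def _sync_loop(self, interval):
            while True:
                time.sleep(interval)
                with self.lock:
                    pending, self.local = self.local, 0
                key = f"rate:global:{int(time.time() / self.window)}"
                # Push this server's hits, read back the cluster-wide total
                total = r.incrby(key, pending)
                r.expire(key, self.window)
                with self.lock:
                    self.synced = total

The trade-off is explicit: the global view lags by up to sync_interval, so a client can overshoot by roughly servers x requests admitted per interval, in exchange for zero Redis calls on the hot path.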
Response Headers
Communicate rate limit status to clients:
HTTP/1.1 200 OK
X-RateLimit-Limit: 100
X-RateLimit-Remaining: 67
X-RateLimit-Reset: 1709567400
Note that Retry-After belongs only on the rejection response; it has no meaning on a success.
When rate limited:
HTTP/1.1 429 Too Many Requests
Retry-After: 30
Content-Type: application/json
{
  "error": "rate_limit_exceeded",
  "message": "Rate limit of 100 requests per minute exceeded",
  "retry_after": 30
}
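How these get attached is framework-specific; as a hypothetical, framework-neutral sketch, a limiter decision can be translated into headers and a 429 body like so:

    import json

    def rate_limit_headers(limit, remaining, reset_epoch):
        # Success-path headers: current quota state for this client
        return {
            "X-RateLimit-Limit": str(limit),
            "X-RateLimit-Remaining": str(max(0, remaining)),
            "X-RateLimit-Reset": str(int(reset_epoch)),
        }

    def rejection(limit, window_seconds, retry_after):
        # 429 status, Retry-After header, and a machine-readable body
        body = json.dumps({
            "error": "rate_limit_exceeded",
            "message": f"Rate limit of {limit} requests "
                       f"per {window_seconds} seconds exceeded",
            "retry_after": retry_after,
        })
        return 429, {"Retry-After": str(retry_after),
                     "Content-Type": "application/json"}, body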
Tiered Rate Limits
Different limits for different client tiers:
rate_limits:
  free:
    requests_per_minute: 60
    requests_per_day: 1000
  pro:
    requests_per_minute: 600
    requests_per_day: 50000
  enterprise:
    requests_per_minute: 6000
    requests_per_day: unlimited
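In application code this config typically becomes a lookup keyed by the client's tier. A hypothetical resolution (names are illustrative, not from the original):

    # Mirrors the YAML above; "unlimited" becomes None (no cap)
    RATE_LIMITS = {
        "free":       {"per_minute": 60,   "per_day": 1_000},
        "pro":        {"per_minute": 600,  "per_day": 50_000},
        "enterprise": {"per_minute": 6000, "per_day": None},
    }

    def limits_for(tier):
        # Unknown or missing tiers fall back to the most restrictive bucket
        return RATE_LIMITS.get(tier, RATE_LIMITS["free"])

    limits_for("pro")["per_minute"]  # 600

A request should be checked against every limit in its tier; the per-minute and per-day counters run independently.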
Anti-Patterns
| Anti-Pattern | Consequence | Fix |
|---|---|---|
| No rate limiting | Single client can DoS your API | Implement rate limiting on day one |
| Same limit for all endpoints | Expensive endpoints unprotected | Per-endpoint limits based on cost (see the sketch below the table) |
| Hard reject without Retry-After | Clients retry immediately, making it worse | Always include Retry-After header |
| Rate limit by IP only | Shared IPs (NAT) punish legitimate users | Rate limit by API key + IP |
| No monitoring of rate limit hits | Cannot distinguish abuse from growth | Dashboard and alerts on 429 rates |
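On the per-endpoint point, one lightweight approach (a sketch reusing the TokenBucket from earlier, with time imported) is to weight endpoints by cost so expensive routes drain a client's bucket faster:

    # Hypothetical cost table; unlisted endpoints cost 1 token
    ENDPOINT_COST = {
        "/search": 5,    # fan-out query
        "/export": 10,   # report generation
    }

    def allow_request(bucket, path):
        cost = ENDPOINT_COST.get(path, 1)
        # Same refill logic as TokenBucket.allow(), but consume `cost` tokens
        now = time.time()
        bucket.tokens = min(bucket.capacity,
                            bucket.tokens + (now - bucket.last_refill) * bucket.rate)
        bucket.last_refill = now
        if bucket.tokens >= cost:
            bucket.tokens -= cost
            return True
        return False

One bucket per client then protects every route with a single limit, while still charging heavy endpoints for the resources they actually consume.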
Rate limiting is both a security measure and a product feature. Done well, it protects your infrastructure while giving paying customers the throughput they need.