API Rate Limiting Patterns
Protect backend services from abuse and overload with rate limiting. Covers token bucket, sliding window, distributed rate limiting, client-specific limits, and the techniques that keep your API available under unpredictable traffic.
Without rate limiting, a single misconfigured client, a viral blog post, or a malicious actor can overwhelm your API and bring down the service for everyone. Rate limiting is the gatekeeper between your API and the world: it ensures fair access, prevents abuse, and protects backend resources.
Algorithm Comparison
Fixed Window:
How: Count requests per fixed time window (e.g., per minute)
Pros: Simple, low memory
Cons: Bursts at the window boundary can reach up to double the limit
Window: |──── minute 1 ────|──── minute 2 ────|
Limit: 100 requests/min
Problem: 100 requests at 0:59 + 100 at 1:00 = 200 in 2 seconds
Sliding Window Log:
How: Track timestamp of each request, count within window
Pros: Accurate, no boundary burst
Cons: High memory (store every timestamp)
Sliding Window Counter:
How: Weighted average of previous and current window
Pros: Accurate, low memory
Cons: Slightly approximate
Requests in window = (prev_count × overlap%) + current_count
If the previous window had 80 requests, 30% of it still overlaps the sliding window, and the current window has 20:
Effective count = (80 × 0.3) + 20 = 44
Token Bucket:
How: Bucket fills with tokens at constant rate, each request takes a token
Pros: Allows bursts up to bucket size, smooth average rate
Cons: Slightly more complex
Bucket size: 10 (allows burst of 10)
Refill rate: 1 token/second (an average of 1 request/second)
t=0: 10 tokens → send 5 requests → 5 tokens remaining
t=5: 5 remaining + 5 refilled → 10 tokens (bucket full again)
Implementation
import time
import uuid

import redis


class SlidingWindowRateLimiter:
    """Distributed rate limiter using Redis sorted sets (sliding window log)."""

    def __init__(self, redis_client: redis.Redis):
        self.redis = redis_client

    def is_allowed(self, key: str, limit: int, window_seconds: int) -> dict:
        """Check if a request is allowed under the rate limit."""
        now = time.time()
        window_start = now - window_seconds
        # Unique member per request; a uuid avoids collisions when two
        # requests for the same key arrive at the same timestamp
        member = f"{now}:{uuid.uuid4().hex}"

        pipe = self.redis.pipeline()
        # Remove entries that have aged out of the window
        pipe.zremrangebyscore(key, 0, window_start)
        # Count requests still in the current window
        pipe.zcard(key)
        # Add the current request (optimistically)
        pipe.zadd(key, {member: now})
        # Set a TTL so idle keys get cleaned up
        pipe.expire(key, window_seconds)
        results = pipe.execute()

        request_count = results[1]  # count before this request was added
        allowed = request_count < limit
        if not allowed:
            # Roll back the optimistic add
            self.redis.zrem(key, member)

        return {
            "allowed": allowed,
            "remaining": max(0, limit - request_count - 1),
            "reset": int(now + window_seconds),
            "limit": limit,
        }

# Usage:
limiter = SlidingWindowRateLimiter(redis.Redis())

# Rate limit per API key: 100 requests per minute
result = limiter.is_allowed(
    key=f"ratelimit:{api_key}",
    limit=100,
    window_seconds=60,
)

if not result["allowed"]:
    # Return 429 Too Many Requests
    headers = {
        "X-RateLimit-Limit": str(result["limit"]),
        "X-RateLimit-Remaining": str(result["remaining"]),
        "X-RateLimit-Reset": str(result["reset"]),
        "Retry-After": str(result["reset"] - int(time.time())),
    }
Response Headers
RFC 6585 defines the 429 Too Many Requests status; the X-RateLimit-* headers are a widely adopted de facto convention (standardized RateLimit fields are being defined in draft-ietf-httpapi-ratelimit-headers):
HTTP/1.1 429 Too Many Requests
Content-Type: application/json
X-RateLimit-Limit: 100
X-RateLimit-Remaining: 0
X-RateLimit-Reset: 1710432000
Retry-After: 42
{
"error": "rate_limit_exceeded",
"message": "Rate limit of 100 requests per minute exceeded",
"retry_after": 42
}
Always include:
X-RateLimit-Limit: Total allowed requests per window
X-RateLimit-Remaining: Requests remaining in current window
X-RateLimit-Reset: Unix timestamp when window resets
Retry-After: Seconds until client can retry
Anti-Patterns
| Anti-Pattern | Consequence | Fix |
|---|---|---|
| No rate limiting at all | Single client can DoS your API | Rate limit all endpoints |
| Same limit for all clients | Enterprise customers hit limits, abusers get same quota | Tiered limits by plan/API key |
| Rate limit only at API gateway | Internal services can still overload each other | Rate limit at service level too |
| No response headers | Clients cannot self-regulate | Always return rate limit headers |
| Hard reject without retry info | Clients hammer the API with retries | Return Retry-After header with backoff |
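As a sketch of the tiered-limits fix from the table, limits can be resolved per plan before calling the limiter. The tier names and numbers below are hypothetical, not from any real pricing scheme:

```python
# Hypothetical plan tiers; names and quotas are illustrative
PLAN_LIMITS = {
    "free":       {"limit": 60,   "window_seconds": 60},
    "pro":        {"limit": 600,  "window_seconds": 60},
    "enterprise": {"limit": 6000, "window_seconds": 60},
}


def limits_for_plan(plan: str) -> dict:
    """Resolve a client's rate limit from its plan, defaulting to the free tier."""
    return PLAN_LIMITS.get(plan, PLAN_LIMITS["free"])
```

The resolved values plug straight into a limiter call, e.g. `limiter.is_allowed(key, **limits_for_plan(plan))`, so abusers stay on the default quota while paying customers get headroom.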
Rate limiting is not about saying no — it is about ensuring fair access. Good rate limiting protects your service while giving clients clear feedback on their usage and when they can retry.