API Rate Limiting & Throttling
Protect APIs with rate limiting. Covers token bucket, sliding window, distributed rate limiting with Redis, client-specific limits, and graceful degradation under load.
Rate limiting prevents any single client from monopolizing your API. Without it, one misbehaving script can consume all your server capacity, effectively DDoS-ing your own service for everyone else. Good rate limiting is invisible to normal users and transparent to developers who hit limits.
Algorithms
| Algorithm | How It Works | Pros | Cons |
|---|---|---|---|
| Token bucket | Tokens added at fixed rate, consumed per request | Allows bursts, smooth | Slightly complex |
| Sliding window | Count requests in rolling time window | Accurate, predictable | Memory per window |
| Fixed window | Count requests in fixed intervals (per minute) | Simple | Burst at window boundaries |
| Leaky bucket | Requests queued, processed at fixed rate | Smooth output | Queuing adds latency |
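The token bucket row above can be made concrete with a short in-memory sketch. Names like `TokenBucket`, `capacity`, and `refill_rate` are illustrative, not from any particular library; a production limiter would need locking for concurrent callers.

```python
import time

class TokenBucket:
    """Minimal token bucket: capacity bounds bursts; refill_rate sets the sustained rate."""

    def __init__(self, capacity, refill_rate):
        self.capacity = capacity        # max tokens = burst size
        self.refill_rate = refill_rate  # tokens added per second
        self.tokens = float(capacity)
        self.last = time.monotonic()

    def allow(self, cost=1.0):
        now = time.monotonic()
        # Refill proportionally to elapsed time, capped at capacity
        self.tokens = min(self.capacity, self.tokens + (now - self.last) * self.refill_rate)
        self.last = now
        if self.tokens >= cost:
            self.tokens -= cost
            return True
        return False

# 5 requests/second sustained, bursts of up to 10
bucket = TokenBucket(capacity=10, refill_rate=5)
```

A client that has been idle accumulates tokens up to `capacity`, which is what lets legitimate short bursts through while the long-run rate stays bounded by `refill_rate`.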
Implementation with Redis
```python
import time

import redis

r = redis.Redis()

def rate_limit(client_id, max_requests=100, window_seconds=60):
    """Sliding window rate limiter backed by a Redis sorted set."""
    key = f"ratelimit:{client_id}"
    now = time.time()
    window_start = now - window_seconds

    pipe = r.pipeline()
    # Remove entries that have aged out of the window
    pipe.zremrangebyscore(key, 0, window_start)
    # Count requests still inside the current window
    pipe.zcard(key)
    # Record the current request (member and score are the timestamp)
    pipe.zadd(key, {str(now): now})
    # Expire the key so idle clients don't leak memory
    pipe.expire(key, window_seconds)
    results = pipe.execute()
    request_count = results[1]

    if request_count >= max_requests:
        # Remove the entry we just added so rejected requests
        # don't count against the client
        r.zrem(key, str(now))
        # The client can retry once the oldest entry ages out of the window
        oldest = r.zrange(key, 0, 0, withscores=True)
        retry_after = (oldest[0][1] + window_seconds - now) if oldest else window_seconds
        return {
            "allowed": False,
            "retry_after": retry_after,
            "limit": max_requests,
            "remaining": 0,
        }
    return {
        "allowed": True,
        "limit": max_requests,
        "remaining": max_requests - request_count - 1,
    }
```
Response Headers
```http
HTTP/1.1 200 OK
X-RateLimit-Limit: 100
X-RateLimit-Remaining: 47
X-RateLimit-Reset: 1709510400
```

```http
HTTP/1.1 429 Too Many Requests
Retry-After: 30
X-RateLimit-Limit: 100
X-RateLimit-Remaining: 0
X-RateLimit-Reset: 1709510400
Content-Type: application/json

{
  "error": "rate_limit_exceeded",
  "message": "Rate limit of 100 requests per minute exceeded",
  "retry_after_seconds": 30
}
```
Tiered Rate Limits
| Tier | Requests/min | Burst | Use Case |
|---|---|---|---|
| Free | 60 | 10 | Public API, evaluation |
| Basic | 600 | 100 | Small applications |
| Pro | 6,000 | 1,000 | Production workloads |
| Enterprise | 60,000 | 10,000 | High-volume integrations |
| Internal | Unlimited | N/A | Internal services |
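The tier table maps naturally onto a small lookup at request time. `TIER_LIMITS` and `limits_for` below are illustrative names, assuming the client record carries a `tier` field and an `internal` flag:

```python
# (requests per minute, burst capacity) per tier, mirroring the table above
TIER_LIMITS = {
    "free": (60, 10),
    "basic": (600, 100),
    "pro": (6_000, 1_000),
    "enterprise": (60_000, 10_000),
}

def limits_for(client):
    """Return (rate, burst) for a client; internal services are unlimited."""
    if client.get("internal"):
        return (None, None)  # no limit enforced
    # Unknown or missing tiers fall back to free-tier limits
    return TIER_LIMITS.get(client.get("tier"), TIER_LIMITS["free"])
```

Defaulting unknown tiers to the free limits fails safe: a misconfigured client gets throttled rather than unlimited access.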
Anti-Patterns
| Anti-Pattern | Problem | Fix |
|---|---|---|
| No rate limiting | One client can exhaust resources | Rate limit all endpoints |
| Same limit for all endpoints | Expensive endpoints treated same as cheap | Per-endpoint limits based on cost |
| No burst allowance | Legitimate short bursts blocked | Token bucket with burst capacity |
| Vague 429 responses | Clients don’t know when to retry | Include Retry-After header and remaining count |
| Rate limit by IP only | Shared IPs (offices, NATs) hit limits | Rate limit by API key or user ID |
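The "same limit for all endpoints" anti-pattern is commonly fixed by weighting requests: expensive endpoints consume more of the budget. A minimal sketch, with hypothetical paths and costs that would pair with a token bucket's `cost` parameter:

```python
# Illustrative per-endpoint costs in tokens; tune to actual resource usage
ENDPOINT_COST = {
    "/search": 5,    # fans out to multiple backends
    "/export": 20,   # heavy, long-running
    "/users": 1,     # cheap lookup
}

def request_cost(path):
    """Tokens a request to this path should consume; unknown paths cost 1."""
    return ENDPOINT_COST.get(path, 1)
```

With this in place, a client's quota is spent proportionally to load imposed, so one `/export` call counts the same as twenty cheap lookups.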
Checklist
- Rate limiting on all public API endpoints
- Algorithm: token bucket or sliding window
- Distributed: Redis-backed for multi-instance APIs
- Tiered limits by plan/subscription level
- Response headers: Limit, Remaining, Reset
- 429 response with Retry-After and clear error message
- Per-endpoint limits for expensive operations
- Monitoring: rate limit hits, top consumers
- Graceful degradation: degrade non-critical features first
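The last checklist item, graceful degradation, can be sketched as priority-based load shedding: as measured load rises, drop background work first and protect critical paths longest. The function name and thresholds below are illustrative only.

```python
def should_shed(load, priority):
    """Decide whether to reject a request given current load (0.0-1.0).

    Non-critical work is shed first; thresholds are illustrative
    and should be tuned per service.
    """
    thresholds = {
        "critical": 0.95,    # shed only near saturation
        "normal": 0.85,
        "background": 0.70,  # shed earliest
    }
    return load >= thresholds.get(priority, 0.85)
```

Shed requests should still get a proper 429 (or 503) with `Retry-After`, so clients back off instead of hammering an overloaded service.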
:::note[Source]
This guide is derived from operational intelligence at Garnet Grid Consulting. For API design consulting, visit garnetgrid.com.
:::