Rate Limiting and Throttling Patterns
Implement rate limiting that protects your APIs from abuse without degrading legitimate traffic. Covers the token bucket, sliding window, and fixed window algorithms, distributed rate limiting, tiered client quotas, and the response headers that let clients back off gracefully.
Rate limiting protects your system from being overwhelmed — whether by a misbehaving client, a DDoS attack, or your own frontend making too many requests. Without rate limiting, a single aggressive client can consume resources meant for thousands of users.
Algorithms
Token Bucket
A bucket fills with tokens at a steady rate. Each request consumes one token. When the bucket is empty, requests are rejected:
import time

class TokenBucket:
    def __init__(self, rate, capacity):
        self.rate = rate          # tokens added per second
        self.capacity = capacity  # max burst size
        self.tokens = capacity    # start full so an initial burst is allowed
        self.last_refill = time.time()

    def allow(self):
        now = time.time()
        elapsed = now - self.last_refill
        # Refill in proportion to elapsed time, never beyond capacity
        self.tokens = min(self.capacity, self.tokens + elapsed * self.rate)
        self.last_refill = now
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False
Best for: APIs that allow short bursts but enforce sustained rate limits.
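For example, a bucket with rate=5 and capacity=10 absorbs a burst of ten requests, then settles to about five per second. A quick illustration using the class above (time is already imported there):

    bucket = TokenBucket(rate=5, capacity=10)

    # The first 10 back-to-back calls drain the full bucket; the rest fail
    print(sum(bucket.allow() for _ in range(15)))  # ~10

    # After a second of refill, roughly rate (5) more requests pass
    time.sleep(1)
    print(sum(bucket.allow() for _ in range(15)))  # ~5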
Sliding Window
Count requests in a rolling time window:
import time
import uuid
import redis

r = redis.Redis()

class SlidingWindow:
    def __init__(self, limit, window_seconds):
        self.limit = limit
        self.window = window_seconds

    def allow(self, client_id):
        now = time.time()
        key = f"rate:{client_id}"
        # Drop entries that have aged out of the window
        r.zremrangebyscore(key, 0, now - self.window)
        # Count requests still inside the window
        count = r.zcard(key)
        if count < self.limit:
            # Unique member so simultaneous requests don't collapse
            # into a single sorted-set entry
            r.zadd(key, {f"{now}:{uuid.uuid4().hex}": now})
            r.expire(key, self.window)
            return True
        return False
Best for: Strict per-window limits without burst allowance.
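One caveat: the zcard check and the zadd happen in separate round trips, so two servers can both pass the check before either records its request. A common remedy, sketched here rather than taken from the original, is to run the whole sequence as a Lua script, which Redis executes atomically:

    import time
    import uuid

    # Assumes the same redis client `r` as above
    SLIDING_WINDOW_LUA = """
    redis.call('ZREMRANGEBYSCORE', KEYS[1], 0, ARGV[1] - ARGV[2])
    local count = redis.call('ZCARD', KEYS[1])
    if count < tonumber(ARGV[3]) then
        -- unique member avoids collisions when timestamps coincide
        redis.call('ZADD', KEYS[1], ARGV[1], ARGV[1] .. ':' .. ARGV[4])
        redis.call('EXPIRE', KEYS[1], ARGV[2])
        return 1
    end
    return 0
    """
    allow_atomic = r.register_script(SLIDING_WINDOW_LUA)

    def allow(client_id, limit, window_seconds):
        return allow_atomic(
            keys=[f"rate:{client_id}"],
            args=[time.time(), int(window_seconds), limit, uuid.uuid4().hex],
        ) == 1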
Fixed Window
Count requests in fixed time buckets (e.g., per minute):
import time
import redis

r = redis.Redis()
LIMIT = 100  # requests per minute

def allow(client_id):
    window = int(time.time() / 60)  # current minute bucket
    key = f"rate:{client_id}:{window}"
    count = r.incr(key)
    if count == 1:
        # Set the TTL once, when the key is created; resetting it on
        # every hit keeps dead keys alive longer and costs a round trip
        r.expire(key, 60)
    return count <= LIMIT
Best for: Simple, memory-cheap implementations where the boundary effect is acceptable: a client can send up to 2x the limit in a short span straddling two adjacent windows.
Distributed Rate Limiting
In-memory rate limiting breaks down once requests are load-balanced across multiple API servers: each instance sees only a fraction of a client's traffic, so the effective limit becomes limit x number of servers. Two common solutions:
Redis-Based (Centralized)
import time
import redis

r = redis.Redis()

# All API servers check the same Redis counter
def rate_limit(client_id, limit=100, window=60):
    key = f"rate:{client_id}:{int(time.time() / window)}"
    current = r.incr(key)
    if current == 1:
        r.expire(key, window)  # TTL set once, when the window key appears
    return current <= limit
Local + Sync (Approximate)
Each server maintains local counters, periodically syncing with a central store. Allows some over-limit requests but avoids Redis latency on every request.
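A minimal sketch of the idea, assuming a single shared counter for brevity (a real implementation would key by client): requests are admitted against the last synced global total plus the local tally, and a background thread flushes local counts every sync_interval:

    import threading
    import time
    import redis

    r = redis.Redis()

    class LocalSyncLimiter:
        def __init__(self, limit, window=60, sync_interval=1.0):
            self.limit = limit
            self.window = window
            self.local = 0    # hits not yet pushed to Redis
            self.synced = 0   # last cluster-wide total read from Redis
            self.lock = threading.Lock()
            threading.Thread(target=self._sync_loop,
                             args=(sync_interval,), daemon=True).start()

        def allow(self):
            with self.lock:
                if self.synced + self.local >= self.limit:
                    return False
                self.local += 1
                return True

        def _sync_loop(self, interval):
            while True:
                time.sleep(interval)
                with self.lock:
                    pending, self.local = self.local, 0
                key = f"rate:global:{int(time.time() / self.window)}"
                # Push this server's hits, read back the cluster-wide total
                total = r.incrby(key, pending)
                r.expire(key, self.window)
                with self.lock:
                    self.synced = total

The trade-off is explicit: the global view lags by up to sync_interval, so a client can overshoot by roughly servers x requests admitted per interval, in exchange for zero Redis calls on the hot path.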
Response Headers
Communicate rate limit status to clients:
HTTP/1.1 200 OK
X-RateLimit-Limit: 100
X-RateLimit-Remaining: 67
X-RateLimit-Reset: 1709567400
Note that Retry-After belongs only on the rejection response; it has no meaning on a success.
When rate limited:
HTTP/1.1 429 Too Many Requests
Retry-After: 30
Content-Type: application/json
{
  "error": "rate_limit_exceeded",
  "message": "Rate limit of 100 requests per minute exceeded",
  "retry_after": 30
}
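How these get attached is framework-specific; as a hypothetical, framework-neutral sketch, a limiter decision can be translated into headers and a 429 body like so:

    import json

    def rate_limit_headers(limit, remaining, reset_epoch):
        # Success-path headers: current quota state for this client
        return {
            "X-RateLimit-Limit": str(limit),
            "X-RateLimit-Remaining": str(max(0, remaining)),
            "X-RateLimit-Reset": str(int(reset_epoch)),
        }

    def rejection(limit, window_seconds, retry_after):
        # 429 status, Retry-After header, and a machine-readable body
        body = json.dumps({
            "error": "rate_limit_exceeded",
            "message": f"Rate limit of {limit} requests "
                       f"per {window_seconds} seconds exceeded",
            "retry_after": retry_after,
        })
        return 429, {"Retry-After": str(retry_after),
                     "Content-Type": "application/json"}, body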
Tiered Rate Limits
Different limits for different client tiers:
rate_limits:
  free:
    requests_per_minute: 60
    requests_per_day: 1000
  pro:
    requests_per_minute: 600
    requests_per_day: 50000
  enterprise:
    requests_per_minute: 6000
    requests_per_day: unlimited
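In application code this config typically becomes a lookup keyed by the client's tier. A hypothetical resolution (names are illustrative, not from the original):

    # Mirrors the YAML above; "unlimited" becomes None (no cap)
    RATE_LIMITS = {
        "free":       {"per_minute": 60,   "per_day": 1_000},
        "pro":        {"per_minute": 600,  "per_day": 50_000},
        "enterprise": {"per_minute": 6000, "per_day": None},
    }

    def limits_for(tier):
        # Unknown or missing tiers fall back to the most restrictive bucket
        return RATE_LIMITS.get(tier, RATE_LIMITS["free"])

    limits_for("pro")["per_minute"]  # 600

A request should be checked against every limit in its tier; the per-minute and per-day counters run independently.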
Anti-Patterns
| Anti-Pattern | Consequence | Fix |
|---|---|---|
| No rate limiting | Single client can DoS your API | Implement rate limiting on day one |
| Same limit for all endpoints | Expensive endpoints unprotected | Per-endpoint limits based on cost (see the sketch below the table) |
| Hard reject without Retry-After | Clients retry immediately, making it worse | Always include Retry-After header |
| Rate limit by IP only | Shared IPs (NAT) punish legitimate users | Rate limit by API key + IP |
| No monitoring of rate limit hits | Cannot distinguish abuse from growth | Dashboard and alerts on 429 rates |
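On the per-endpoint point, one lightweight approach (a sketch reusing the TokenBucket from earlier, with time imported) is to weight endpoints by cost so expensive routes drain a client's bucket faster:

    # Hypothetical cost table; unlisted endpoints cost 1 token
    ENDPOINT_COST = {
        "/search": 5,    # fan-out query
        "/export": 10,   # report generation
    }

    def allow_request(bucket, path):
        cost = ENDPOINT_COST.get(path, 1)
        # Same refill logic as TokenBucket.allow(), but consume `cost` tokens
        now = time.time()
        bucket.tokens = min(bucket.capacity,
                            bucket.tokens + (now - bucket.last_refill) * bucket.rate)
        bucket.last_refill = now
        if bucket.tokens >= cost:
            bucket.tokens -= cost
            return True
        return False

One bucket per client then protects every route with a single limit, while still charging heavy endpoints for the resources they actually consume.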
Rate limiting is both a security measure and a product feature. Done well, it protects your infrastructure while giving paying customers the throughput they need.