
Rate Limiting and Throttling Patterns

Implement rate limiting that protects your APIs from abuse without degrading legitimate traffic. Covers token bucket, sliding window, distributed rate limiting, client-specific quotas, and graceful degradation strategies.

Rate limiting protects your system from being overwhelmed — whether by a misbehaving client, a DDoS attack, or your own frontend making too many requests. Without rate limiting, a single aggressive client can consume resources meant for thousands of users.


Algorithms

Token Bucket

A bucket fills with tokens at a steady rate. Each request consumes one token. When the bucket is empty, requests are rejected:

import time

class TokenBucket:
    def __init__(self, rate, capacity):
        self.rate = rate          # tokens added per second
        self.capacity = capacity  # max burst size
        self.tokens = capacity
        self.last_refill = time.time()

    def allow(self):
        now = time.time()
        elapsed = now - self.last_refill
        # Refill based on elapsed time, capped at the bucket capacity
        self.tokens = min(self.capacity, self.tokens + elapsed * self.rate)
        self.last_refill = now

        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False

Best for: APIs that allow short bursts but enforce sustained rate limits.
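To see the burst-then-refill behavior, here is a standalone demo (the class is repeated so the snippet runs on its own; a production version would also need a lock for concurrent callers):

```python
import time

class TokenBucket:
    def __init__(self, rate, capacity):
        self.rate = rate          # tokens added per second
        self.capacity = capacity  # max burst size
        self.tokens = capacity
        self.last_refill = time.time()

    def allow(self):
        now = time.time()
        self.tokens = min(self.capacity,
                          self.tokens + (now - self.last_refill) * self.rate)
        self.last_refill = now
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False

bucket = TokenBucket(rate=1, capacity=3)    # 1 token/sec, bursts of 3

# A burst of 4 back-to-back requests: the first 3 drain the bucket
burst = [bucket.allow() for _ in range(4)]
print(burst)            # [True, True, True, False]

time.sleep(1.1)         # wait for roughly one token to refill
print(bucket.allow())   # True
```

Note how the capacity sets the burst size while the rate sets the sustained throughput: after the bucket drains, requests succeed only as fast as tokens refill.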

Sliding Window

Count requests in a rolling time window:

import time
import uuid

class SlidingWindow:
    def __init__(self, redis_client, limit, window_seconds):
        self.redis = redis_client   # e.g. redis.Redis()
        self.limit = limit
        self.window = window_seconds

    def allow(self, client_id):
        now = time.time()
        key = f"rate:{client_id}"

        # Remove entries that have aged out of the window
        self.redis.zremrangebyscore(key, 0, now - self.window)

        # Count requests still inside the window
        count = self.redis.zcard(key)

        if count < self.limit:
            # Unique member so two requests with the same timestamp don't collide
            self.redis.zadd(key, {f"{now}:{uuid.uuid4().hex}": now})
            self.redis.expire(key, self.window)
            return True
        return False

Best for: Strict per-window limits without burst allowance.

Fixed Window

Count requests in fixed time buckets (e.g., per minute):

# Assumes a module-level `redis` client
def allow(client_id, limit=100):
    window = int(time.time() / 60)  # current minute bucket
    key = f"rate:{client_id}:{window}"
    count = redis.incr(key)
    if count == 1:
        redis.expire(key, 60)  # set the TTL once, when the key is created
    return count <= limit

Best for: Simple implementation, acceptable boundary-crossing inaccuracy.
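The boundary inaccuracy is easy to quantify: a client can send a full limit at the end of one window and another full limit at the start of the next. A small in-memory simulation (a plain dict stands in for the Redis keys) makes this concrete:

```python
LIMIT = 100
counters = {}  # (client_id, window_id) -> count; stands in for the Redis keys

def allow(client_id, now):
    window = int(now / 60)
    key = (client_id, window)
    counters[key] = counters.get(key, 0) + 1
    return counters[key] <= LIMIT

# 100 requests at t=59.9 (end of window 0) all pass...
end_burst = sum(allow("c1", 59.9) for _ in range(100))
# ...and 100 more at t=60.1 (start of window 1) also pass:
start_burst = sum(allow("c1", 60.1) for _ in range(100))

print(end_burst + start_burst)  # 200 requests in ~0.2s, double the per-minute limit
```

If that 2x worst case is unacceptable, use the sliding window instead.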


Distributed Rate Limiting

Single-instance rate limiting fails with multiple API servers. Solutions:

Redis-Based (Centralized)

# All API servers check the same Redis counter
def rate_limit(client_id, limit=100, window=60):
    key = f"rate:{client_id}:{int(time.time() / window)}"
    current = redis.incr(key)
    if current == 1:
        redis.expire(key, window)
    return current <= limit

Local + Sync (Approximate)

Each server maintains local counters, periodically syncing with a central store. Allows some over-limit requests but avoids Redis latency on every request.
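One way to sketch the local-plus-sync approach (hypothetical names; a shared dict stands in for the central store, and a single process stands in for several servers that would sync over the network on a timer):

```python
class LocalSyncLimiter:
    """Each server counts locally and folds its delta into a shared
    store on sync(); between syncs the check uses a possibly stale
    global count, so brief over-limit bursts are possible."""

    def __init__(self, shared_store, server_id, limit):
        self.shared = shared_store      # dict shared by all servers (stand-in for Redis)
        self.server_id = server_id
        self.limit = limit
        self.local_delta = 0            # requests not yet pushed to the store
        self.last_known_global = 0      # snapshot from the last sync

    def allow(self):
        # Decide using the (stale) global count plus our own unsynced delta
        if self.last_known_global + self.local_delta >= self.limit:
            return False
        self.local_delta += 1
        return True

    def sync(self):
        # Fold the local delta into the shared counter and refresh the snapshot
        self.shared["count"] = self.shared.get("count", 0) + self.local_delta
        self.local_delta = 0
        self.last_known_global = self.shared["count"]

store = {}
a = LocalSyncLimiter(store, "api-1", limit=10)
b = LocalSyncLimiter(store, "api-2", limit=10)

# Before any sync, each server sees only its own 6 requests:
for _ in range(6):
    a.allow(); b.allow()
a.sync()  # store count becomes 6; api-1's snapshot is now stale
b.sync()  # store count becomes 12; api-2 sees the true total

print(a.allow(), b.allow())  # True False
```

The `True False` output is the trade-off in miniature: api-1's stale snapshot lets a few over-limit requests through until its next sync, in exchange for skipping a network round-trip on every request.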


Response Headers

Communicate rate limit status to clients:

HTTP/1.1 200 OK
X-RateLimit-Limit: 100
X-RateLimit-Remaining: 67
X-RateLimit-Reset: 1709567400

When rate limited:

HTTP/1.1 429 Too Many Requests
Retry-After: 30
Content-Type: application/json

{
  "error": "rate_limit_exceeded",
  "message": "Rate limit of 100 requests per minute exceeded",
  "retry_after": 30
}
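A small helper can attach these headers consistently across endpoints. The function name and return shape below are illustrative, not from any particular framework:

```python
import json
import time

def rate_limit_response(limit, remaining, reset_epoch):
    """Build (status, headers, body) for a rate-limited API response."""
    headers = {
        "X-RateLimit-Limit": str(limit),
        "X-RateLimit-Remaining": str(max(0, remaining)),
        "X-RateLimit-Reset": str(reset_epoch),
    }
    if remaining > 0:
        return 200, headers, None

    # Rate limited: tell the client exactly when to come back
    retry_after = max(1, int(reset_epoch - time.time()))
    headers["Retry-After"] = str(retry_after)
    body = json.dumps({
        "error": "rate_limit_exceeded",
        "message": f"Rate limit of {limit} requests per minute exceeded",
        "retry_after": retry_after,
    })
    return 429, headers, body
```

Centralizing this in one function keeps the 429 body and the `Retry-After` header derived from the same reset timestamp, so they can never disagree.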

Tiered Rate Limits

Different limits for different client tiers:

rate_limits:
  free:
    requests_per_minute: 60
    requests_per_day: 1000
  pro:
    requests_per_minute: 600
    requests_per_day: 50000
  enterprise:
    requests_per_minute: 6000
    requests_per_day: unlimited
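Resolving a client's limits then becomes a simple lookup. This sketch hardcodes the config above as a dict, treats `unlimited` as no cap, and falls back to the most restrictive tier for unknown clients (the function names are illustrative):

```python
RATE_LIMITS = {
    "free":       {"requests_per_minute": 60,   "requests_per_day": 1000},
    "pro":        {"requests_per_minute": 600,  "requests_per_day": 50000},
    "enterprise": {"requests_per_minute": 6000, "requests_per_day": None},  # None = unlimited
}

def limits_for(tier):
    # Unknown tiers fall back to the most restrictive plan
    return RATE_LIMITS.get(tier, RATE_LIMITS["free"])

def within_daily_limit(tier, used_today):
    cap = limits_for(tier)["requests_per_day"]
    return cap is None or used_today < cap

print(limits_for("pro")["requests_per_minute"])      # 600
print(within_daily_limit("enterprise", 10_000_000))  # True
print(within_daily_limit("unknown", 1000))           # False, falls back to free tier
```

Failing closed for unknown tiers is the safer default: a misconfigured client gets throttled rather than unlimited access.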

Anti-Patterns

Anti-Pattern                      | Consequence                                | Fix
No rate limiting                  | Single client can DoS your API             | Implement rate limiting on day one
Same limit for all endpoints      | Expensive endpoints unprotected            | Per-endpoint limits based on cost
Hard reject without Retry-After   | Clients retry immediately, making it worse | Always include Retry-After header
Rate limit by IP only             | Shared IPs (NAT) punish legitimate users   | Rate limit by API key + IP
No monitoring of rate limit hits  | Cannot distinguish abuse from growth       | Dashboard and alerts on 429 rates

Rate limiting is both a security measure and a product feature. Done well, it protects your infrastructure while giving paying customers the throughput they need.

Jakub Dimitri Rezayev
Founder & Chief Architect • Garnet Grid Consulting

Jakub holds an M.S. in Customer Intelligence & Analytics and a B.S. in Finance & Computer Science from Pace University. With deep expertise spanning D365 F&O, Azure, Power BI, and AI/ML systems, he architects enterprise solutions that bridge legacy systems and modern technology — and has led multi-million dollar ERP implementations for Fortune 500 supply chains.
