
API Rate Limiting Patterns

Protect backend services from abuse and overload with rate limiting. Covers token bucket, sliding window, distributed rate limiting, client-specific limits, and the patterns that keep your API available under bursty or abusive traffic.

Without rate limiting, a single misconfigured client, a viral blog post, or a malicious actor can overwhelm your API and bring down the service for everyone. Rate limiting is the circuit breaker between your API and the world — it ensures fair access, prevents abuse, and protects backend resources.


Algorithm Comparison

Fixed Window:
  How: Count requests per fixed time window (e.g., per minute)
  Pros: Simple, low memory
  Cons: Burst at window boundary (double the limit)
  
  Window: |──── minute 1 ────|──── minute 2 ────|
  Limit:  100 requests/min
  Problem: 100 requests at 0:59 + 100 at 1:00 = 200 in 2 seconds
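A fixed-window counter can be sketched in a few lines (in-memory and single-process only; the class and parameter names are illustrative):

```python
import time
from collections import defaultdict

class FixedWindowLimiter:
    """Fixed-window counter (in-memory, single-process sketch)."""

    def __init__(self, limit: int, window_seconds: int):
        self.limit = limit
        self.window_seconds = window_seconds
        self.counts = defaultdict(int)  # window index -> request count

    def is_allowed(self, now=None) -> bool:
        now = time.time() if now is None else now
        window = int(now // self.window_seconds)  # which fixed window this request falls in
        if self.counts[window] >= self.limit:
            return False
        self.counts[window] += 1
        return True
```

Note how the boundary problem falls out of the bucketing: requests at the very end of one window and the very start of the next land in different buckets, so up to double the limit can pass in a short span.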

Sliding Window Log:
  How: Track timestamp of each request, count within window
  Pros: Accurate, no boundary burst
  Cons: High memory (store every timestamp)
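An in-memory sliding window log is essentially a queue of timestamps (single-process sketch; names are illustrative — the distributed Redis version appears below under Implementation):

```python
import time
from collections import deque

class SlidingWindowLog:
    """Sliding-window log: stores one timestamp per request (in-memory sketch)."""

    def __init__(self, limit: int, window_seconds: float):
        self.limit = limit
        self.window = window_seconds
        self.log = deque()  # request timestamps, oldest first

    def is_allowed(self, now=None) -> bool:
        now = time.time() if now is None else now
        # Evict timestamps that have aged out of the window
        while self.log and self.log[0] <= now - self.window:
            self.log.popleft()
        if len(self.log) >= self.limit:
            return False
        self.log.append(now)
        return True
```

The memory cost is visible here: the deque holds one entry per request in the window, which is why the counter variant below is usually preferred at scale.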
  
Sliding Window Counter:
  How: Weighted average of previous and current window
  Pros: Accurate, low memory
  Cons: Slightly approximate
  
  Requests in window = (prev_count × overlap%) + current_count
  If prev_window had 80 requests, 30% overlap, current has 20:
  Effective count = (80 × 0.3) + 20 = 44
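The weighted-average formula as a small helper (the function name is an assumption):

```python
def sliding_window_count(prev_count, current_count, elapsed_fraction):
    """Approximate requests in the sliding window.

    elapsed_fraction: how far into the current window we are (0.0-1.0);
    the previous window still overlaps the sliding window by
    (1 - elapsed_fraction).
    """
    overlap = 1.0 - elapsed_fraction
    return prev_count * overlap + current_count

# The worked example above: 70% into the current window, so the previous
# window overlaps by 30%: (80 * 0.3) + 20 = 44
```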

Token Bucket:
  How: Bucket fills with tokens at constant rate, each request takes a token
  Pros: Allows bursts up to bucket size, smooth average rate
  Cons: Slightly more complex
  
  Bucket size: 10 (allows burst of 10)
  Refill rate: 1 token/second (10 req/10 sec average)
  
  t=0: 10 tokens → send 5 requests → 5 tokens remaining
  t=5: 10 tokens → bucket refilled
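A token bucket matching these numbers (capacity 10, refill 1 token/second) might look like this in-memory sketch; the `now` parameter is there for testability and is an assumption of this example:

```python
import time

class TokenBucket:
    """Token bucket sketch: capacity 10, refilling 1 token/second."""

    def __init__(self, capacity=10.0, refill_rate=1.0, now=None):
        self.capacity = capacity
        self.refill_rate = refill_rate
        self.tokens = capacity  # start full: a burst of `capacity` is allowed
        self.last_refill = time.monotonic() if now is None else now

    def allow(self, now=None) -> bool:
        now = time.monotonic() if now is None else now
        # Refill proportionally to elapsed time, capped at capacity
        elapsed = now - self.last_refill
        self.tokens = min(self.capacity, self.tokens + elapsed * self.refill_rate)
        self.last_refill = now
        if self.tokens >= 1.0:
            self.tokens -= 1.0
            return True
        return False
```

Because the bucket starts full, a fresh client can burst up to the capacity immediately, then settles to the refill rate on average.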

Implementation

import time
import uuid
import redis

class SlidingWindowRateLimiter:
    """Distributed rate limiter using Redis sorted sets."""
    
    def __init__(self, redis_client: redis.Redis):
        self.redis = redis_client
    
    def is_allowed(self, key: str, limit: int, window_seconds: int) -> dict:
        """Check if a request is allowed under the rate limit."""
        now = time.time()
        window_start = now - window_seconds
        # Unique member per request; a bare timestamp (or id(key)) could
        # collide when concurrent requests arrive for the same key
        member = f"{now}:{uuid.uuid4().hex}"
        
        pipe = self.redis.pipeline()
        
        # Remove entries older than the window
        pipe.zremrangebyscore(key, 0, window_start)
        
        # Count requests already in the current window
        pipe.zcard(key)
        
        # Add current request (optimistically)
        pipe.zadd(key, {member: now})
        
        # Set TTL so idle keys are cleaned up
        pipe.expire(key, window_seconds)
        
        results = pipe.execute()
        request_count = results[1]
        
        allowed = request_count < limit
        
        if not allowed:
            # Roll back the optimistic add
            self.redis.zrem(key, member)
        
        return {
            "allowed": allowed,
            "remaining": max(0, limit - request_count - 1),
            "reset": int(now + window_seconds),
            "limit": limit,
        }

# Usage:
limiter = SlidingWindowRateLimiter(redis.Redis())

# Rate limit per API key: 100 requests per minute
result = limiter.is_allowed(
    key=f"ratelimit:{api_key}",
    limit=100,
    window_seconds=60,
)

if not result["allowed"]:
    # Return 429 Too Many Requests
    headers = {
        "X-RateLimit-Limit": str(result["limit"]),
        "X-RateLimit-Remaining": str(result["remaining"]),
        "X-RateLimit-Reset": str(result["reset"]),
        "Retry-After": str(result["reset"] - int(time.time())),
    }

Response Headers

Rate limit headers on a 429 response (RFC 6585 defines the 429 status code; the X-RateLimit-* headers are a widely used de facto convention, with standardization underway in draft-ietf-httpapi-ratelimit-headers):

HTTP/1.1 429 Too Many Requests
Content-Type: application/json
X-RateLimit-Limit: 100
X-RateLimit-Remaining: 0
X-RateLimit-Reset: 1710432000
Retry-After: 42

{
  "error": "rate_limit_exceeded",
  "message": "Rate limit of 100 requests per minute exceeded",
  "retry_after": 42
}

Always include:
  X-RateLimit-Limit:     Total allowed requests per window
  X-RateLimit-Remaining: Requests remaining in current window
  X-RateLimit-Reset:     Unix timestamp when window resets
  Retry-After:           Seconds until client can retry
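On the client side, honoring these headers might look like the following sketch, where `send` is a hypothetical callable returning `(status, headers)`:

```python
import random
import time

def request_with_backoff(send, max_attempts=5):
    """Call send() -> (status, headers); retry on 429, honoring
    Retry-After and falling back to exponential backoff."""
    for attempt in range(max_attempts):
        status, headers = send()
        if status != 429:
            return status, headers
        wait = int(headers.get("Retry-After", 2 ** attempt))
        time.sleep(wait + random.uniform(0, 0.1))  # jitter avoids synchronized retries
    return status, headers
```

The jitter matters: without it, many clients rate-limited at the same moment retry at the same moment, recreating the spike.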

Anti-Patterns

Anti-Pattern                   | Consequence                                              | Fix
No rate limiting at all        | Single client can DoS your API                           | Rate limit all endpoints
Same limit for all clients     | Enterprise customers hit limits; abusers get same quota  | Tiered limits by plan/API key
Rate limit only at API gateway | Internal services can still overload each other          | Rate limit at service level too
No response headers            | Clients cannot self-regulate                             | Always return rate limit headers
Hard reject without retry info | Clients hammer the API with retries                      | Return Retry-After header with backoff
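Tiered limits can start as a simple lookup table feeding the limiter's `limit` and `window_seconds` arguments (the plan names and quotas here are purely illustrative):

```python
# Hypothetical plan tiers; real quotas depend on your product
PLAN_LIMITS = {
    "free":       (100, 60),     # 100 requests per 60 seconds
    "pro":        (1_000, 60),
    "enterprise": (10_000, 60),
}

def limit_for(plan):
    """Return (limit, window_seconds) for a plan, defaulting to the free tier."""
    return PLAN_LIMITS.get(plan, PLAN_LIMITS["free"])
```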

Rate limiting is not about saying no — it is about ensuring fair access. Good rate limiting protects your service while giving clients clear feedback on their usage and when they can retry.

Jakub Dimitri Rezayev
Founder & Chief Architect • Garnet Grid Consulting

Jakub holds an M.S. in Customer Intelligence & Analytics and a B.S. in Finance & Computer Science from Pace University. With deep expertise spanning D365 F&O, Azure, Power BI, and AI/ML systems, he architects enterprise solutions that bridge legacy systems and modern technology — and has led multi-million dollar ERP implementations for Fortune 500 supply chains.
