
API Rate Limiting Patterns

Protect backend services from abuse and overload with rate limiting. Covers token bucket, sliding window, distributed rate limiting, client-specific limits, and the patterns that keep your API available under bursty or abusive traffic.

Without rate limiting, a single misconfigured client, a viral blog post, or a malicious actor can overwhelm your API and bring down the service for everyone. Rate limiting is the circuit breaker between your API and the world — it ensures fair access, prevents abuse, and protects backend resources.


Algorithm Comparison

Fixed Window:
  How: Count requests per fixed time window (e.g., per minute)
  Pros: Simple, low memory
  Cons: Burst at window boundary (double the limit)
  
  Window: |──── minute 1 ────|──── minute 2 ────|
  Limit:  100 requests/min
  Problem: 100 requests at 0:59 + 100 at 1:00 = 200 in 2 seconds
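A fixed-window counter can be sketched in a few lines (in-memory and single-process only; the class and parameter names are illustrative):

```python
import time
from collections import defaultdict

class FixedWindowLimiter:
    """Fixed-window counter (in-memory, single-process sketch)."""

    def __init__(self, limit: int, window_seconds: int):
        self.limit = limit
        self.window_seconds = window_seconds
        self.counts = defaultdict(int)  # window index -> request count

    def is_allowed(self, now=None) -> bool:
        now = time.time() if now is None else now
        window = int(now // self.window_seconds)  # which fixed window this request falls in
        if self.counts[window] >= self.limit:
            return False
        self.counts[window] += 1
        return True
```

Note how the boundary problem falls out of the bucketing: requests at the very end of one window and the very start of the next land in different buckets, so up to double the limit can pass in a short span.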

Sliding Window Log:
  How: Track timestamp of each request, count within window
  Pros: Accurate, no boundary burst
  Cons: High memory (store every timestamp)
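An in-memory sliding window log is essentially a queue of timestamps (single-process sketch; names are illustrative — the distributed Redis version appears below under Implementation):

```python
import time
from collections import deque

class SlidingWindowLog:
    """Sliding-window log: stores one timestamp per request (in-memory sketch)."""

    def __init__(self, limit: int, window_seconds: float):
        self.limit = limit
        self.window = window_seconds
        self.log = deque()  # request timestamps, oldest first

    def is_allowed(self, now=None) -> bool:
        now = time.time() if now is None else now
        # Evict timestamps that have aged out of the window
        while self.log and self.log[0] <= now - self.window:
            self.log.popleft()
        if len(self.log) >= self.limit:
            return False
        self.log.append(now)
        return True
```

The memory cost is visible here: the deque holds one entry per request in the window, which is why the counter variant below is usually preferred at scale.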
  
Sliding Window Counter:
  How: Weighted average of previous and current window
  Pros: Accurate, low memory
  Cons: Slightly approximate
  
  Requests in window = (prev_count × overlap%) + current_count
  If prev_window had 80 requests, 30% overlap, current has 20:
  Effective count = (80 × 0.3) + 20 = 44
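The weighted-average formula as a small helper (the function name is an assumption):

```python
def sliding_window_count(prev_count, current_count, elapsed_fraction):
    """Approximate requests in the sliding window.

    elapsed_fraction: how far into the current window we are (0.0-1.0);
    the previous window still overlaps the sliding window by
    (1 - elapsed_fraction).
    """
    overlap = 1.0 - elapsed_fraction
    return prev_count * overlap + current_count

# The worked example above: 70% into the current window, so the previous
# window overlaps by 30%: (80 * 0.3) + 20 = 44
```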

Token Bucket:
  How: Bucket fills with tokens at constant rate, each request takes a token
  Pros: Allows bursts up to bucket size, smooth average rate
  Cons: Slightly more complex
  
  Bucket size: 10 (allows burst of 10)
  Refill rate: 1 token/second (10 req/10 sec average)
  
  t=0: 10 tokens → send 5 requests → 5 tokens remaining
  t=5: 10 tokens → bucket refilled
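A token bucket matching these numbers (capacity 10, refill 1 token/second) might look like this in-memory sketch; the `now` parameter is there for testability and is an assumption of this example:

```python
import time

class TokenBucket:
    """Token bucket sketch: capacity 10, refilling 1 token/second."""

    def __init__(self, capacity=10.0, refill_rate=1.0, now=None):
        self.capacity = capacity
        self.refill_rate = refill_rate
        self.tokens = capacity  # start full: a burst of `capacity` is allowed
        self.last_refill = time.monotonic() if now is None else now

    def allow(self, now=None) -> bool:
        now = time.monotonic() if now is None else now
        # Refill proportionally to elapsed time, capped at capacity
        elapsed = now - self.last_refill
        self.tokens = min(self.capacity, self.tokens + elapsed * self.refill_rate)
        self.last_refill = now
        if self.tokens >= 1.0:
            self.tokens -= 1.0
            return True
        return False
```

Because the bucket starts full, a fresh client can burst up to the capacity immediately, then settles to the refill rate on average.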

Implementation

import time
import uuid
import redis

class SlidingWindowRateLimiter:
    """Distributed rate limiter using Redis sorted sets."""
    
    def __init__(self, redis_client: redis.Redis):
        self.redis = redis_client
    
    def is_allowed(self, key: str, limit: int, window_seconds: int) -> dict:
        """Check if a request is allowed under the rate limit."""
        now = time.time()
        window_start = now - window_seconds
        # Unique member per request; a bare timestamp (or id(key)) could
        # collide when concurrent requests arrive for the same key
        member = f"{now}:{uuid.uuid4().hex}"
        
        pipe = self.redis.pipeline()
        
        # Remove entries older than the window
        pipe.zremrangebyscore(key, 0, window_start)
        
        # Count requests already in the current window
        pipe.zcard(key)
        
        # Add current request (optimistically)
        pipe.zadd(key, {member: now})
        
        # Set TTL so idle keys are cleaned up
        pipe.expire(key, window_seconds)
        
        results = pipe.execute()
        request_count = results[1]
        
        allowed = request_count < limit
        
        if not allowed:
            # Roll back the optimistic add
            self.redis.zrem(key, member)
        
        return {
            "allowed": allowed,
            "remaining": max(0, limit - request_count - 1),
            "reset": int(now + window_seconds),
            "limit": limit,
        }

# Usage:
limiter = SlidingWindowRateLimiter(redis.Redis())

# Rate limit per API key: 100 requests per minute
result = limiter.is_allowed(
    key=f"ratelimit:{api_key}",
    limit=100,
    window_seconds=60,
)

if not result["allowed"]:
    # Return 429 Too Many Requests
    headers = {
        "X-RateLimit-Limit": str(result["limit"]),
        "X-RateLimit-Remaining": str(result["remaining"]),
        "X-RateLimit-Reset": str(result["reset"]),
        "Retry-After": str(result["reset"] - int(time.time())),
    }

Response Headers

Rate limit headers on a 429 response (RFC 6585 defines the 429 status code; the X-RateLimit-* headers are a widely used de facto convention, with standardization underway in draft-ietf-httpapi-ratelimit-headers):

HTTP/1.1 429 Too Many Requests
Content-Type: application/json
X-RateLimit-Limit: 100
X-RateLimit-Remaining: 0
X-RateLimit-Reset: 1710432000
Retry-After: 42

{
  "error": "rate_limit_exceeded",
  "message": "Rate limit of 100 requests per minute exceeded",
  "retry_after": 42
}

Always include:
  X-RateLimit-Limit:     Total allowed requests per window
  X-RateLimit-Remaining: Requests remaining in current window
  X-RateLimit-Reset:     Unix timestamp when window resets
  Retry-After:           Seconds until client can retry
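On the client side, honoring these headers might look like the following sketch, where `send` is a hypothetical callable returning `(status, headers)`:

```python
import random
import time

def request_with_backoff(send, max_attempts=5):
    """Call send() -> (status, headers); retry on 429, honoring
    Retry-After and falling back to exponential backoff."""
    for attempt in range(max_attempts):
        status, headers = send()
        if status != 429:
            return status, headers
        wait = int(headers.get("Retry-After", 2 ** attempt))
        time.sleep(wait + random.uniform(0, 0.1))  # jitter avoids synchronized retries
    return status, headers
```

The jitter matters: without it, many clients rate-limited at the same moment retry at the same moment, recreating the spike.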

Anti-Patterns

Anti-Pattern                   | Consequence                                              | Fix
No rate limiting at all        | Single client can DoS your API                           | Rate limit all endpoints
Same limit for all clients     | Enterprise customers hit limits; abusers get same quota  | Tiered limits by plan/API key
Rate limit only at API gateway | Internal services can still overload each other          | Rate limit at service level too
No response headers            | Clients cannot self-regulate                             | Always return rate limit headers
Hard reject without retry info | Clients hammer the API with retries                      | Return Retry-After header with backoff
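Tiered limits can start as a simple lookup table feeding the limiter's `limit` and `window_seconds` arguments (the plan names and quotas here are purely illustrative):

```python
# Hypothetical plan tiers; real quotas depend on your product
PLAN_LIMITS = {
    "free":       (100, 60),     # 100 requests per 60 seconds
    "pro":        (1_000, 60),
    "enterprise": (10_000, 60),
}

def limit_for(plan):
    """Return (limit, window_seconds) for a plan, defaulting to the free tier."""
    return PLAN_LIMITS.get(plan, PLAN_LIMITS["free"])
```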

Rate limiting is not about saying no — it is about ensuring fair access. Good rate limiting protects your service while giving clients clear feedback on their usage and when they can retry.

Jakub Dimitri Rezayev
Founder & Chief Architect • Garnet Grid Consulting

Jakub holds an M.S. in Customer Intelligence & Analytics and a B.S. in Finance & Computer Science from Pace University. With deep expertise spanning D365 F&O, Azure, Power BI, and AI/ML systems, he architects enterprise solutions that bridge legacy systems and modern technology — and has led multi-million dollar ERP implementations for Fortune 500 supply chains.
