API Gateway Networking: Traffic Management at the Edge
A deep dive into API gateway networking — rate limiting, circuit breaking, request routing, TLS termination, and edge security patterns for production APIs.
API gateways sit at the boundary between external traffic and internal services. They’re the single point of control for authentication, rate limiting, request routing, and protocol translation. Getting gateway networking right is critical — it’s the front door to your entire platform.
Gateway Architecture
```
Internet → CDN → WAF → API Gateway → Service Mesh → Backend Services
                            ↓
                    ┌─────────────┐
                    │ Rate Limit  │
                    │ Auth/AuthZ  │
                    │ Routing     │
                    │ Transform   │
                    │ Logging     │
                    └─────────────┘
```
Gateway Deployment Models
Edge Gateway
A single gateway at the network edge handles all external traffic.
- Best for: Simple architectures, small teams
- Risk: Single point of failure
Tiered Gateways
An edge gateway handles external concerns (TLS, WAF); an internal gateway handles service routing.
- Best for: Medium-to-large organizations
- Benefit: Separation of concerns
Gateway Per Domain
Each business domain gets its own gateway (payments gateway, user gateway, etc.).
- Best for: Large organizations with independent teams
- Benefit: Independent scaling and deployment
Traffic Management
Rate Limiting
Protect backend services from being overwhelmed:
Fixed Window:
- Rule: 100 requests per minute per API key
- The counter resets at the top of each minute
- Problem: bursts at the window boundary (200 requests in 2 seconds straddling the reset)
Sliding Window:
- Rule: 100 requests per minute (rolling)
- Count = (previous_window_count × overlap_percentage) + current_window_count
- Smoother than fixed window, slightly more complex
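The sliding-window estimate above can be sketched in a few lines of Python. This is an illustrative approximation (the function name and the explicit `now` parameter are ours, not any particular gateway's API):

```python
import time

def sliding_window_allowed(prev_count, curr_count, window=60, limit=100, now=None):
    """Approximate a rolling count by weighting the previous fixed window
    by how much of it still overlaps the rolling window."""
    now = time.time() if now is None else now
    elapsed = now % window                 # seconds into the current window
    overlap = (window - elapsed) / window  # fraction of previous window still counted
    estimated = prev_count * overlap + curr_count
    return estimated < limit
```

At the very start of a window the previous window counts fully; halfway through, only half of it does, which is what smooths the boundary burst.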
Token Bucket:
- Bucket capacity: 100 tokens
- Refill rate: 100 tokens/minute
- Each request consumes 1 token
- Allows controlled bursts up to bucket capacity
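A token bucket is small enough to show in full. This is a minimal single-threaded sketch (class name ours; production limiters add locking and usually live in Redis or the gateway itself):

```python
import time

class TokenBucket:
    """Token-bucket limiter: up to `capacity` tokens, refilled at `rate`/second."""

    def __init__(self, capacity=100, rate=100 / 60):
        self.capacity = capacity
        self.rate = rate
        self.tokens = float(capacity)   # start full: allows an initial burst
        self.last = time.monotonic()

    def allow(self):
        now = time.monotonic()
        # Refill proportionally to elapsed time, capped at capacity.
        self.tokens = min(self.capacity, self.tokens + (now - self.last) * self.rate)
        self.last = now
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False
```

The full bucket at startup is what permits controlled bursts; the refill rate enforces the long-run average.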
Leaky Bucket:
- Queue capacity: 100 requests
- Processing rate: constant (e.g., 2 req/sec)
- Excess requests are dropped
- Smoothest output rate, strictest on bursts
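The leaky bucket can be sketched as a bounded queue drained at a constant rate. This illustration (names and the explicit `now` parameter are ours) tracks the scheduled finish time of each queued request:

```python
import time
from collections import deque

class LeakyBucket:
    """Queue up to `capacity` requests; drain them at a constant `rate` req/sec."""

    def __init__(self, capacity=100, rate=2.0):
        self.capacity = capacity
        self.rate = rate
        self.queue = deque()   # scheduled finish times of queued requests

    def allow(self, now=None):
        now = time.monotonic() if now is None else now
        # Evict requests that have already been processed.
        while self.queue and self.queue[0] <= now:
            self.queue.popleft()
        if len(self.queue) >= self.capacity:
            return False       # queue full: drop the request
        # Schedule this request one drain interval after the last one.
        start = max(now, self.queue[-1]) if self.queue else now
        self.queue.append(start + 1.0 / self.rate)
        return True
```

Output never exceeds `rate`, which is why the leaky bucket is the smoothest of the four and the strictest on bursts.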
Circuit Breaking
Prevent cascading failures when a backend service is unhealthy:
```
States: CLOSED → OPEN → HALF-OPEN → CLOSED
                  ↑        ↓
                  └────────┘

CLOSED:    Normal operation, requests flow through
           → If error_rate > 50% over 10 consecutive requests → OPEN

OPEN:      All requests immediately fail (503)
           → After a 30-second timeout → HALF-OPEN

HALF-OPEN: Allow 1 request through as a probe
           → If the probe succeeds → CLOSED
           → If the probe fails → OPEN (reset the timeout)
```
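The state machine above is compact enough to sketch directly. This is an illustrative implementation using the thresholds from the diagram; the class name and the injectable `now` parameter are ours, not any specific gateway's API:

```python
import time

class CircuitBreaker:
    """Minimal CLOSED → OPEN → HALF-OPEN breaker over a rolling window."""

    def __init__(self, window=10, error_threshold=0.5, open_timeout=30.0):
        self.window = window
        self.error_threshold = error_threshold
        self.open_timeout = open_timeout
        self.state = "CLOSED"
        self.results = []        # True/False outcomes of recent calls
        self.opened_at = 0.0

    def call(self, fn, now=None):
        now = time.monotonic() if now is None else now
        if self.state == "OPEN":
            if now - self.opened_at < self.open_timeout:
                raise RuntimeError("circuit open: failing fast (503)")
            self.state = "HALF-OPEN"   # timeout elapsed: allow one probe
        try:
            result = fn()
        except Exception:
            self._record(False, now)
            raise
        self._record(True, now)
        return result

    def _record(self, ok, now):
        if self.state == "HALF-OPEN":
            # The probe decides: success closes, failure re-opens.
            if ok:
                self.state, self.results = "CLOSED", []
            else:
                self.state, self.opened_at = "OPEN", now
            return
        self.results = (self.results + [ok])[-self.window:]
        failures = self.results.count(False)
        if len(self.results) == self.window and failures / self.window > self.error_threshold:
            self.state, self.opened_at = "OPEN", now
```

Real gateways layer per-route configuration, jittered timeouts, and metrics on top of this core loop.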
Request Routing
Path-Based:

```
/api/v1/users/*    → user-service
/api/v1/payments/* → payment-service
/api/v1/search/*   → search-service
```
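Path-based routing is typically a longest-prefix match, so more specific rules win. A minimal sketch (the route table and function name are ours for illustration):

```python
def route(path, routes):
    """Return the service whose prefix is the longest match for `path`."""
    best = None
    for prefix, service in routes:
        if path.startswith(prefix) and (best is None or len(prefix) > len(best[0])):
            best = (prefix, service)
    return best[1] if best else None

ROUTES = [
    ("/api/v1/users/", "user-service"),
    ("/api/v1/payments/", "payment-service"),
    ("/api/v1/search/", "search-service"),
]
```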
Header-Based:

```
X-API-Version: 2 → service-v2
X-API-Version: 1 → service-v1
```
Weight-Based (Canary):

```
90% traffic → service-v1 (stable)
10% traffic → service-v2 (canary)
```
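A common way to implement the canary split is to hash a stable client identifier into buckets, so each client consistently sees the same version during the rollout. An illustrative sketch (function and backend names are ours):

```python
import hashlib

def pick_backend(client_id, canary_percent=10):
    """Deterministically map a client to a 0-99 bucket; low buckets get the canary."""
    bucket = int(hashlib.sha256(client_id.encode()).hexdigest(), 16) % 100
    return "service-v2" if bucket < canary_percent else "service-v1"
```

Hashing rather than random selection keeps a client's experience stable across requests, which matters when v1 and v2 differ in behavior.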
Load Balancing Algorithms
| Algorithm | Description | Best For |
|---|---|---|
| Round Robin | Distribute equally in order | Homogeneous backends |
| Weighted Round Robin | Distribute based on weights | Mixed-capacity backends |
| Least Connections | Send to backend with fewest active connections | Long-lived connections |
| IP Hash | Consistent routing based on client IP | Session affinity |
| Random | Random selection | Simple, surprisingly effective |
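Weighted round robin is worth a closer look: a naive implementation sends all of a backend's share in a burst, so gateways such as NGINX use a "smooth" variant that interleaves picks. A minimal sketch (class name ours):

```python
class SmoothWeightedRoundRobin:
    """Smooth weighted round robin: spreads picks instead of bursting them."""

    def __init__(self, weights):
        self.weights = dict(weights)              # backend -> configured weight
        self.current = {b: 0 for b in weights}    # running effective weight

    def pick(self):
        total = sum(self.weights.values())
        for backend, weight in self.weights.items():
            self.current[backend] += weight
        best = max(self.current, key=self.current.get)
        self.current[best] -= total               # penalize the winner
        return best
```

With weights `{"a": 2, "b": 1}` this yields `a, b, a` per cycle rather than `a, a, b`, keeping load even over short intervals.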
Security at the Edge
TLS Termination
The gateway handles TLS handshakes, offloading encryption from backend services:
- Reduces CPU load on backends
- Centralizes certificate management
- Enables inspection of decrypted traffic for security rules
mTLS (Mutual TLS)
Both client and server present certificates:
- Used for service-to-service authentication
- Prevents unauthorized services from calling backends
- Common in zero-trust architectures
Request Validation
Validate requests before they reach backends:
- Schema validation (OpenAPI spec)
- Input sanitization (SQL injection, XSS)
- Payload size limits
- Content-type enforcement
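The last two checks above are cheap enough to sketch. This is an illustrative pre-filter only (names and status codes ours; real gateways also validate the body against an OpenAPI schema):

```python
def validate_request(headers, body, max_bytes=1_048_576,
                     allowed_types=("application/json",)):
    """Reject mistyped or oversized payloads before they reach a backend."""
    # Normalize e.g. "application/json; charset=utf-8" to its media type.
    ctype = headers.get("content-type", "").split(";")[0].strip().lower()
    if ctype not in allowed_types:
        return 415, "unsupported media type"
    if len(body) > max_bytes:
        return 413, "payload too large"
    return 200, "ok"
```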
Observability
Key Metrics
- Request rate — Requests per second by route, method, and status code
- Latency — P50, P95, P99 by route (gateway overhead should be < 5ms)
- Error rate — 4xx and 5xx by route and backend
- Connection pool — Active connections, queue depth, timeouts
- Rate limit hits — Which clients are being throttled
Distributed Tracing
The gateway should inject trace headers (W3C Trace Context or B3) into every request:
```
traceparent: 00-4bf92f3577b34da6a3ce929d0e0e4736-00f067aa0ba902b7-01
```
This enables end-to-end tracing from the client through the gateway to every backend service.
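Per the W3C Trace Context format, the header is `version-traceid-spanid-flags`. A gateway propagating it keeps the incoming trace ID but mints a fresh span ID for its own hop; a sketch (function name ours):

```python
import secrets

def make_traceparent(parent=None):
    """Build a W3C traceparent; reuse the incoming trace ID, mint a new span ID."""
    if parent:
        trace_id = parent.split("-")[1]        # keep the caller's trace ID
    else:
        trace_id = secrets.token_hex(16)       # 16 bytes -> 32 hex chars
    span_id = secrets.token_hex(8)             # 8 bytes  -> 16 hex chars
    return f"00-{trace_id}-{span_id}-01"       # version 00, sampled flag 01
```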
Gateway Comparison
| Gateway | Type | Best For |
|---|---|---|
| Kong | OSS/Enterprise | Plugin ecosystem, Lua/Go extensibility |
| AWS API Gateway | Managed | AWS-native, serverless backends |
| NGINX | OSS/Plus | High performance, traditional web serving |
| Envoy | OSS | Service mesh sidecar, gRPC-native |
| Traefik | OSS/Enterprise | Docker/Kubernetes-native, auto-discovery |
| Azure API Management | Managed | Azure-native, developer portal |
| Apigee | Managed | Enterprise API program management |
Anti-Patterns
Gateway as Business Logic Layer
The gateway should handle cross-cutting concerns (auth, rate limiting, routing), not business logic. Don’t put business data transformations or domain validation rules in the gateway; schema and payload checks are fine, order-total calculations are not.
Single Gateway for Everything
Don’t route both public API traffic and internal service-to-service traffic through the same gateway. Their requirements are fundamentally different.
No Gateway Health Checks
The gateway must actively health-check backend services. Routing traffic to unhealthy backends defeats the purpose.
Ignoring Gateway as SPOF
Run multiple gateway instances behind a load balancer. A single gateway instance is a single point of failure for your entire platform.
Over-Customization
Using too many custom plugins or filters makes upgrades painful. Prefer built-in features and standard patterns.
Production Checklist
- TLS termination with automated certificate rotation
- Rate limiting per API key/tenant with appropriate limits
- Circuit breakers configured for every backend route
- Health checks active for all backend services
- Distributed tracing headers injected
- Logging includes request ID correlation
- Multiple gateway instances with load balancing
- Graceful shutdown (drain connections before stopping)
- Metrics dashboards for latency, error rate, and throughput
- Alert rules for anomalous error rates and latency spikes