API Gateway Networking: Traffic Management at the Edge
A deep dive into API gateway networking — rate limiting, circuit breaking, request routing, TLS termination, and edge security patterns for production APIs.
API gateways sit at the boundary between external traffic and internal services. They’re the single point of control for authentication, rate limiting, request routing, and protocol translation. Getting gateway networking right is critical — it’s the front door to your entire platform.
Gateway Architecture
```
Internet → CDN → WAF → API Gateway → Service Mesh → Backend Services
                            ↓
                    ┌─────────────┐
                    │ Rate Limit  │
                    │ Auth/AuthZ  │
                    │ Routing     │
                    │ Transform   │
                    │ Logging     │
                    └─────────────┘
```
Gateway Deployment Models
Edge Gateway
A single gateway at the network edge handles all external traffic.
- Best for: Simple architectures, small teams
- Risk: Single point of failure
Tiered Gateways
An edge gateway handles external concerns (TLS, WAF); an internal gateway handles service routing.
- Best for: Medium-to-large organizations
- Benefit: Separation of concerns
Gateway Per Domain
Each business domain gets its own gateway (payments gateway, user gateway, etc.).
- Best for: Large organizations with independent teams
- Benefit: Independent scaling and deployment
Traffic Management
Rate Limiting
Protect backend services from being overwhelmed:
Fixed Window:
- Rule: 100 requests per minute per API key
- The counter resets at the top of each minute
- Problem: bursts at the window boundary (200 requests in 2 seconds straddling the reset)
Sliding Window:
- Rule: 100 requests per minute (rolling)
- Count = (previous_window_count × overlap_percentage) + current_window_count
- Smoother than fixed window, slightly more complex
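The sliding-window estimate above can be sketched in a few lines of Python. This is an illustrative approximation (the function name and the explicit `now` parameter are ours, not any particular gateway's API):

```python
import time

def sliding_window_allowed(prev_count, curr_count, window=60, limit=100, now=None):
    """Approximate a rolling count by weighting the previous fixed window
    by how much of it still overlaps the rolling window."""
    now = time.time() if now is None else now
    elapsed = now % window                 # seconds into the current window
    overlap = (window - elapsed) / window  # fraction of previous window still counted
    estimated = prev_count * overlap + curr_count
    return estimated < limit
```

At the very start of a window the previous window counts fully; halfway through, only half of it does, which is what smooths the boundary burst.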
Token Bucket:
- Bucket capacity: 100 tokens
- Refill rate: 100 tokens/minute
- Each request consumes 1 token
- Allows controlled bursts up to bucket capacity
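A token bucket is small enough to show in full. This is a minimal single-threaded sketch (class name ours; production limiters add locking and usually live in Redis or the gateway itself):

```python
import time

class TokenBucket:
    """Token-bucket limiter: up to `capacity` tokens, refilled at `rate`/second."""

    def __init__(self, capacity=100, rate=100 / 60):
        self.capacity = capacity
        self.rate = rate
        self.tokens = float(capacity)   # start full: allows an initial burst
        self.last = time.monotonic()

    def allow(self):
        now = time.monotonic()
        # Refill proportionally to elapsed time, capped at capacity.
        self.tokens = min(self.capacity, self.tokens + (now - self.last) * self.rate)
        self.last = now
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False
```

The full bucket at startup is what permits controlled bursts; the refill rate enforces the long-run average.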
Leaky Bucket:
- Queue capacity: 100 requests
- Processing rate: constant (e.g., 2 req/sec)
- Excess requests are dropped
- Smoothest output rate, strictest on bursts
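The leaky bucket can be sketched as a bounded queue drained at a constant rate. This illustration (names and the explicit `now` parameter are ours) tracks the scheduled finish time of each queued request:

```python
import time
from collections import deque

class LeakyBucket:
    """Queue up to `capacity` requests; drain them at a constant `rate` req/sec."""

    def __init__(self, capacity=100, rate=2.0):
        self.capacity = capacity
        self.rate = rate
        self.queue = deque()   # scheduled finish times of queued requests

    def allow(self, now=None):
        now = time.monotonic() if now is None else now
        # Evict requests that have already been processed.
        while self.queue and self.queue[0] <= now:
            self.queue.popleft()
        if len(self.queue) >= self.capacity:
            return False       # queue full: drop the request
        # Schedule this request one drain interval after the last one.
        start = max(now, self.queue[-1]) if self.queue else now
        self.queue.append(start + 1.0 / self.rate)
        return True
```

Output never exceeds `rate`, which is why the leaky bucket is the smoothest of the four and the strictest on bursts.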
Circuit Breaking
Prevent cascading failures when a backend service is unhealthy:
```
States: CLOSED → OPEN → HALF-OPEN → CLOSED
                  ↑        ↓
                  └────────┘

CLOSED:    Normal operation, requests flow through
           → If error_rate > 50% over 10 consecutive requests → OPEN

OPEN:      All requests immediately fail (503)
           → After a 30-second timeout → HALF-OPEN

HALF-OPEN: Allow 1 request through as a probe
           → If the probe succeeds → CLOSED
           → If the probe fails → OPEN (reset the timeout)
```
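The state machine above is compact enough to sketch directly. This is an illustrative implementation using the thresholds from the diagram; the class name and the injectable `now` parameter are ours, not any specific gateway's API:

```python
import time

class CircuitBreaker:
    """Minimal CLOSED → OPEN → HALF-OPEN breaker over a rolling window."""

    def __init__(self, window=10, error_threshold=0.5, open_timeout=30.0):
        self.window = window
        self.error_threshold = error_threshold
        self.open_timeout = open_timeout
        self.state = "CLOSED"
        self.results = []        # True/False outcomes of recent calls
        self.opened_at = 0.0

    def call(self, fn, now=None):
        now = time.monotonic() if now is None else now
        if self.state == "OPEN":
            if now - self.opened_at < self.open_timeout:
                raise RuntimeError("circuit open: failing fast (503)")
            self.state = "HALF-OPEN"   # timeout elapsed: allow one probe
        try:
            result = fn()
        except Exception:
            self._record(False, now)
            raise
        self._record(True, now)
        return result

    def _record(self, ok, now):
        if self.state == "HALF-OPEN":
            # The probe decides: success closes, failure re-opens.
            if ok:
                self.state, self.results = "CLOSED", []
            else:
                self.state, self.opened_at = "OPEN", now
            return
        self.results = (self.results + [ok])[-self.window:]
        failures = self.results.count(False)
        if len(self.results) == self.window and failures / self.window > self.error_threshold:
            self.state, self.opened_at = "OPEN", now
```

Real gateways layer per-route configuration, jittered timeouts, and metrics on top of this core loop.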
Request Routing
Path-Based:

```
/api/v1/users/*    → user-service
/api/v1/payments/* → payment-service
/api/v1/search/*   → search-service
```
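Path-based routing is typically a longest-prefix match, so more specific rules win. A minimal sketch (the route table and function name are ours for illustration):

```python
def route(path, routes):
    """Return the service whose prefix is the longest match for `path`."""
    best = None
    for prefix, service in routes:
        if path.startswith(prefix) and (best is None or len(prefix) > len(best[0])):
            best = (prefix, service)
    return best[1] if best else None

ROUTES = [
    ("/api/v1/users/", "user-service"),
    ("/api/v1/payments/", "payment-service"),
    ("/api/v1/search/", "search-service"),
]
```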
Header-Based:

```
X-API-Version: 2 → service-v2
X-API-Version: 1 → service-v1
```
Weight-Based (Canary):

```
90% traffic → service-v1 (stable)
10% traffic → service-v2 (canary)
```
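A common way to implement the canary split is to hash a stable client identifier into buckets, so each client consistently sees the same version during the rollout. An illustrative sketch (function and backend names are ours):

```python
import hashlib

def pick_backend(client_id, canary_percent=10):
    """Deterministically map a client to a 0-99 bucket; low buckets get the canary."""
    bucket = int(hashlib.sha256(client_id.encode()).hexdigest(), 16) % 100
    return "service-v2" if bucket < canary_percent else "service-v1"
```

Hashing rather than random selection keeps a client's experience stable across requests, which matters when v1 and v2 differ in behavior.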
Load Balancing Algorithms
| Algorithm | Description | Best For |
|---|---|---|
| Round Robin | Distribute equally in order | Homogeneous backends |
| Weighted Round Robin | Distribute based on weights | Mixed-capacity backends |
| Least Connections | Send to backend with fewest active connections | Long-lived connections |
| IP Hash | Consistent routing based on client IP | Session affinity |
| Random | Random selection | Simple, surprisingly effective |
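Weighted round robin is worth a closer look: a naive implementation sends all of a backend's share in a burst, so gateways such as NGINX use a "smooth" variant that interleaves picks. A minimal sketch (class name ours):

```python
class SmoothWeightedRoundRobin:
    """Smooth weighted round robin: spreads picks instead of bursting them."""

    def __init__(self, weights):
        self.weights = dict(weights)              # backend -> configured weight
        self.current = {b: 0 for b in weights}    # running effective weight

    def pick(self):
        total = sum(self.weights.values())
        for backend, weight in self.weights.items():
            self.current[backend] += weight
        best = max(self.current, key=self.current.get)
        self.current[best] -= total               # penalize the winner
        return best
```

With weights `{"a": 2, "b": 1}` this yields `a, b, a` per cycle rather than `a, a, b`, keeping load even over short intervals.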
Security at the Edge
TLS Termination
The gateway handles TLS handshakes, offloading encryption from backend services:
- Reduces CPU load on backends
- Centralizes certificate management
- Enables inspection of decrypted traffic for security rules
mTLS (Mutual TLS)
Both client and server present certificates:
- Used for service-to-service authentication
- Prevents unauthorized services from calling backends
- Common in zero-trust architectures
Request Validation
Validate requests before they reach backends:
- Schema validation (OpenAPI spec)
- Input sanitization (SQL injection, XSS)
- Payload size limits
- Content-type enforcement
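The last two checks above are cheap enough to sketch. This is an illustrative pre-filter only (names and status codes ours; real gateways also validate the body against an OpenAPI schema):

```python
def validate_request(headers, body, max_bytes=1_048_576,
                     allowed_types=("application/json",)):
    """Reject mistyped or oversized payloads before they reach a backend."""
    # Normalize e.g. "application/json; charset=utf-8" to its media type.
    ctype = headers.get("content-type", "").split(";")[0].strip().lower()
    if ctype not in allowed_types:
        return 415, "unsupported media type"
    if len(body) > max_bytes:
        return 413, "payload too large"
    return 200, "ok"
```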
Observability
Key Metrics
- Request rate — Requests per second by route, method, and status code
- Latency — P50, P95, P99 by route (gateway overhead should be < 5ms)
- Error rate — 4xx and 5xx by route and backend
- Connection pool — Active connections, queue depth, timeouts
- Rate limit hits — Which clients are being throttled
Distributed Tracing
The gateway should inject trace headers (W3C Trace Context or B3) into every request:
```
traceparent: 00-4bf92f3577b34da6a3ce929d0e0e4736-00f067aa0ba902b7-01
```
This enables end-to-end tracing from the client through the gateway to every backend service.
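Per the W3C Trace Context format, the header is `version-traceid-spanid-flags`. A gateway propagating it keeps the incoming trace ID but mints a fresh span ID for its own hop; a sketch (function name ours):

```python
import secrets

def make_traceparent(parent=None):
    """Build a W3C traceparent; reuse the incoming trace ID, mint a new span ID."""
    if parent:
        trace_id = parent.split("-")[1]        # keep the caller's trace ID
    else:
        trace_id = secrets.token_hex(16)       # 16 bytes -> 32 hex chars
    span_id = secrets.token_hex(8)             # 8 bytes  -> 16 hex chars
    return f"00-{trace_id}-{span_id}-01"       # version 00, sampled flag 01
```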
Gateway Comparison
| Gateway | Type | Best For |
|---|---|---|
| Kong | OSS/Enterprise | Plugin ecosystem, Lua/Go extensibility |
| AWS API Gateway | Managed | AWS-native, serverless backends |
| NGINX | OSS/Plus | High performance, traditional web serving |
| Envoy | OSS | Service mesh sidecar, gRPC-native |
| Traefik | OSS/Enterprise | Docker/Kubernetes-native, auto-discovery |
| Azure API Management | Managed | Azure-native, developer portal |
| Apigee | Managed | Enterprise API program management |
Anti-Patterns
Gateway as Business Logic Layer
The gateway should handle cross-cutting concerns (auth, rate limiting, routing), not business logic. Don’t put business data transformations or domain validation rules in the gateway; schema and payload checks are fine, order-total calculations are not.
Single Gateway for Everything
Don’t route both public API traffic and internal service-to-service traffic through the same gateway. Their requirements are fundamentally different.
No Gateway Health Checks
The gateway must actively health-check backend services. Routing traffic to unhealthy backends defeats the purpose.
Ignoring Gateway as SPOF
Run multiple gateway instances behind a load balancer. A single gateway instance is a single point of failure for your entire platform.
Over-Customization
Using too many custom plugins or filters makes upgrades painful. Prefer built-in features and standard patterns.
Production Checklist
- TLS termination with automated certificate rotation
- Rate limiting per API key/tenant with appropriate limits
- Circuit breakers configured for every backend route
- Health checks active for all backend services
- Distributed tracing headers injected
- Logging includes request ID correlation
- Multiple gateway instances with load balancing
- Graceful shutdown (drain connections before stopping)
- Metrics dashboards for latency, error rate, and throughput
- Alert rules for anomalous error rates and latency spikes