
API Gateway Networking: Traffic Management at the Edge

A deep dive into API gateway networking — rate limiting, circuit breaking, request routing, TLS termination, and edge security patterns for production APIs.

API gateways sit at the boundary between external traffic and internal services. They’re the single point of control for authentication, rate limiting, request routing, and protocol translation. Getting gateway networking right is critical — it’s the front door to your entire platform.

Gateway Architecture

Internet → CDN → WAF → API Gateway → Service Mesh → Backend Services

                    ┌─────────────┐
                    │ Rate Limit  │
                    │ Auth/AuthZ  │
                    │ Routing     │
                    │ Transform   │
                    │ Logging     │
                    └─────────────┘

Gateway Deployment Models

Edge Gateway

A single gateway at the network edge handles all external traffic.

  • Best for: Simple architectures, small teams
  • Risk: Single point of failure

Tiered Gateways

An edge gateway handles external concerns (TLS termination, WAF); an internal gateway handles service routing.

  • Best for: Medium-to-large organizations
  • Benefit: Separation of concerns

Gateway Per Domain

Each business domain gets its own gateway (payments gateway, user gateway, etc.).

  • Best for: Large organizations with independent teams
  • Benefit: Independent scaling and deployment

Traffic Management

Rate Limiting

Protect backend services from being overwhelmed:

Fixed Window:

Rule: 100 requests per minute per API key
Counter resets at the top of each minute
Problem: Burst at window boundary (200 requests in 2 seconds straddling the reset)

Sliding Window:

Rule: 100 requests per minute (rolling)
Count = (previous_window_count × overlap_percentage) + current_window_count
Smoother than fixed window, slightly more complex
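The overlap formula above can be sketched as a small helper. This is a minimal illustration (names are mine, not from any particular gateway): the previous window's count is weighted by how much of it still overlaps the rolling window.

```python
def sliding_window_count(prev_count, curr_count, elapsed_in_window, window_secs):
    """Estimate the rolling-window request count.

    prev_count: requests in the previous fixed window
    curr_count: requests so far in the current fixed window
    elapsed_in_window: seconds elapsed in the current window
    """
    # Fraction of the previous window still inside the rolling window.
    overlap = max(0.0, (window_secs - elapsed_in_window) / window_secs)
    return prev_count * overlap + curr_count

# 1-minute window, 30s into the current one: 80 prior + 40 current
# requests estimate to 80 * 0.5 + 40 = 80 — under a 100/min limit.
```

This estimate assumes requests were spread evenly across the previous window, which is the standard trade-off of the sliding-window-counter approach: far cheaper than storing per-request timestamps, slightly less exact.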

Token Bucket:

Bucket capacity: 100 tokens
Refill rate: 100 tokens/minute
Each request consumes 1 token
Allows controlled bursts up to bucket capacity
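A token bucket is simple enough to sketch in a few lines. This is an illustrative, single-process version (class and parameter names are mine); production gateways typically back the counter with a shared store such as Redis.

```python
import time

class TokenBucket:
    def __init__(self, capacity, refill_per_sec):
        self.capacity = capacity
        self.tokens = float(capacity)       # start full: bursts allowed immediately
        self.refill_per_sec = refill_per_sec
        self.last = time.monotonic()

    def allow(self):
        now = time.monotonic()
        # Refill in proportion to elapsed time, capped at capacity.
        self.tokens = min(self.capacity,
                          self.tokens + (now - self.last) * self.refill_per_sec)
        self.last = now
        if self.tokens >= 1:
            self.tokens -= 1                # each request consumes one token
            return True
        return False

# The rule above: 100-token bucket refilled at 100 tokens/minute.
bucket = TokenBucket(capacity=100, refill_per_sec=100 / 60)
```

Note that refill is computed lazily on each request rather than by a background timer, which keeps the implementation lock-free for a single process.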

Leaky Bucket:

Queue capacity: 100 requests
Processing rate: constant (e.g., 2 req/sec)
Excess requests are dropped
Smoothest output rate, strictest on bursts
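The leaky bucket's queue semantics can be sketched as follows (a toy illustration; names are mine). Requests are admitted until the queue is full, and a separate drain step, driven at the constant processing rate, releases them one at a time.

```python
from collections import deque

class LeakyBucket:
    """Queue-based leaky bucket: admit up to `capacity` queued requests,
    drain at a constant rate via drain() (e.g. called from a timer)."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.queue = deque()

    def offer(self, request):
        if len(self.queue) >= self.capacity:
            return False                    # bucket full: drop the request
        self.queue.append(request)
        return True

    def drain(self):
        # Invoked at the fixed processing rate (e.g. 2 req/sec).
        return self.queue.popleft() if self.queue else None
```

The contrast with the token bucket is visible in the structure: a token bucket lets bursts through immediately, while the leaky bucket's output can never exceed the drain rate.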

Circuit Breaking

Prevent cascading failures when a backend service is unhealthy:

States: CLOSED → OPEN → HALF-OPEN → CLOSED
         ↑                    ↓
         └────────────────────┘

CLOSED: Normal operation, requests flow through
  → If error_rate > 50% for 10 consecutive requests → OPEN

OPEN: All requests immediately fail (503)
  → After 30 second timeout → HALF-OPEN

HALF-OPEN: Allow 1 request through as a probe
  → If probe succeeds → CLOSED
  → If probe fails → OPEN (reset timeout)
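The state machine above maps almost directly to code. A minimal sketch, using the same thresholds as the text (10-request window, 50% error rate, 30-second reset timeout); class and method names are illustrative, not from any particular library.

```python
import time

class CircuitBreaker:
    def __init__(self, window=10, error_threshold=0.5, reset_timeout=30.0):
        self.window = window
        self.error_threshold = error_threshold
        self.reset_timeout = reset_timeout
        self.state = "CLOSED"
        self.results = []                   # recent outcomes (True = success)
        self.opened_at = 0.0

    def allow_request(self):
        if self.state == "OPEN":
            if time.monotonic() - self.opened_at >= self.reset_timeout:
                self.state = "HALF-OPEN"    # let exactly one probe through
                return True
            return False                    # fail fast: gateway returns 503
        return True

    def record(self, success):
        if self.state == "HALF-OPEN":
            if success:
                self.state, self.results = "CLOSED", []
            else:
                self._trip()                # probe failed: reopen, reset timeout
            return
        self.results = (self.results + [success])[-self.window:]
        if (len(self.results) == self.window
                and self.results.count(False) / self.window > self.error_threshold):
            self._trip()

    def _trip(self):
        self.state = "OPEN"
        self.opened_at = time.monotonic()
```

Calling `allow_request()` before each backend call and `record()` after it drives the CLOSED → OPEN → HALF-OPEN cycle described above.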

Request Routing

Path-Based:

/api/v1/users/* → user-service
/api/v1/payments/* → payment-service
/api/v1/search/* → search-service

Header-Based:

X-API-Version: 2 → service-v2
X-API-Version: 1 → service-v1

Weight-Based (Canary):

90% traffic → service-v1 (stable)
10% traffic → service-v2 (canary)
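Weight-based selection is a short piece of code. A sketch of the 90/10 split above, with a hypothetical routing table (the `ROUTES` structure and names are mine):

```python
import random

# Hypothetical canary routing table mirroring the 90/10 split above.
ROUTES = [("service-v1", 90), ("service-v2", 10)]

def pick_backend(routes=ROUTES):
    """Choose a backend with probability proportional to its weight."""
    total = sum(weight for _, weight in routes)
    roll = random.uniform(0, total)
    for backend, weight in routes:
        roll -= weight
        if roll <= 0:
            return backend
    return routes[-1][0]                    # guard against float edge cases
```

Real gateways often make the choice sticky per client (by hashing a user ID or session cookie) so a given caller doesn't flap between versions during the canary.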

Load Balancing Algorithms

Algorithm              Description                                        Best For
Round Robin            Distribute equally, in order                       Homogeneous backends
Weighted Round Robin   Distribute based on weights                        Mixed-capacity backends
Least Connections      Send to backend with fewest active connections     Long-lived connections
IP Hash                Consistent routing based on client IP              Session affinity
Random                 Random selection                                   Simple, surprisingly effective
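As an example of the weighted case, here is a sketch of smooth weighted round robin, the variant NGINX uses for weighted upstreams: each pick raises every backend's running score by its configured weight, selects the highest, then deducts the weight total so heavy backends are interleaved rather than bunched.

```python
def smooth_wrr(weights, n):
    """Return the first n picks for a {backend: weight} table."""
    current = {b: 0 for b in weights}       # running score per backend
    total = sum(weights.values())
    picks = []
    for _ in range(n):
        for b in current:
            current[b] += weights[b]        # everyone gains its own weight
        chosen = max(current, key=current.get)
        current[chosen] -= total            # winner pays the full total
        picks.append(chosen)
    return picks

# With weights {a: 5, b: 1, c: 1}, "a" gets 5 of every 7 picks,
# spread across the cycle instead of 5 in a row.
```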

Security at the Edge

TLS Termination

The gateway handles TLS handshakes, offloading encryption from backend services:

  • Reduces CPU load on backends
  • Centralizes certificate management
  • Enables inspection of decrypted traffic for security rules

mTLS (Mutual TLS)

Both client and server present certificates:

  • Used for service-to-service authentication
  • Prevents unauthorized services from calling backends
  • Common in zero-trust architectures

Request Validation

Validate requests before they reach backends:

  • Schema validation (OpenAPI spec)
  • Input sanitization (SQL injection, XSS)
  • Payload size limits
  • Content-type enforcement
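Two of these checks, payload size limits and content-type enforcement, fit in a few lines. A minimal sketch (limits, names, and the lowercase-header assumption are illustrative):

```python
MAX_BODY_BYTES = 1_048_576                  # 1 MiB cap, illustrative
ALLOWED_TYPES = {"application/json"}

def validate_request(headers, body):
    """Return (ok, http_status) before forwarding to a backend.
    Assumes header names have already been lowercased."""
    # Strip any parameters like "; charset=utf-8" before comparing.
    ctype = headers.get("content-type", "").split(";")[0].strip().lower()
    if ctype not in ALLOWED_TYPES:
        return False, 415                   # Unsupported Media Type
    if len(body) > MAX_BODY_BYTES:
        return False, 413                   # Payload Too Large
    return True, 200
```

Rejecting at the gateway means malformed or oversized requests never consume backend capacity, which is exactly the point of edge validation.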

Observability

Key Metrics

  • Request rate — Requests per second by route, method, and status code
  • Latency — P50, P95, P99 by route (gateway overhead should be < 5ms)
  • Error rate — 4xx and 5xx by route and backend
  • Connection pool — Active connections, queue depth, timeouts
  • Rate limit hits — Which clients are being throttled

Distributed Tracing

The gateway should inject trace headers (W3C Trace Context or B3) into every request:

traceparent: 00-4bf92f3577b34da6a3ce929d0e0e4736-00f067aa0ba902b7-01

This enables end-to-end tracing from the client through the gateway to every backend service.
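Propagation at the gateway can be sketched as: keep the incoming trace-id so the whole request shares one trace, but mint a fresh span-id for the gateway's own hop. This is an illustrative helper for well-formed version-00 headers, not a substitute for a tracing library.

```python
import secrets

def propagate_traceparent(incoming=None):
    """Build the outbound W3C traceparent header for this hop."""
    if incoming:
        # Reuse trace-id and flags; the old span-id becomes our parent.
        version, trace_id, _parent_span, flags = incoming.split("-")
    else:
        # No incoming context: start a new trace (16-byte trace-id).
        version, trace_id, flags = "00", secrets.token_hex(16), "01"
    span_id = secrets.token_hex(8)          # new 8-byte span-id for this hop
    return f"{version}-{trace_id}-{span_id}-{flags}"
```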

Gateway Comparison

Gateway               Type             Best For
Kong                  OSS/Enterprise   Plugin ecosystem, Lua/Go extensibility
AWS API Gateway       Managed          AWS-native, serverless backends
NGINX                 OSS/Plus         High performance, traditional web serving
Envoy                 OSS              Service mesh sidecar, gRPC-native
Traefik               OSS/Enterprise   Docker/Kubernetes-native, auto-discovery
Azure API Management  Managed          Azure-native, developer portal
Apigee                Managed          Enterprise API program management

Anti-Patterns

Gateway as Business Logic Layer

The gateway should handle cross-cutting concerns (auth, rate limiting, routing), not business logic. Protocol-level transforms and schema validation belong at the gateway; domain-specific data transformations and business validation rules belong in the services.

Single Gateway for Everything

Don’t route both public API traffic and internal service-to-service traffic through the same gateway. Their requirements are fundamentally different.

No Gateway Health Checks

The gateway must actively health-check backend services. Routing traffic to unhealthy backends defeats the purpose.

Ignoring Gateway as SPOF

Run multiple gateway instances behind a load balancer. A single gateway instance is a single point of failure for your entire platform.

Over-Customization

Using too many custom plugins or filters makes upgrades painful. Prefer built-in features and standard patterns.

Production Checklist

  • TLS termination with automated certificate rotation
  • Rate limiting per API key/tenant with appropriate limits
  • Circuit breakers configured for every backend route
  • Health checks active for all backend services
  • Distributed tracing headers injected
  • Logging includes request ID correlation
  • Multiple gateway instances with load balancing
  • Graceful shutdown (drain connections before stopping)
  • Metrics dashboards for latency, error rate, and throughput
  • Alert rules for anomalous error rates and latency spikes
Jakub Dimitri Rezayev
Founder & Chief Architect • Garnet Grid Consulting

Jakub holds an M.S. in Customer Intelligence & Analytics and a B.S. in Finance & Computer Science from Pace University. With deep expertise spanning D365 F&O, Azure, Power BI, and AI/ML systems, he architects enterprise solutions that bridge legacy systems and modern technology — and has led multi-million dollar ERP implementations for Fortune 500 supply chains.
