Load Balancing Strategies
Choose and configure load balancing strategies for different application requirements. Covers L4 vs L7 load balancing, health checks, session persistence, global load balancing, and the algorithms that determine where traffic goes.
Load balancing distributes incoming traffic across multiple servers to ensure no single server becomes overwhelmed. The choice of load balancing strategy affects latency, availability, and cost.
Layer 4 vs Layer 7
Layer 4 (Transport)
Operates on TCP/UDP connections without inspecting content:
Client → L4 Load Balancer → Backend Server
(sees: IP, port, protocol)
(cannot see: HTTP headers, URL path, cookies)
Strengths: Fast, efficient, protocol-agnostic. Use when: TCP pass-through, database connections, non-HTTP protocols.
Layer 7 (Application)
Inspects HTTP content for intelligent routing:
Client → L7 Load Balancer → Backend Server
(sees: URL, headers, cookies, method, body)
(can: route by path, rewrite headers, terminate TLS)
Strengths: Content-based routing, SSL termination, caching, compression. Use when: HTTP APIs, web applications, microservices routing.
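The content-based routing an L7 balancer performs can be sketched in a few lines of Python. The path prefixes and pool names below are made up for illustration; a real balancer would also handle headers, methods, and rewrites:

```python
# Minimal sketch of L7 path-based routing: choose a backend pool by
# the longest matching URL path prefix. Pool names are illustrative.
ROUTES = {
    "/api/orders": ["orders-1:8080", "orders-2:8080"],
    "/api/users":  ["users-1:8080"],
    "/":           ["web-1:8080", "web-2:8080"],
}

def route(path: str) -> list[str]:
    """Return the backend pool whose route prefix matches the longest."""
    best = max((prefix for prefix in ROUTES if path.startswith(prefix)), key=len)
    return ROUTES[best]
```

An L4 balancer cannot do this at all: the URL path only exists after the HTTP request is parsed.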
Algorithms
| Algorithm | How It Works | Best For |
|---|---|---|
| Round Robin | Requests distributed sequentially | Equal-capacity servers |
| Weighted Round Robin | More requests to higher-weight servers | Mixed-capacity servers |
| Least Connections | Route to server with fewest active connections | Variable request duration |
| IP Hash | Same client IP always goes to same server | Simple session affinity |
| Random | Random server selection | Large server pools |
| Least Response Time | Route to fastest-responding server | Performance optimization |
Least Connections Example
Server A: 45 active connections
Server B: 12 active connections ← Next request goes here
Server C: 33 active connections
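The selection step above can be sketched in Python, assuming the balancer tracks active connection counts per server (the tie-breaking rule here is a common choice, not mandated by the algorithm):

```python
import random

# Sketch: least-connections selection. Connection counts would come
# from the balancer's live connection table.
def least_connections(conns: dict) -> str:
    fewest = min(conns.values())
    candidates = [server for server, n in conns.items() if n == fewest]
    return random.choice(candidates)  # break ties randomly

servers = {"A": 45, "B": 12, "C": 33}
# least_connections(servers) → "B"
```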
Health Checks
```yaml
health_check:
  # Active health checks: the LB probes each backend on a schedule
  http:
    path: /health
    interval: 10s
    timeout: 5s
    healthy_threshold: 2      # Pass 2 checks → mark healthy
    unhealthy_threshold: 3    # Fail 3 checks → mark unhealthy
    expected_status: [200]
  # Passive health checks: monitor real traffic for failures
  passive:
    consecutive_errors: 5     # 5 consecutive errors → mark unhealthy
    error_timeout: 30s        # Reset error counter after 30s
```
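The passive check's bookkeeping can be sketched in Python with the same thresholds as the config above (the class and its API are illustrative, not a real balancer's interface):

```python
import time

# Sketch: mark a backend unhealthy after N consecutive errors, resetting
# the counter on any success or after a quiet period with no errors.
class PassiveHealth:
    def __init__(self, consecutive_errors=5, error_timeout=30.0):
        self.limit = consecutive_errors
        self.timeout = error_timeout
        self.errors = 0
        self.last_error = 0.0
        self.healthy = True

    def record(self, ok, now=None):
        """Record the outcome of one real request to this backend."""
        now = time.monotonic() if now is None else now
        if ok:
            self.errors = 0
            return
        if self.errors and now - self.last_error > self.timeout:
            self.errors = 0  # quiet period elapsed: start counting fresh
        self.errors += 1
        self.last_error = now
        if self.errors >= self.limit:
            self.healthy = False
```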
Health Check Design
```python
from fastapi import FastAPI
from fastapi.responses import JSONResponse

app = FastAPI()

@app.get('/health')
def health_check():
    # Each probe is an app-specific dependency check returning True/False
    checks = {
        'database': check_database(),
        'cache': check_redis(),
        'disk': check_disk_space(),
    }
    healthy = all(checks.values())
    status_code = 200 if healthy else 503
    return JSONResponse(
        status_code=status_code,
        content={'status': 'healthy' if healthy else 'unhealthy', 'checks': checks},
    )
```
Session Persistence
Cookie-Based
1. Client → LB → Server A (sets cookie: server=A)
2. Client → LB → Server A (cookie present, route to A)
Header-Based
```nginx
# Route based on a custom header
upstream backend {
    hash $http_x_session_id consistent;
    server backend1:8080;
    server backend2:8080;
    server backend3:8080;
}
```
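The `consistent` flag enables ketama-style consistent hashing, so adding or removing a server remaps only a small share of keys instead of reshuffling everything. The idea can be sketched in Python (this is illustrative, not nginx's actual implementation):

```python
import bisect
import hashlib

# Illustrative consistent-hash ring. Each server gets many virtual points
# on the ring; a key is served by the first point at or after its hash.
class HashRing:
    def __init__(self, servers, replicas=100):
        self.ring = sorted(
            (self._hash(f"{server}#{i}"), server)
            for server in servers
            for i in range(replicas)
        )
        self.points = [point for point, _ in self.ring]

    @staticmethod
    def _hash(key):
        return int.from_bytes(hashlib.md5(key.encode()).digest()[:8], "big")

    def lookup(self, key):
        # First ring point at or after the key's hash, wrapping around
        i = bisect.bisect(self.points, self._hash(key)) % len(self.ring)
        return self.ring[i][1]
```

Removing a server only remaps the keys that landed on its points; every other key keeps its assignment, which is what makes this form of affinity tolerable during scaling events.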
When to Avoid
Session affinity works against even load distribution and complicates scaling: a pinned server's sessions are lost when it is drained or fails. Prefer stateless backends with external session stores:
Client → LB → Any Server → Redis (shared session store)
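The shared store can be sketched in Python as below. The `SessionStore` wrapper and its method names are illustrative; in production `store` would typically be a `redis.Redis` client from redis-py, which exposes the same `get`/`setex` calls used here:

```python
import json

# Sketch: sessions externalized to a shared store so any backend can
# serve any request. `store` is anything with get/setex (e.g. redis.Redis).
class SessionStore:
    def __init__(self, store, ttl=3600):
        self.store = store
        self.ttl = ttl  # sessions expire automatically after this many seconds

    def save(self, session_id, data):
        self.store.setex(f"session:{session_id}", self.ttl, json.dumps(data))

    def load(self, session_id):
        raw = self.store.get(f"session:{session_id}")
        return json.loads(raw) if raw else None
```

Because no request depends on reaching a particular backend, the balancer is free to use any algorithm, and servers can be added or drained without losing sessions.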
Global Load Balancing
DNS-based GSLB:
US user → us-east.api.example.com → US data center
EU user → eu-west.api.example.com → European data center
Asia user → ap-east.api.example.com → Asian data center
Failover:
US data center down → DNS routes US users to EU data center
Cloud Implementations
| Provider | Service | Type |
|---|---|---|
| AWS | ALB / NLB / Global Accelerator | L7 / L4 / Global |
| GCP | Cloud Load Balancing | L4 / L7 / Global |
| Azure | Application Gateway / Traffic Manager | L7 / Global |
| Cloudflare | Load Balancing | L7 / Global |
Anti-Patterns
| Anti-Pattern | Consequence | Fix |
|---|---|---|
| No health checks | Traffic sent to dead servers | Active + passive health checks |
| TCP health check for HTTP service | Server is up but app is broken | HTTP health check at application level |
| Session affinity as default | Uneven load, scaling issues | Stateless backends, external session store |
| Single load balancer | Single point of failure | Active-passive or active-active LB pair |
| No connection draining | In-flight requests dropped during deploy | Graceful shutdown with draining period |
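The connection-draining fix from the last row can be sketched as a SIGTERM handler that stops advertising health, then waits for in-flight requests to finish before the process exits (the 30-second drain window is an assumption; match it to your LB's deregistration delay):

```python
import signal
import threading
import time

# Sketch: graceful shutdown with a drain window (30s is illustrative).
DRAIN_SECONDS = 30
in_flight = 0                 # incremented/decremented by request handlers
lock = threading.Lock()
draining = threading.Event()

def handle_sigterm(signum, frame):
    # A /health handler checking this flag would now return 503,
    # so the LB stops routing new requests to this instance.
    draining.set()
    deadline = time.monotonic() + DRAIN_SECONDS
    while time.monotonic() < deadline:
        with lock:
            if in_flight == 0:
                break         # every in-flight request has completed
        time.sleep(0.1)

signal.signal(signal.SIGTERM, handle_sigterm)
```

The key ordering: fail health checks first, wait for the balancer to notice, then stop; exiting the moment SIGTERM arrives is what drops in-flight requests.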
Load balancing is the front door of your application. Its configuration directly determines user-perceived latency, availability during failures, and the effectiveness of your scaling strategy.