Load Balancing Strategies
Choose and configure load balancing strategies for different application requirements. Covers L4 vs L7 load balancing, health checks, session persistence, global load balancing, and the algorithms that determine where traffic goes.
Load balancing distributes incoming traffic across multiple servers to ensure no single server becomes overwhelmed. The choice of load balancing strategy affects latency, availability, and cost.
Layer 4 vs Layer 7
Layer 4 (Transport)
Operates on TCP/UDP connections without inspecting content:
Client → L4 Load Balancer → Backend Server
(sees: IP, port, protocol)
(cannot see: HTTP headers, URL path, cookies)
Strengths: Fast, efficient, protocol-agnostic. Use when: TCP pass-through, database connections, non-HTTP protocols.
Layer 7 (Application)
Inspects HTTP content for intelligent routing:
Client → L7 Load Balancer → Backend Server
(sees: URL, headers, cookies, method, body)
(can: route by path, rewrite headers, terminate TLS)
Strengths: Content-based routing, SSL termination, caching, compression. Use when: HTTP APIs, web applications, microservices routing.
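The content-based routing an L7 balancer performs can be sketched in a few lines of Python. The path prefixes and pool names below are made up for illustration; a real balancer would also handle headers, methods, and rewrites:

```python
# Minimal sketch of L7 path-based routing: choose a backend pool by
# the longest matching URL path prefix. Pool names are illustrative.
ROUTES = {
    "/api/orders": ["orders-1:8080", "orders-2:8080"],
    "/api/users":  ["users-1:8080"],
    "/":           ["web-1:8080", "web-2:8080"],
}

def route(path: str) -> list[str]:
    """Return the backend pool whose route prefix matches the longest."""
    best = max((prefix for prefix in ROUTES if path.startswith(prefix)), key=len)
    return ROUTES[best]
```

An L4 balancer cannot do this at all: the URL path only exists after the HTTP request is parsed.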
Algorithms
| Algorithm | How It Works | Best For |
|---|---|---|
| Round Robin | Requests distributed sequentially | Equal-capacity servers |
| Weighted Round Robin | More requests to higher-weight servers | Mixed-capacity servers |
| Least Connections | Route to server with fewest active connections | Variable request duration |
| IP Hash | Same client IP always goes to same server | Simple session affinity |
| Random | Random server selection | Large server pools |
| Least Response Time | Route to fastest-responding server | Performance optimization |
Least Connections Example
Server A: 45 active connections
Server B: 12 active connections ← Next request goes here
Server C: 33 active connections
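The selection step above can be sketched in Python, assuming the balancer tracks active connection counts per server (the tie-breaking rule here is a common choice, not mandated by the algorithm):

```python
import random

# Sketch: least-connections selection. Connection counts would come
# from the balancer's live connection table.
def least_connections(conns: dict) -> str:
    fewest = min(conns.values())
    candidates = [server for server, n in conns.items() if n == fewest]
    return random.choice(candidates)  # break ties randomly

servers = {"A": 45, "B": 12, "C": 33}
# least_connections(servers) → "B"
```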
Health Checks
```yaml
health_check:
  # Active health checks: the LB probes each backend on a schedule
  http:
    path: /health
    interval: 10s
    timeout: 5s
    healthy_threshold: 2      # Pass 2 checks → mark healthy
    unhealthy_threshold: 3    # Fail 3 checks → mark unhealthy
    expected_status: [200]
  # Passive health checks: monitor real traffic for failures
  passive:
    consecutive_errors: 5     # 5 consecutive errors → mark unhealthy
    error_timeout: 30s        # Reset error counter after 30s
```
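The passive check's bookkeeping can be sketched in Python with the same thresholds as the config above (the class and its API are illustrative, not a real balancer's interface):

```python
import time

# Sketch: mark a backend unhealthy after N consecutive errors, resetting
# the counter on any success or after a quiet period with no errors.
class PassiveHealth:
    def __init__(self, consecutive_errors=5, error_timeout=30.0):
        self.limit = consecutive_errors
        self.timeout = error_timeout
        self.errors = 0
        self.last_error = 0.0
        self.healthy = True

    def record(self, ok, now=None):
        """Record the outcome of one real request to this backend."""
        now = time.monotonic() if now is None else now
        if ok:
            self.errors = 0
            return
        if self.errors and now - self.last_error > self.timeout:
            self.errors = 0  # quiet period elapsed: start counting fresh
        self.errors += 1
        self.last_error = now
        if self.errors >= self.limit:
            self.healthy = False
```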
Health Check Design
```python
from fastapi import FastAPI
from fastapi.responses import JSONResponse

app = FastAPI()

@app.get('/health')
def health_check():
    # Each probe is an app-specific dependency check returning True/False
    checks = {
        'database': check_database(),
        'cache': check_redis(),
        'disk': check_disk_space(),
    }
    healthy = all(checks.values())
    status_code = 200 if healthy else 503
    return JSONResponse(
        status_code=status_code,
        content={'status': 'healthy' if healthy else 'unhealthy', 'checks': checks},
    )
```
Session Persistence
Cookie-Based
1. Client → LB → Server A (sets cookie: server=A)
2. Client → LB → Server A (cookie present, route to A)
Header-Based
```nginx
# Route based on a custom header
upstream backend {
    hash $http_x_session_id consistent;
    server backend1:8080;
    server backend2:8080;
    server backend3:8080;
}
```
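The `consistent` flag enables ketama-style consistent hashing, so adding or removing a server remaps only a small share of keys instead of reshuffling everything. The idea can be sketched in Python (this is illustrative, not nginx's actual implementation):

```python
import bisect
import hashlib

# Illustrative consistent-hash ring. Each server gets many virtual points
# on the ring; a key is served by the first point at or after its hash.
class HashRing:
    def __init__(self, servers, replicas=100):
        self.ring = sorted(
            (self._hash(f"{server}#{i}"), server)
            for server in servers
            for i in range(replicas)
        )
        self.points = [point for point, _ in self.ring]

    @staticmethod
    def _hash(key):
        return int.from_bytes(hashlib.md5(key.encode()).digest()[:8], "big")

    def lookup(self, key):
        # First ring point at or after the key's hash, wrapping around
        i = bisect.bisect(self.points, self._hash(key)) % len(self.ring)
        return self.ring[i][1]
```

Removing a server only remaps the keys that landed on its points; every other key keeps its assignment, which is what makes this form of affinity tolerable during scaling events.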
When to Avoid
Session affinity works against even load distribution and complicates scaling: a pinned server's sessions are lost when it is drained or fails. Prefer stateless backends with external session stores:
Client → LB → Any Server → Redis (shared session store)
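The shared store can be sketched in Python as below. The `SessionStore` wrapper and its method names are illustrative; in production `store` would typically be a `redis.Redis` client from redis-py, which exposes the same `get`/`setex` calls used here:

```python
import json

# Sketch: sessions externalized to a shared store so any backend can
# serve any request. `store` is anything with get/setex (e.g. redis.Redis).
class SessionStore:
    def __init__(self, store, ttl=3600):
        self.store = store
        self.ttl = ttl  # sessions expire automatically after this many seconds

    def save(self, session_id, data):
        self.store.setex(f"session:{session_id}", self.ttl, json.dumps(data))

    def load(self, session_id):
        raw = self.store.get(f"session:{session_id}")
        return json.loads(raw) if raw else None
```

Because no request depends on reaching a particular backend, the balancer is free to use any algorithm, and servers can be added or drained without losing sessions.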
Global Load Balancing
DNS-based GSLB:
US user → us-east.api.example.com → US data center
EU user → eu-west.api.example.com → European data center
Asia user → ap-east.api.example.com → Asian data center
Failover:
US data center down → DNS routes US users to EU data center
Cloud Implementations
| Provider | Service | Type |
|---|---|---|
| AWS | ALB / NLB / Global Accelerator | L7 / L4 / Global |
| GCP | Cloud Load Balancing | L4 / L7 / Global |
| Azure | Application Gateway / Traffic Manager | L7 / Global |
| Cloudflare | Load Balancing | L7 / Global |
Anti-Patterns
| Anti-Pattern | Consequence | Fix |
|---|---|---|
| No health checks | Traffic sent to dead servers | Active + passive health checks |
| TCP health check for HTTP service | Server is up but app is broken | HTTP health check at application level |
| Session affinity as default | Uneven load, scaling issues | Stateless backends, external session store |
| Single load balancer | Single point of failure | Active-passive or active-active LB pair |
| No connection draining | In-flight requests dropped during deploy | Graceful shutdown with draining period |
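The connection-draining fix from the last row can be sketched as a SIGTERM handler that stops advertising health, then waits for in-flight requests to finish before the process exits (the 30-second drain window is an assumption; match it to your LB's deregistration delay):

```python
import signal
import threading
import time

# Sketch: graceful shutdown with a drain window (30s is illustrative).
DRAIN_SECONDS = 30
in_flight = 0                 # incremented/decremented by request handlers
lock = threading.Lock()
draining = threading.Event()

def handle_sigterm(signum, frame):
    # A /health handler checking this flag would now return 503,
    # so the LB stops routing new requests to this instance.
    draining.set()
    deadline = time.monotonic() + DRAIN_SECONDS
    while time.monotonic() < deadline:
        with lock:
            if in_flight == 0:
                break         # every in-flight request has completed
        time.sleep(0.1)

signal.signal(signal.SIGTERM, handle_sigterm)
```

The key ordering: fail health checks first, wait for the balancer to notice, then stop; exiting the moment SIGTERM arrives is what drops in-flight requests.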
Load balancing is the front door of your application. Its configuration directly determines user-perceived latency, availability during failures, and the effectiveness of your scaling strategy.