
Load Balancing Strategies: Beyond Round Robin

Choose and configure load balancing algorithms that match your workload characteristics. Covers L4 vs L7 balancing, health checks, connection draining, session affinity, and the failure modes that take down your entire service.

Load balancing looks simple from the outside: distribute traffic across multiple servers. In practice, the choice of algorithm, health check configuration, and failover behavior determine whether your service degrades gracefully under pressure or falls off a cliff.

Most teams use the default round-robin algorithm, never look at it again, and then wonder why one server is at 95% CPU while three others are at 20%. This guide covers how to choose the right balancing strategy and configure it to handle the failure modes that actually happen in production.


L4 vs L7: Where to Balance

Layer            | What It Sees                     | Use When                                            | Examples
-----------------|----------------------------------|-----------------------------------------------------|---------------------------------------
L4 (Transport)   | IP addresses, TCP/UDP ports      | High throughput, simple routing, non-HTTP protocols | NLB, HAProxy (TCP mode)
L7 (Application) | HTTP headers, URL paths, cookies | Content-based routing, SSL termination, API gateway | ALB, HAProxy (HTTP mode), Envoy, Nginx

L4 load balancer:
  Sees: TCP connection from 10.0.1.50:52441 → 10.0.2.10:443
  Decides: Route to backend 3 (based on IP hash or round robin)
  Cannot see: HTTP method, URL path, headers, cookies

L7 load balancer:
  Sees: GET /api/v2/orders HTTP/1.1
        Host: api.example.com
        Cookie: session=abc123
  Decides: Route /api/v2/* to API cluster, /static/* to CDN,
           session=abc123 → backend 2 (sticky session)

General rule: Use L7 for web applications and APIs (you almost always need content-based routing). Use L4 for databases, message queues, and high-throughput non-HTTP services.
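The L7 decision above can be sketched as a small routing function. This is a minimal illustration, not any particular load balancer's API; the pool names (`api-cluster`, `cdn`, `backend-2`) and the session cookie value are assumptions carried over from the example.

```python
# Hypothetical sketch of an L7 routing decision: inspect the URL path
# and cookies, then pick a backend pool. All pool names are illustrative.
def route_l7(path: str, cookies: dict) -> str:
    """Return the backend pool for an HTTP request (content-based routing)."""
    # Sticky session: an existing session cookie pins the client to a backend.
    if cookies.get("session") == "abc123":
        return "backend-2"
    # Path-based routing: API traffic and static assets go to different pools.
    if path.startswith("/api/v2/"):
        return "api-cluster"
    if path.startswith("/static/"):
        return "cdn"
    return "default-pool"
```

An L4 balancer cannot make any of these decisions because it never parses the HTTP request; it only sees the TCP connection.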


Load Balancing Algorithms

Algorithm            | How It Works                                           | Best For                                                | Weakness
---------------------|--------------------------------------------------------|---------------------------------------------------------|--------------------------------------
Round Robin          | Each request goes to the next server in sequence       | Homogeneous backends, stateless requests                | Ignores server load and request cost
Weighted Round Robin | Round robin with a weight per server                   | Mixed-capacity servers (old + new hardware)             | Weights must be set manually
Least Connections    | Route to the server with the fewest active connections | Requests with variable processing time                  | Can overload recovering servers
Least Response Time  | Route to the fastest-responding server                 | Latency-sensitive applications                          | Requires response time measurement
IP Hash              | Hash the client IP to pick a server                    | Rough session affinity without cookies                  | Uneven distribution behind NAT
Random               | Pick a random server                                   | Simple, surprisingly effective                          | No load awareness
Power of Two Choices | Pick 2 random servers, use the less loaded one         | Large clusters; combines simplicity with load awareness | Slightly more complex
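Power of Two Choices is simple enough to sketch in a few lines. This is a minimal illustration assuming the balancer tracks an active-connection count per backend; `active_connections` and the injectable `rng` are hypothetical names, not a real balancer's API.

```python
import random

def pick_power_of_two(active_connections: dict[str, int], rng=random) -> str:
    """Sample two distinct backends at random, route to the less loaded one."""
    a, b = rng.sample(list(active_connections), 2)
    return a if active_connections[a] <= active_connections[b] else b
```

Compared with full Least Connections, this avoids scanning every backend on each request while still steering traffic away from hot servers most of the time.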

When Round Robin Fails

Scenario: 4 backend servers, one request type takes 10x longer

Round Robin distribution:
  Server 1: light, light, HEAVY, light   → 10% CPU, then spikes to 90%
  Server 2: light, light, light, light    → 15% CPU
  Server 3: light, HEAVY, light, HEAVY   → 85% CPU (overloaded!)
  Server 4: light, light, light, light    → 15% CPU

Least Connections distribution:
  Server 1: light, light, light, light    → 20% CPU
  Server 2: light, light, HEAVY           → 45% CPU
  Server 3: light, light, light           → 15% CPU
  Server 4: light, HEAVY, light           → 45% CPU

  Better balance: a server processing a heavy request holds its connection
  open longer, so it receives fewer new connections. Heavy requests
  effectively "protect" the server from additional traffic while they run.
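The selection rule behind Least Connections is a one-liner: pick the backend with the smallest active-connection count. A minimal sketch, assuming the balancer maintains such a count per backend:

```python
def pick_least_connections(active_connections: dict[str, int]) -> str:
    """Route to the backend currently holding the fewest active connections."""
    return min(active_connections, key=active_connections.get)
```

This is also why Least Connections can overload a recovering server: a backend that just rejoined has zero connections, so it briefly attracts a burst of new traffic.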

Health Checks: The First Line of Defense

A bad health check configuration is worse than no health checks at all. Too aggressive and you flap servers in and out. Too lenient and you route traffic to dead servers for minutes.

# Good health check configuration
health_check:
  protocol: HTTP
  path: /health
  port: 8080
  interval: 10s          # Check every 10 seconds
  timeout: 5s            # Give up after 5 seconds
  healthy_threshold: 2   # Mark healthy after 2 consecutive successes
  unhealthy_threshold: 3 # Mark unhealthy after 3 consecutive failures

# What /health should check:
#   ✅ Application is started and accepting connections
#   ✅ Critical dependencies are reachable (database, cache)
#   ❌ NOT full business logic validation (too slow, too flaky)
#   ❌ NOT external third-party APIs (their outage ≠ your server is unhealthy)

Health Check Mistake       | What Happens                                             | Fix
---------------------------|----------------------------------------------------------|-------------------------------------------
No health check            | Traffic routes to crashed servers                        | Always configure health checks
Checks external dependency | One API goes down, ALL your servers marked unhealthy     | Only check critical internal dependencies
1-failure threshold        | Healthy server marked down from one timeout              | Use threshold ≥ 3
Slow health endpoint       | Health check times out because it runs a full DB query   | Health endpoint should respond in < 100ms
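The healthy/unhealthy thresholds in the config above amount to a small state machine with hysteresis: consecutive successes and failures are counted, and the state only flips once a threshold is reached. A minimal sketch (the class and method names are illustrative, not any load balancer's API):

```python
class HealthState:
    """Track backend health with hysteresis: N consecutive successes to mark
    healthy, M consecutive failures to mark unhealthy."""

    def __init__(self, healthy_threshold: int = 2, unhealthy_threshold: int = 3):
        self.healthy_threshold = healthy_threshold
        self.unhealthy_threshold = unhealthy_threshold
        self.healthy = True
        self._successes = 0
        self._failures = 0

    def record(self, success: bool) -> bool:
        """Record one health check result; return the current health state."""
        if success:
            self._successes += 1
            self._failures = 0
            if self._successes >= self.healthy_threshold:
                self.healthy = True
        else:
            self._failures += 1
            self._successes = 0
            if self._failures >= self.unhealthy_threshold:
                self.healthy = False
        return self.healthy
```

Note how a single timeout never flips the state: that is exactly the flapping protection the threshold ≥ 3 rule buys you.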

Deep vs Shallow Health Checks

# Shallow health check (fast, always works)
@app.get("/health")
def health():
    return {"status": "ok"}

# Deep health check (verifies dependencies)
@app.get("/health/ready")
async def readiness():
    checks = {}
    
    # Database
    try:
        await db.execute("SELECT 1")
        checks["database"] = "ok"
    except Exception:
        checks["database"] = "failed"
    
    # Redis cache
    try:
        await redis.ping()
        checks["cache"] = "ok"
    except Exception:
        checks["cache"] = "failed"
    
    all_ok = all(v == "ok" for v in checks.values())
    status_code = 200 if all_ok else 503
    return JSONResponse({"checks": checks}, status_code=status_code)

Use shallow checks for liveness probes (is the process running?). Use deep checks for readiness probes (is the service ready to handle traffic?).


Connection Draining

When a server is taken out of rotation (for deployment, scaling, or failure), existing connections should be allowed to finish rather than being killed mid-request.

# Connection draining configuration
deregistration_delay: 30s    # Allow 30 seconds for in-flight requests
# During this window:
#   - No NEW connections routed to this server
#   - Existing connections continue until complete or timeout
#   - After 30s: remaining connections are forcefully closed

Drain Duration | Suitable For
---------------|-----------------------------------------------
5-10 seconds   | Fast APIs (< 1s response time)
30 seconds     | Standard web applications
60-120 seconds | Long-running requests (file uploads, reports)
300 seconds    | WebSocket connections
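The draining window described above can be sketched as a small state object: once deregistered, the backend stops accepting new connections, and after the deadline any remaining connections may be force-closed. The names (`DrainingBackend`, `drain_expired`) are illustrative; the injectable `clock` just makes the logic testable.

```python
import time

class DrainingBackend:
    """Sketch of connection-draining state for one backend server."""

    def __init__(self, drain_seconds: float = 30.0, clock=time.monotonic):
        self.drain_seconds = drain_seconds
        self.clock = clock
        self.drain_deadline = None  # None means the backend is in rotation

    def deregister(self) -> None:
        """Take the backend out of rotation and start the drain window."""
        self.drain_deadline = self.clock() + self.drain_seconds

    def accepts_new(self) -> bool:
        """New connections are only routed while the backend is in rotation."""
        return self.drain_deadline is None

    def drain_expired(self) -> bool:
        """After the deadline, remaining connections may be force-closed."""
        return self.drain_deadline is not None and self.clock() >= self.drain_deadline
```

The load balancer checks `accepts_new()` on every routing decision and `drain_expired()` when deciding whether to cut off lingering connections.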

Session Affinity (Sticky Sessions)

Method       | How It Works                                     | Tradeoff
-------------|--------------------------------------------------|--------------------------------------------
Cookie-based | Load balancer sets a cookie mapping to a backend | Most reliable; requires L7
IP-based     | Hash the source IP to pick a backend             | Breaks with NAT/proxies
No affinity  | Any server handles any request                   | Requires externalized session state (Redis)

Strong recommendation: Avoid sticky sessions. Externalize session state to Redis or a database. Sticky sessions create uneven load distribution and make deployments harder (you cannot drain a server if 30% of sessions are pinned to it).
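Externalized session state is what makes "no affinity" workable. In this minimal sketch a plain dict stands in for Redis (a real deployment would use a Redis client with a TTL per session key); `handle_request` and the session fields are illustrative.

```python
# Shared session store: in production this would be Redis, not a dict.
session_store: dict[str, dict] = {}

def handle_request(server: str, session_id: str) -> dict:
    """Any server can serve any request, because session state lives in the
    shared store rather than in any one server's memory."""
    session = session_store.setdefault(session_id, {"views": 0})
    session["views"] += 1
    session["last_server"] = server
    return session
```

Because consecutive requests for the same session can land on different servers without losing state, the load balancer is free to use pure load-based routing and to drain any server at any time.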


Implementation Checklist

  • Choose L4 or L7 based on your routing needs (L7 for web/API, L4 for non-HTTP)
  • Select algorithm based on workload: least connections for variable-cost requests
  • Configure health checks with ≥ 3 failure threshold and < 100ms response time
  • Separate liveness (shallow) and readiness (deep) health checks
  • Set connection draining to at least 30 seconds on all services
  • Avoid sticky sessions — externalize session state to Redis
  • Monitor backend health: track which servers are going in and out of rotation
  • Test failover: pull a server out of rotation and verify zero dropped requests
  • Set up alerts for: high error rates, uneven load distribution, health check failures
  • Document your load balancing setup: algorithm, health check config, drain settings
Jakub Dimitri Rezayev
Founder & Chief Architect • Garnet Grid Consulting

Jakub holds an M.S. in Customer Intelligence & Analytics and a B.S. in Finance & Computer Science from Pace University. With deep expertise spanning D365 F&O, Azure, Power BI, and AI/ML systems, he architects enterprise solutions that bridge legacy systems and modern technology — and has led multi-million dollar ERP implementations for Fortune 500 supply chains.
