
Load Balancing Strategies

Choose and configure load balancing strategies for different application requirements. Covers L4 vs L7 load balancing, health checks, session persistence, global load balancing, and the algorithms that determine where traffic goes.

Load balancing distributes incoming traffic across multiple servers to ensure no single server becomes overwhelmed. The choice of load balancing strategy affects latency, availability, and cost.


Layer 4 vs Layer 7

Layer 4 (Transport)

Operates on TCP/UDP connections without inspecting content:

Client → L4 Load Balancer → Backend Server
            (sees: IP, port, protocol)
            (cannot see: HTTP headers, URL path, cookies)

Strengths: Fast, efficient, protocol-agnostic. Use when: TCP pass-through, database connections, non-HTTP protocols.

Layer 7 (Application)

Inspects HTTP content for intelligent routing:

Client → L7 Load Balancer → Backend Server
            (sees: URL, headers, cookies, method, body)
            (can: route by path, rewrite headers, terminate TLS)

Strengths: Content-based routing, SSL termination, caching, compression. Use when: HTTP APIs, web applications, microservices routing.
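The content-based routing an L7 balancer performs can be sketched in a few lines of Python. The route prefixes and backend names below are illustrative, not a real load balancer API:

```python
# Longest-prefix-first route table: an L7 balancer inspects the URL
# path and picks a backend pool. Prefixes and pool names are examples.
ROUTES = [
    ("/api/users", "users-service"),
    ("/api/orders", "orders-service"),
    ("/", "web-frontend"),  # catch-all for everything else
]

def route(path: str) -> str:
    # Return the backend pool for the first matching prefix
    for prefix, backend in ROUTES:
        if path.startswith(prefix):
            return backend
```

An L4 balancer cannot make this decision at all, since it never sees the URL path.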


Algorithms

Algorithm            | How It Works                                   | Best For
---------------------|------------------------------------------------|--------------------------
Round Robin          | Requests distributed sequentially              | Equal-capacity servers
Weighted Round Robin | More requests to higher-weight servers         | Mixed-capacity servers
Least Connections    | Route to server with fewest active connections | Variable request duration
IP Hash              | Same client IP always goes to same server      | Simple session affinity
Random               | Random server selection                        | Large server pools
Least Response Time  | Route to fastest-responding server             | Performance optimization
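The first three algorithms in the table can be sketched in a few lines of Python. Server names, weights, and connection counts here are illustrative:

```python
import itertools

servers = ["a", "b", "c"]

# Round robin: cycle through servers in a fixed order.
_rr = itertools.cycle(servers)
def round_robin():
    return next(_rr)

# Weighted round robin: repeat each server by its weight,
# so "a" receives 3 of every 5 requests.
weights = {"a": 3, "b": 1, "c": 1}
_wrr = itertools.cycle([s for s, w in weights.items() for _ in range(w)])
def weighted_round_robin():
    return next(_wrr)

# Least connections: pick the server with the fewest active connections
# (counts would be tracked live by the balancer).
active = {"a": 45, "b": 12, "c": 33}
def least_connections():
    return min(active, key=active.get)
```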

Least Connections Example

Server A: 45 active connections
Server B: 12 active connections    ← Next request goes here
Server C: 33 active connections

Health Checks

health_check:
  # Active health checks
  http:
    path: /health
    interval: 10s
    timeout: 5s
    healthy_threshold: 2      # Pass 2 checks → mark healthy
    unhealthy_threshold: 3    # Fail 3 checks → mark unhealthy
    expected_status: [200]
    
  # Passive health checks (real traffic monitoring)
  passive:
    consecutive_errors: 5     # 5 consecutive errors → mark unhealthy
    error_timeout: 30s        # Reset error counter after 30s
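The threshold behavior in the config above can be sketched as a small state machine: a server changes state only after the configured number of consecutive passes or failures. The class and method names below are illustrative, not a real load balancer API:

```python
# Sketch of active health-check thresholds: a healthy server must fail
# `unhealthy_threshold` consecutive checks before it is removed, and an
# unhealthy server must pass `healthy_threshold` consecutive checks
# before it is restored. This dampening prevents flapping.
class HealthState:
    def __init__(self, healthy_threshold=2, unhealthy_threshold=3):
        self.healthy_threshold = healthy_threshold
        self.unhealthy_threshold = unhealthy_threshold
        self.healthy = True
        self._passes = 0
        self._failures = 0

    def record(self, check_passed: bool) -> bool:
        if check_passed:
            self._passes += 1
            self._failures = 0
            if not self.healthy and self._passes >= self.healthy_threshold:
                self.healthy = True
        else:
            self._failures += 1
            self._passes = 0
            if self.healthy and self._failures >= self.unhealthy_threshold:
                self.healthy = False
        return self.healthy
```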

Health Check Design

from fastapi import FastAPI
from fastapi.responses import JSONResponse

app = FastAPI()

@app.get('/health')
def health_check():
    # Each dependency check returns True on success, False on failure
    # (implementations omitted)
    checks = {
        'database': check_database(),
        'cache': check_redis(),
        'disk': check_disk_space(),
    }

    healthy = all(checks.values())
    status_code = 200 if healthy else 503

    return JSONResponse(
        status_code=status_code,
        content={'status': 'healthy' if healthy else 'unhealthy', 'checks': checks}
    )

Session Persistence

Cookie-Based

1. Client → LB → Server A (sets cookie: server=A)
2. Client → LB → Server A (cookie present, route to A)

Header-Based

# Route based on custom header
upstream backend {
    hash $http_x_session_id consistent;
    server backend1:8080;
    server backend2:8080;
    server backend3:8080;
}
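The `consistent` flag above enables consistent hashing, which can be sketched in Python as a hash ring: each server gets many points on the ring, and a key maps to the next point clockwise, so removing one server remaps only that server's keys. The implementation below is illustrative; nginx's actual algorithm (ketama-compatible) differs in detail:

```python
import bisect
import hashlib

# Sketch of a consistent hash ring. `replicas` virtual nodes per server
# smooth out the key distribution.
class ConsistentHashRing:
    def __init__(self, servers, replicas=100):
        self._ring = sorted(
            (self._hash(f"{s}#{i}"), s)
            for s in servers
            for i in range(replicas)
        )
        self._points = [h for h, _ in self._ring]

    @staticmethod
    def _hash(value: str) -> int:
        return int(hashlib.md5(value.encode()).hexdigest(), 16)

    def get(self, key: str) -> str:
        # First ring point at or after the key's hash, wrapping around
        idx = bisect.bisect(self._points, self._hash(key)) % len(self._ring)
        return self._ring[idx][1]

ring = ConsistentHashRing(["backend1:8080", "backend2:8080", "backend3:8080"])
# The same session ID always lands on the same backend
assert ring.get("session-42") == ring.get("session-42")
```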

When to Avoid

Session affinity prevents even load distribution and complicates scaling. Prefer stateless backends with external session stores:

Client → LB → Any Server → Redis (shared session store)
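A minimal sketch of the stateless pattern, with an in-memory dict standing in for Redis (the function names are illustrative):

```python
import json
import uuid

# A plain dict stands in for Redis here; in production each call would
# be a Redis command (e.g. SET with a TTL, GET), so that any server
# behind the load balancer can read any session.
store = {}

def create_session(user_id: str) -> str:
    session_id = str(uuid.uuid4())
    store[session_id] = json.dumps({'user_id': user_id})
    return session_id

def load_session(session_id: str) -> dict:
    raw = store.get(session_id)
    return json.loads(raw) if raw else {}
```

Because every server reads and writes the same store, the balancer is free to send each request to any backend.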

Global Load Balancing

DNS-based GSLB:
  US user  → us-east.api.example.com  → US data center
  EU user  → eu-west.api.example.com  → European data center
  Asia user → ap-east.api.example.com → Asian data center

Failover:
  US data center down → DNS routes US users to EU data center
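The failover decision above can be sketched as a preference-ordered lookup: route each user to the nearest region, falling back down a fixed preference list when a region is unhealthy. The region names match the diagram; the function itself is illustrative:

```python
# Per-location region preference, nearest first. Keys and region names
# mirror the GSLB diagram above.
REGION_PREFERENCE = {
    "US":   ["us-east", "eu-west", "ap-east"],
    "EU":   ["eu-west", "us-east", "ap-east"],
    "Asia": ["ap-east", "us-east", "eu-west"],
}

def pick_region(user_location: str, healthy_regions: set) -> str:
    # Return the first healthy region in this location's preference order
    for region in REGION_PREFERENCE[user_location]:
        if region in healthy_regions:
            return region
    raise RuntimeError("no healthy region available")
```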

Cloud Implementations

Provider   | Service                                | Type
-----------|----------------------------------------|------------------
AWS        | ALB / NLB / Global Accelerator         | L7 / L4 / Global
GCP        | Cloud Load Balancing                   | L4 / L7 / Global
Azure      | Application Gateway / Traffic Manager  | L7 / Global
Cloudflare | Load Balancing                         | L7 / Global

Anti-Patterns

Anti-Pattern                      | Consequence                              | Fix
----------------------------------|------------------------------------------|--------------------------------------------
No health checks                  | Traffic sent to dead servers             | Active + passive health checks
TCP health check for HTTP service | Server is up but app is broken           | HTTP health check at application level
Session affinity as default       | Uneven load, scaling issues              | Stateless backends, external session store
Single load balancer              | Single point of failure                  | Active-passive or active-active LB pair
No connection draining            | In-flight requests dropped during deploy | Graceful shutdown with draining period

Load balancing is the front door of your application. Its configuration directly determines user-perceived latency, availability during failures, and the effectiveness of your scaling strategy.

Jakub Dimitri Rezayev
Founder & Chief Architect • Garnet Grid Consulting

Jakub holds an M.S. in Customer Intelligence & Analytics and a B.S. in Finance & Computer Science from Pace University. With deep expertise spanning D365 F&O, Azure, Power BI, and AI/ML systems, he architects enterprise solutions that bridge legacy systems and modern technology — and has led multi-million dollar ERP implementations for Fortune 500 supply chains.
