
Load Testing: Knowing Your Breaking Point Before Production Does

Design and execute load tests that reveal performance bottlenecks, capacity limits, and failure modes before your users discover them. Covers test design, tooling, realistic workload modeling, result interpretation, and continuous load testing in CI/CD.

Load testing answers one question: what happens to your system under pressure? Not what you think happens, not what the architecture diagram says should happen, but what actually happens when 10,000 users hit your checkout flow simultaneously at 2 PM on Black Friday.

Most teams discover their breaking point in production. Load testing discovers it in a controlled environment where the consequences are a report — not an outage.


Types of Load Tests

Smoke Test

Minimal load to verify the test setup works:

Users: 1-5
Duration: 1-2 minutes
Purpose: Validate test scripts, endpoints, authentication

Load Test

Expected peak traffic to verify performance meets SLAs:

Users: Expected peak concurrent users
Duration: 30-60 minutes
Purpose: Verify latency, throughput, error rate under normal peak
Success: P99 < 500ms, error rate < 0.1%

Stress Test

Beyond expected peak to find the breaking point:

Users: 2x-5x expected peak, ramped gradually
Duration: Until failure or 60 minutes
Purpose: Find where the system degrades and how it fails

Soak Test

Sustained load over hours to find memory leaks and resource exhaustion:

Users: 70% of peak
Duration: 4-12 hours
Purpose: Memory leaks, connection exhaustion, log rotation, disk fill

Spike Test

Sudden traffic burst to test auto-scaling and circuit breakers:

Users: 0 → 10x peak → 0 in 60 seconds
Purpose: Auto-scaling response time, queue overflow, connection storms
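
The five test types differ mainly in their load profile over time. As a sketch, here is how each shape might be expressed as a k6-style `stages` array (targets are hypothetical; substitute your own expected peak), plus a small helper to compute each profile's total duration:

```javascript
// Hypothetical k6-style stage profiles for each test type (shape only;
// the target numbers assume an expected peak of ~1,000 concurrent users).
const profiles = {
  smoke:  [{ duration: '2m', target: 5 }],
  load:   [{ duration: '5m', target: 1000 }, { duration: '45m', target: 1000 },
           { duration: '5m', target: 0 }],
  stress: [{ duration: '10m', target: 1000 }, { duration: '10m', target: 3000 },
           { duration: '10m', target: 5000 }],
  soak:   [{ duration: '10m', target: 700 }, { duration: '8h', target: 700 }],
  spike:  [{ duration: '30s', target: 10000 }, { duration: '30s', target: 0 }],
};

// Parse a k6-style duration string ('30s', '5m', '8h') into seconds.
function toSeconds(d) {
  const units = { s: 1, m: 60, h: 3600 };
  return parseInt(d, 10) * units[d.slice(-1)];
}

// Total wall-clock time of a profile, in seconds.
function totalSeconds(stages) {
  return stages.reduce((sum, s) => sum + toSeconds(s.duration), 0);
}

console.log(totalSeconds(profiles.spike)); // 60 — matches the 60-second spike above
```

Thinking in profiles like this makes it easy to see that a "test type" is mostly a choice of ramp shape, not a different tool or script.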

Designing Realistic Workloads

The most common load testing mistake is testing the wrong thing. A load test that hammers /api/health at 50,000 RPS tells you nothing about your checkout flow.

Traffic Analysis

Start with production traffic patterns:

Real traffic distribution:
  GET  /api/products          35%   (browse catalog)
  GET  /api/products/:id      25%   (view product)
  POST /api/cart/items        15%   (add to cart)
  GET  /api/cart              10%   (view cart)
  POST /api/orders             8%   (checkout)
  POST /api/auth/login         5%   (login)
  Other                        2%
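
A load test can reproduce this mix with weighted random sampling. A minimal sketch, using the percentages from the table above as weights:

```javascript
// Weighted endpoint sampler mirroring the production traffic mix above.
const mix = [
  { endpoint: 'GET /api/products',     weight: 35 },
  { endpoint: 'GET /api/products/:id', weight: 25 },
  { endpoint: 'POST /api/cart/items',  weight: 15 },
  { endpoint: 'GET /api/cart',         weight: 10 },
  { endpoint: 'POST /api/orders',      weight: 8 },
  { endpoint: 'POST /api/auth/login',  weight: 5 },
  { endpoint: 'other',                 weight: 2 },
];

// Pick one endpoint at random, proportionally to its weight.
// `rand` is injectable for testing; defaults to Math.random().
function pickEndpoint(mix, rand = Math.random()) {
  const total = mix.reduce((sum, e) => sum + e.weight, 0);
  let r = rand * total;
  for (const e of mix) {
    r -= e.weight;
    if (r < 0) return e.endpoint;
  }
  return mix[mix.length - 1].endpoint; // guard against floating-point edge case
}
```

Each virtual user calls `pickEndpoint(mix)` per iteration, so over many iterations the generated traffic converges on the production distribution.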

User Journeys

Model complete user flows, not individual endpoints:

// k6 user scenario
import http from 'k6/http';
import { sleep } from 'k6';
import { randomIntBetween } from 'https://jslib.k6.io/k6-utils/1.2.0/index.js';

// Base URL supplied at runtime, e.g. k6 run -e BASE_URL=https://staging.example.com
const BASE = __ENV.BASE_URL;

export default function () {
  // 1. Browse products (think time: 3-5s)
  const products = http.get(`${BASE}/api/products`);
  sleep(randomIntBetween(3, 5));

  // 2. View product detail
  const productId = products.json('data.0.id');
  http.get(`${BASE}/api/products/${productId}`);
  sleep(randomIntBetween(2, 4));

  // 3. Add to cart
  http.post(`${BASE}/api/cart/items`, JSON.stringify({
    productId: productId,
    quantity: 1,
  }), { headers: { 'Content-Type': 'application/json' } });
  sleep(randomIntBetween(1, 2));

  // 4. Checkout (20% of users who add to cart)
  if (Math.random() < 0.2) {
    http.post(`${BASE}/api/orders`, JSON.stringify({
      paymentMethod: 'card',
    }), { headers: { 'Content-Type': 'application/json' } });
  }
}

Think Time

Real users pause between actions. Without think time, your test generates 10x more traffic per virtual user than reality, which distorts results.


Tool Selection

Tool       Language     Strengths                                          Scale
k6         JavaScript   Developer-friendly, CI integration, cloud option   100K+ VUs
Locust     Python       Pythonic, distributed, real-time UI                50K+ VUs
Gatling    Scala/Java   JVM performance, detailed reports                  100K+ VUs
Artillery  YAML/JS      Simple config, serverless option                   10K+ VUs
JMeter     GUI/XML      Feature-rich, legacy standard                      10K+ VUs

k6 Example

import http from 'k6/http';
import { check, sleep } from 'k6';

export const options = {
  stages: [
    { duration: '2m', target: 100 },   // Ramp up
    { duration: '5m', target: 100 },   // Steady state
    { duration: '2m', target: 500 },   // Stress
    { duration: '5m', target: 500 },   // Sustained stress
    { duration: '2m', target: 0 },     // Ramp down
  ],
  thresholds: {
    http_req_duration: ['p(99)<1000'],  // 99% of requests under 1s
    http_req_failed: ['rate<0.01'],     // Less than 1% failure rate
  },
};

export default function() {
  const res = http.get('https://api.example.com/orders');
  check(res, {
    'status is 200': (r) => r.status === 200,
    'response time < 500ms': (r) => r.timings.duration < 500,
  });
  sleep(1);
}

Interpreting Results

Key Metrics

Throughput:    2,847 req/s (target: 2,000)     ✅
P50 latency:  45ms                             ✅
P95 latency:  180ms                            ✅
P99 latency:  890ms (target: <1000ms)          ✅ (barely)
Error rate:   0.02%                            ✅
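
Percentiles like P50 and P99 come from sorting the raw latency samples and taking the value at the given rank. A minimal nearest-rank sketch (tools like k6 compute this for you; the sample values below are illustrative):

```javascript
// Nearest-rank percentile over a list of latency samples (milliseconds).
function percentile(samples, p) {
  const sorted = [...samples].sort((a, b) => a - b);
  const rank = Math.ceil((p / 100) * sorted.length);
  return sorted[Math.max(rank - 1, 0)];
}

// Ten illustrative samples: mostly fast, with one slow outlier.
const latencies = [45, 52, 48, 180, 44, 47, 890, 46, 51, 49];

console.log(percentile(latencies, 50)); // 48  — the typical request
console.log(percentile(latencies, 99)); // 890 — the tail the median hides
```

A single 890ms outlier barely moves the median but defines the P99, which is why the P99 >> P50 gap in the warning signs below is such a useful signal.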

Warning Signs

  • Latency increases linearly with load: Normal until saturation, then exponential growth indicates a bottleneck
  • P99 >> P50: High tail latency suggests queuing or contention
  • Error rate spikes at specific load: Capacity limit hit (connection pool, thread pool, database connections)
  • Throughput plateaus while latency climbs: System is saturated — adding more load makes it worse

Finding Bottlenecks

During load test, monitor:
  CPU utilization     → per service, per database
  Memory usage        → growing = possible leak
  Connection pools    → database, HTTP client, Redis
  Queue depths        → message queues, thread pools
  Disk I/O            → database write-ahead log, log files
  Network bandwidth   → cross-AZ traffic, NAT gateway

Continuous Load Testing

In CI/CD Pipeline

Run load tests on every PR that changes performance-critical code:

# GitHub Actions
- name: Load Test
  run: |
    k6 run --out json=results.json tests/load/checkout.js
    
- name: Check Thresholds
  run: |
    python scripts/check_load_results.py results.json
    # Fails if P99 > 500ms or error rate > 0.5%
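
The pipeline above calls a Python script, but the gating logic is language-agnostic. A hypothetical sketch of the same check in JavaScript (the `summary` shape and limit values are assumptions, matching the thresholds in the comment above):

```javascript
// Hypothetical threshold gate: return the list of breaches so the CI job
// can fail the build when the list is non-empty.
function checkThresholds(summary, limits = { p99Ms: 500, errorRate: 0.005 }) {
  const failures = [];
  if (summary.p99Ms > limits.p99Ms) {
    failures.push(`P99 ${summary.p99Ms}ms exceeds ${limits.p99Ms}ms`);
  }
  if (summary.errorRate > limits.errorRate) {
    failures.push(`error rate ${summary.errorRate} exceeds ${limits.errorRate}`);
  }
  return failures; // empty = pass; non-empty = fail the build
}

checkThresholds({ p99Ms: 340, errorRate: 0.0002 }); // [] — passes
checkThresholds({ p99Ms: 890, errorRate: 0.02 });   // two breaches — fails
```

Returning the breaches rather than a boolean means the CI log shows exactly which limit was violated and by how much.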

Baseline Comparison

Compare every test run against a baseline:

Baseline (v2.3.0):  P99 = 340ms,  throughput = 2,847 req/s
Current  (v2.4.0):  P99 = 890ms,  throughput = 2,102 req/s
Regression:         P99 +161%,    throughput -26%    ← FAIL
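
The regression percentages above are plain relative change against the baseline. A small sketch reproducing them (truncating toward zero, consistent with the figures shown):

```javascript
// Relative change of current vs. baseline, as a whole percentage.
// Positive is worse for latency; negative is worse for throughput.
function pctChange(baseline, current) {
  return Math.trunc(((current - baseline) / baseline) * 100);
}

console.log(pctChange(340, 890));   // 161  — P99 regression from the table above
console.log(pctChange(2847, 2102)); // -26  — throughput drop
```

Keeping this comparison in the pipeline, with the previous release tagged as the baseline, turns a silent regression into a failing build.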

Anti-Patterns

Anti-Pattern                      Consequence                                  Fix
Testing only happy paths          Failures under load are undiscovered         Include error scenarios, auth failures, timeout paths
No think time                     Unrealistically high request rate per user   Add realistic pauses between actions
Testing against shared staging    Results vary based on other activity        Dedicated load test environment
Running from a single location    Tests your test machine, not your system     Distribute load generators across regions
Load testing once before launch   Performance degrades silently                Continuous load testing in CI

Load testing is not a phase — it is a practice. Systems that are never load tested will surprise you. Systems that are continuously load tested will not.

Jakub Dimitri Rezayev
Founder & Chief Architect • Garnet Grid Consulting

Jakub holds an M.S. in Customer Intelligence & Analytics and a B.S. in Finance & Computer Science from Pace University. With deep expertise spanning D365 F&O, Azure, Power BI, and AI/ML systems, he architects enterprise solutions that bridge legacy systems and modern technology — and has led multi-million dollar ERP implementations for Fortune 500 supply chains.
