# Performance Testing at Scale
Design and execute performance tests that reveal bottlenecks before production load does. Covers load testing, stress testing, spike testing, soak testing, breakpoint testing, performance baselines, and the tools and patterns for testing at production scale.
Performance testing answers the question: “Can our system handle the load we expect — and what happens when it cannot?” The alternative to performance testing is production incidents that answer the same question more painfully.
## Types of Performance Testing
- **Load Testing:** expected production load → sustained for a duration. "Can we handle 1,000 concurrent users for 1 hour?"
- **Stress Testing:** gradually increase load beyond capacity → find the breaking point. "At what point do we start failing?"
- **Spike Testing:** sudden surge of traffic → measure recovery. "What happens during a flash sale?"
- **Soak Testing:** expected load → sustained for an extended period (24-72 hours). "Do we have memory leaks or resource exhaustion?"
- **Breakpoint Testing:** systematically increase load → find the exact threshold. "How many concurrent users until p95 latency > 500ms?"
## k6 Load Testing
```javascript
import http from 'k6/http';
import { check, sleep } from 'k6';
import { htmlReport } from 'https://raw.githubusercontent.com/benc-uk/k6-reporter/main/dist/bundle.js';

export const options = {
  stages: [
    { duration: '2m', target: 100 }, // Ramp up
    { duration: '5m', target: 100 }, // Steady state
    { duration: '2m', target: 500 }, // Stress
    { duration: '5m', target: 500 }, // Sustained stress
    { duration: '2m', target: 0 },   // Ramp down
  ],
  thresholds: {
    http_req_duration: ['p(95)<500', 'p(99)<1000'], // Latency
    http_req_failed: ['rate<0.01'],                 // Error rate < 1%
    http_reqs: ['rate>100'],                        // Throughput > 100 rps
  },
};

export default function () {
  // Simulate a realistic user flow: log in → browse → order
  const loginRes = http.post(
    'https://api.example.com/login',
    JSON.stringify({
      email: `user${__VU}@example.com`, // unique user per virtual user
      password: 'test123',
    }),
    { headers: { 'Content-Type': 'application/json' } },
  );
  check(loginRes, {
    'login status 200': (r) => r.status === 200,
    'login latency < 500ms': (r) => r.timings.duration < 500,
  });

  const token = loginRes.json().token;
  const authHeaders = {
    headers: {
      'Authorization': `Bearer ${token}`,
      'Content-Type': 'application/json',
    },
  };

  // Browse orders
  const ordersRes = http.get('https://api.example.com/orders', authHeaders);
  check(ordersRes, {
    'orders status 200': (r) => r.status === 200,
    'orders latency < 300ms': (r) => r.timings.duration < 300,
  });

  sleep(1); // Think time between actions

  // Create an order
  const orderRes = http.post(
    'https://api.example.com/orders',
    JSON.stringify({ items: [{ product_id: 'prod_1', quantity: 1 }] }),
    authHeaders,
  );
  check(orderRes, {
    'order created': (r) => r.status === 201,
    'order latency < 500ms': (r) => r.timings.duration < 500,
  });

  sleep(Math.random() * 3); // Random think time
}

export function handleSummary(data) {
  return { 'report.html': htmlReport(data) };
}
```
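Run it with `k6 run performance/load-test.js`. Two properties make this script CI-friendly: the remote `htmlReport` import is fetched by k6 at run time, so nothing needs installing beyond k6 itself, and k6 exits non-zero whenever a threshold fails, so the same thresholds double as a pass/fail gate.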
## Performance Baselines
```yaml
baseline_metrics:
  api_latency:
    p50: 50ms   # Median response time
    p95: 200ms  # 95th percentile
    p99: 500ms  # 99th percentile
  throughput:
    steady_state: 500 rps
    peak: 2000 rps
  error_rate:
    target: "< 0.1%"
    alert: "> 1%"
  resource_utilization:
    cpu: "< 70% at steady state"
    memory: "< 80% at steady state"
    database_connections: "< 70% of pool"

degradation_thresholds:
  acceptable: "p95 increases < 50% under 2x load"
  concerning: "p95 increases > 100% under 2x load"
  critical: "errors appear under normal load"
```
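A baseline is most useful when it is executable. One way (a minimal sketch; the numbers mirror the YAML above and the endpoint is an assumption) is to encode it directly as k6 thresholds, so any run that violates the baseline fails:

```javascript
import http from 'k6/http';
import { sleep } from 'k6';

// baseline_metrics encoded as executable k6 thresholds.
// Regenerate these numbers whenever the agreed baseline changes.
export const options = {
  thresholds: {
    http_req_duration: ['p(50)<50', 'p(95)<200', 'p(99)<500'],
    // An error rate past the 0.1% target aborts the run early rather
    // than burning the full duration on a known failure.
    http_req_failed: [{ threshold: 'rate<0.001', abortOnFail: true }],
    http_reqs: ['rate>500'], // steady-state throughput floor
  },
};

export default function () {
  http.get('https://api.example.com/orders'); // assumed representative endpoint
  sleep(1);
}
```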
## CI/CD Integration
```yaml
# Performance gate in the deployment pipeline
performance_test:
  stage: pre-production
  script:
    - k6 run --out json=results.json performance/load-test.js
  thresholds:
    - "p95 latency < baseline * 1.1"    # No more than 10% latency regression
    - "error rate < 0.1%"
    - "throughput >= baseline * 0.95"   # No more than 5% throughput loss
  on_failure: block_deployment

schedule:
  - on_merge: smoke test (1 minute, 10 users)
  - nightly: full load test (30 minutes, production load)
  - weekly: soak test (24 hours, production load)
```
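The `baseline * 1.1` gate needs something concrete to compare against. A minimal sketch, assuming the k6 script also writes its end-of-test summary from `handleSummary` (e.g. `'summary.json': JSON.stringify(data)`) and that a `baseline.json` with the same shape is committed alongside the tests; `compare-baseline.js` is a hypothetical helper, not part of k6:

```javascript
// compare-baseline.js (hypothetical helper): fails the pipeline when the
// current run regresses more than 10% against the stored baseline.
const fs = require('fs');

const current = JSON.parse(fs.readFileSync('summary.json', 'utf8'));
const baseline = JSON.parse(fs.readFileSync('baseline.json', 'utf8'));

// k6's summary data exposes trend percentiles under metrics.<name>.values
const p95 = current.metrics.http_req_duration.values['p(95)'];
const baseP95 = baseline.metrics.http_req_duration.values['p(95)'];
const errorRate = current.metrics.http_req_failed.values.rate;

const failures = [];
if (p95 > baseP95 * 1.1) {
  failures.push(`p95 ${p95.toFixed(1)}ms exceeds baseline ${baseP95.toFixed(1)}ms by >10%`);
}
if (errorRate > 0.001) {
  failures.push(`error rate ${(errorRate * 100).toFixed(2)}% exceeds 0.1%`);
}

if (failures.length > 0) {
  failures.forEach((f) => console.error(f));
  process.exit(1); // non-zero exit blocks the deployment
}
console.log('Performance gate passed');
```

A passing nightly run can then overwrite `baseline.json`, so the baseline tracks intentional improvements rather than drifting stale.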
## Anti-Patterns
| Anti-Pattern | Consequence | Fix |
|---|---|---|
| Testing against dev environment | Results not representative | Test against production-like infrastructure |
| No think time between requests | Unrealistic thundering herd | Add realistic think time and user flows |
| Testing with a single user type | Slow paths go untested | Test all critical user flows |
| No baseline comparison | Cannot detect regressions | Establish and track baselines |
| Performance testing only before launch | Regressions go undetected | Continuous performance testing in CI |
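Two of these fixes (realistic think time, multiple user types) combine naturally in k6 scenarios, which run several user populations concurrently from one script. A minimal sketch, with executor choices, the traffic split, and endpoints as illustrative assumptions:

```javascript
import http from 'k6/http';
import { sleep } from 'k6';

// Two user populations hitting the same system at the same time.
export const options = {
  scenarios: {
    browsers: {
      executor: 'ramping-vus',
      exec: 'browseFlow',
      startVUs: 0,
      stages: [
        { duration: '2m', target: 90 },  // ramp up the readers
        { duration: '10m', target: 90 }, // steady browsing load
      ],
    },
    buyers: {
      executor: 'constant-vus',
      exec: 'checkoutFlow',
      vus: 10, // ~10% of users actually buy
      duration: '12m',
    },
  },
};

export function browseFlow() {
  http.get('https://api.example.com/orders');
  sleep(1 + Math.random() * 2); // realistic think time, not a thundering herd
}

export function checkoutFlow() {
  http.post(
    'https://api.example.com/orders',
    JSON.stringify({ items: [{ product_id: 'prod_1', quantity: 1 }] }),
    { headers: { 'Content-Type': 'application/json' } },
  );
  sleep(2 + Math.random() * 3); // buyers pause longer between actions
}
```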
Performance testing is not a one-time activity. It is a continuous practice that catches regressions before they reach production and validates that your system can handle the load it was designed for.