Every system has a breaking point. Performance testing finds it before your users do. The goal is not to prove your system is fast — it is to discover exactly where, when, and how it fails under load. A system that handles 100 requests per second gracefully but collapses at 150 is more dangerous than one that degrades slowly, because the cliff edge is invisible until you hit it.
| Type | Purpose | Duration | Load Pattern |
|---|---|---|---|
| Load test | Verify expected traffic levels | 10-30 min | Ramp to expected peak |
| Stress test | Find breaking point | 30-60 min | Ramp beyond expected peak |
| Soak test | Find memory leaks, resource exhaustion | 4-24 hours | Sustained moderate load |
| Spike test | Verify sudden traffic burst handling | 5-15 min | Sudden jump, then drop |
| Capacity test | Determine maximum throughput | 30-60 min | Progressive increase until failure |
## Load Test Design
```javascript
// k6 load test example
import http from 'k6/http';
import { check, sleep } from 'k6';

export const options = {
  stages: [
    { duration: '2m', target: 50 },   // Ramp up to 50 users
    { duration: '5m', target: 50 },   // Hold at 50 users
    { duration: '2m', target: 200 },  // Ramp up to 200 users
    { duration: '5m', target: 200 },  // Hold at 200 users (expected peak)
    { duration: '2m', target: 400 },  // Stress: double expected peak
    { duration: '5m', target: 400 },  // Hold under stress
    { duration: '2m', target: 0 },    // Ramp down
  ],
  thresholds: {
    http_req_duration: ['p(95)<500', 'p(99)<1000'], // 95th < 500ms, 99th < 1s
    http_req_failed: ['rate<0.01'],                 // Error rate < 1%
    http_reqs: ['rate>100'],                        // Throughput > 100 req/s
  },
};

export default function () {
  // Simulate realistic user behavior
  const loginRes = http.post('https://api.example.com/auth/login', JSON.stringify({
    email: `user_${__VU}@test.com`,
    password: 'testpassword',
  }), { headers: { 'Content-Type': 'application/json' } });
  check(loginRes, {
    'login successful': (r) => r.status === 200,
  });
  const token = loginRes.json('token');
  const headers = { Authorization: `Bearer ${token}`, 'Content-Type': 'application/json' };

  // Browse products (most common action)
  const products = http.get('https://api.example.com/products?page=1', { headers });
  check(products, {
    'products loaded': (r) => r.status === 200,
    'has results': (r) => r.json('data').length > 0,
  });
  sleep(Math.random() * 3 + 1); // Think time: 1-4 seconds

  // Add to cart (less frequent)
  if (Math.random() < 0.3) {
    http.post('https://api.example.com/cart', JSON.stringify({
      productId: 'prod_001',
      quantity: 1,
    }), { headers });
  }

  // Checkout (rare)
  if (Math.random() < 0.05) {
    http.post('https://api.example.com/checkout', JSON.stringify({
      paymentMethod: 'tok_test',
    }), { headers });
  }
  sleep(Math.random() * 2 + 1);
}
```
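The inline `Math.random()` checks work for two or three actions, but as scenarios multiply, the traffic mix gets buried in scattered probability literals. One way to keep it explicit is a weighted picker; the sketch below is an illustrative helper, not part of the k6 API, and the weights are assumptions to tune against your real traffic:

```javascript
// Pick a scenario name according to explicit weights, so the traffic
// mix (browse-heavy, checkout-rare) is visible in one place.
function pickScenario(weights) {
  const total = Object.values(weights).reduce((a, b) => a + b, 0);
  let roll = Math.random() * total;
  for (const [name, weight] of Object.entries(weights)) {
    roll -= weight;
    if (roll < 0) return name;
  }
  return Object.keys(weights)[0]; // floating-point edge-case fallback
}

// Weights sum to 100 here for readability, but any positive numbers work.
const trafficMix = { browse: 65, addToCart: 30, checkout: 5 };

// In the k6 default function you would branch on the result:
//   const action = pickScenario(trafficMix);
```

Tuning one number in `trafficMix` then shifts the whole mix, instead of editing thresholds in several `if` statements.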
| Tool | Language | Protocol | Best For |
|---|---|---|---|
| k6 | JavaScript | HTTP, WebSocket, gRPC | Developer-friendly, CI integration |
| Locust | Python | HTTP | Python teams, custom scenarios |
| Gatling | Scala/Java | HTTP | JVM ecosystems, detailed reports |
| JMeter | Java (GUI) | HTTP, JDBC, JMS | Enterprise, protocol variety |
| Artillery | JavaScript | HTTP, WebSocket | Quick tests, YAML config |
| wrk/wrk2 | C (CLI) | HTTP | Raw throughput benchmarks |
## Key Metrics to Measure
| Metric | What It Tells You | Target |
|---|---|---|
| Throughput (req/s) | How many requests your system handles | > expected peak × 2 |
| Response time (p50) | Typical user experience | < 200ms |
| Response time (p95) | Experience for most users | < 500ms |
| Response time (p99) | Worst-case common experience | < 1000ms |
| Error rate | Percentage of failed requests | < 0.1% under normal load |
| CPU utilization | Server compute saturation | < 70% at expected peak |
| Memory utilization | Memory pressure | < 80%, no growth over time |
| Database connections | Connection pool exhaustion | < 80% of pool size |
| Queue depth | Backpressure | Not growing under sustained load |
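The percentile targets above are computed from the full latency distribution, never from averages. A minimal sketch of how p50/p95/p99 fall out of raw samples, using the nearest-rank method (the `percentile` helper and the sample data are illustrative):

```javascript
// Compute a latency percentile from raw response-time samples (ms),
// using the nearest-rank method on a sorted copy of the data.
function percentile(samples, p) {
  const sorted = [...samples].sort((a, b) => a - b);
  const rank = Math.ceil((p / 100) * sorted.length);
  return sorted[Math.max(0, rank - 1)];
}

// Illustrative sample: mostly fast responses with a slow tail.
const latencies = [120, 95, 110, 130, 105, 480, 115, 100, 125, 950];

console.log(percentile(latencies, 50)); // 115 — typical user experience
console.log(percentile(latencies, 95)); // 950 — the tail
console.log(percentile(latencies, 99)); // 950
```

Note how a single 950 ms outlier sets both p95 and p99 here: tail percentiles surface exactly the requests that an average would hide.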
## Bottleneck Identification
A systematic approach to finding bottlenecks:

1. Run a load test and observe where degradation begins.
2. Identify the constraint:
   - CPU bound? → Profile code, optimize hot paths
   - Memory bound? → Find memory leaks, reduce allocations
   - I/O bound? → Optimize queries, add caching, use async I/O
   - Connection bound? → Increase pool size, add connection pooling
   - Network bound? → Reduce payload size, enable compression
   - Database bound? → Add indexes, optimize queries, cache results
3. Fix the constraint and re-test.
4. The next bottleneck will appear at a higher load level.
5. Repeat until you exceed 2× expected peak.
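The "progressive increase until failure" pattern behind this loop (and the capacity test in the first table) can be generated rather than hand-written. A hedged sketch of a helper that builds a k6-style `stages` array stepping load up to a chosen ceiling; the function name, step sizes, and ramp durations are illustrative assumptions:

```javascript
// Build a k6-style `stages` array that ramps load in fixed steps,
// holding at each level long enough to observe steady-state behavior.
function capacityStages(startVUs, maxVUs, stepVUs, holdMinutes) {
  const stages = [];
  for (let target = startVUs; target <= maxVUs; target += stepVUs) {
    stages.push({ duration: '2m', target });              // ramp to next level
    stages.push({ duration: `${holdMinutes}m`, target }); // hold and observe
  }
  stages.push({ duration: '2m', target: 0 });             // ramp down
  return stages;
}

// Example: 50 → 400 VUs in steps of 50, holding 3 minutes at each level.
const stages = capacityStages(50, 400, 50, 3);
console.log(stages.length); // 8 levels × 2 entries + ramp-down = 17
```

Whichever hold level first breaches your thresholds is the measured capacity; each fix from step 3 should push that level higher on the next run.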
## CI Integration
```yaml
# Run performance test on every release candidate
name: Performance Test
on:
  push:
    branches: [release/*]
jobs:
  load-test:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - name: Run k6 load test
        uses: grafana/k6-action@v0.3.1
        with:
          filename: tests/performance/load-test.js
        env:
          K6_CLOUD_TOKEN: ${{ secrets.K6_CLOUD_TOKEN }}
        # No separate check step is needed: k6 exits non-zero when any
        # threshold in the script fails, which fails this step and the
        # pipeline automatically.
```
## Anti-Patterns
| Anti-Pattern | Problem | Fix |
|---|---|---|
| Testing from the same network | Hides latency issues | Test from external location or cloud |
| Uniform request patterns | Unrealistic (real traffic is bursty) | Add think time, weighted scenarios |
| No baseline | Cannot tell if performance regressed | Establish baseline, run regularly |
| Testing only happy path | Misses error handling performance | Include error scenarios, edge cases |
| One-time performance test | Regressions slip in over time | Run on every release, track trends |
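The "no baseline" fix implies something concrete: store the metrics from a known-good run and compare each new run against them with a tolerance. A minimal sketch, where the metric names, sample values, and the 10% tolerance are all illustrative assumptions to adapt to your own summary output:

```javascript
// Compare a new run's metrics against a stored baseline.
// Returns the metrics that regressed beyond `tolerance`
// (a fraction: 0.10 means 10% worse than baseline fails).
// Assumes higher values are worse for every metric listed.
function findRegressions(baseline, current, tolerance = 0.10) {
  const regressions = [];
  for (const [metric, base] of Object.entries(baseline)) {
    const now = current[metric];
    if (now !== undefined && now > base * (1 + tolerance)) {
      regressions.push({ metric, baseline: base, current: now });
    }
  }
  return regressions;
}

// Baseline from a known-good release (latencies in ms, error rate as a fraction).
const baseline = { p95_ms: 420, p99_ms: 880, error_rate: 0.002 };
const current  = { p95_ms: 510, p99_ms: 900, error_rate: 0.001 };

console.log(findRegressions(baseline, current));
// p95_ms regressed: 510 > 420 × 1.10 = 462; the other metrics pass
```

Running this on every release and committing the baseline alongside the test script addresses both the "no baseline" and "one-time test" rows at once.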
## Implementation Checklist