Testing Strategy for Distributed Systems
A testing strategy for distributed microservice architectures, covering contract testing, integration testing, chaos testing, test environments, test data management, and safely testing in production.
Testing distributed systems is fundamentally different from testing monoliths. You can’t spin up the entire system in a single process. Services communicate over unreliable networks. Databases have their own state. Third-party APIs are unpredictable. The testing strategy must account for all of this without requiring every team to run every service locally.
Testing Pyramid for Distributed Systems
```
┌──────────────┐
│  E2E / Smoke │  Minimal: critical user flows only
│     (5%)     │  Run in staging, not per-PR
├──────────────┤
│   Contract   │  Verify API contracts between services
│    Tests     │  Run per-PR, fast, no network
│    (15%)     │
├──────────────┤
│ Integration  │  Service + its direct dependencies
│    Tests     │  (DB, cache, queue via testcontainers)
│    (30%)     │
├──────────────┤
│  Unit Tests  │  Business logic, pure functions
│    (50%)     │  Fast, no I/O, deterministic
└──────────────┘
```
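The base of the pyramid is pure business logic with no I/O. A minimal sketch of what that looks like (the pricing helper `order_total` is hypothetical, not from the services above):

```python
def order_total(items):
    """Pure pricing logic: no I/O, deterministic, trivial to unit test."""
    return round(sum(i["unit_price"] * i["quantity"] for i in items), 2)

def test_order_total():
    # Two units at 24.99 each
    items = [{"unit_price": 24.99, "quantity": 2}]
    assert order_total(items) == 49.98

test_order_total()
```

Because tests like this run in microseconds with no containers or network, they can make up half the suite without hurting CI time.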
Contract Testing with Pact
```python
# Consumer side (Order Service expects Inventory API)
from pact import Consumer, Provider

pact = Consumer('OrderService').has_pact_with(Provider('InventoryService'))

pact.given(
    'product PROD-001 has 50 units in stock'
).upon_receiving(
    'a request to check inventory'
).with_request(
    method='GET',
    path='/inventory/PROD-001'
).will_respond_with(
    status=200,
    body={
        'product_id': 'PROD-001',
        'available': 50,
        'reserved': 5,
    },
)

# The contract (Pact file) is shared with the Inventory team,
# who verify their API satisfies it in their own CI.
```
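Conceptually, provider-side verification checks that the provider's real response still carries every field the consumer relies on, while extra fields are tolerated. A hand-rolled illustration of that idea (this is not the Pact API; the field names come from the example above):

```python
# The fields and types the consumer depends on
CONTRACT = {
    "product_id": str,
    "available": int,
    "reserved": int,
}

def satisfies_contract(response: dict, contract: dict) -> bool:
    """True if every promised field is present with the expected type.
    Extra fields are allowed: consumers must tolerate additions."""
    return all(
        field in response and isinstance(response[field], expected)
        for field, expected in contract.items()
    )

# A provider response with an extra field still satisfies the contract
actual = {"product_id": "PROD-001", "available": 50, "reserved": 5, "warehouse": "EU-1"}
assert satisfies_contract(actual, CONTRACT)

# A missing field breaks it -- this is the regression Pact would catch
assert not satisfies_contract({"product_id": "PROD-001"}, CONTRACT)
```

Pact automates exactly this check (plus provider-state setup) against the running provider, so breaking changes fail the provider's build before consumers ever see them.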
Integration Testing with Testcontainers
```python
from testcontainers.postgres import PostgresContainer

def test_order_creation_with_real_db():
    """Integration test against a real PostgreSQL instance."""
    with PostgresContainer("postgres:16") as postgres:
        db = connect(postgres.get_connection_url())

        # Run migrations so the schema matches production
        run_migrations(db)

        # Exercise business logic against the real database
        order = create_order(db, customer_id=1, items=[
            {"product_id": "PROD-001", "quantity": 2},
        ])
        assert order.status == "pending"
        assert order.total == 49.98

        # Verify the data persisted correctly
        saved = get_order(db, order.id)
        assert saved.items[0].quantity == 2
```
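Testcontainers requires a Docker daemon, which some CI sandboxes lack. Where that's the case, an in-memory SQLite database can still cover schema-plus-persistence checks, at the cost of missing Postgres-specific behavior. A self-contained sketch (the table and helpers are hypothetical stand-ins, not the service's real schema):

```python
import sqlite3

def run_migrations(db):
    # Minimal stand-in for the real migration runner
    db.execute("""CREATE TABLE orders (
        id INTEGER PRIMARY KEY,
        customer_id INTEGER NOT NULL,
        status TEXT NOT NULL DEFAULT 'pending')""")

def create_order(db, customer_id):
    cur = db.execute(
        "INSERT INTO orders (customer_id) VALUES (?)", (customer_id,))
    db.commit()
    return cur.lastrowid

def test_order_persists():
    db = sqlite3.connect(":memory:")  # fresh database per test, no sharing
    run_migrations(db)
    order_id = create_order(db, customer_id=1)
    status, = db.execute(
        "SELECT status FROM orders WHERE id = ?", (order_id,)).fetchone()
    assert status == "pending"

test_order_persists()
```

Treat this as a fallback tier, not a replacement: anything relying on Postgres types, constraints, or concurrency semantics still needs the real container.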
Test Environments
| Environment | Purpose | Data | Refresh |
|---|---|---|---|
| Local | Developer testing | Fixtures + testcontainers | Instant |
| CI | Automated tests per PR | Fixtures + testcontainers | Per build |
| Staging | Integration + E2E tests | Sanitized production snapshot | Weekly |
| Preview | Per-PR environments (optional) | Seed data | Per PR |
| Production | Canary testing, feature flags | Real data | N/A |
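The Production row depends on canary testing behind feature flags: a small, stable slice of users exercises the new code path first. A minimal sketch of deterministic hash-based cohort bucketing (the function name and percentages are illustrative):

```python
import hashlib

def in_canary(user_id: str, rollout_pct: int) -> bool:
    """Deterministically bucket a user into the canary cohort.

    Hashing the user ID keeps each user in the same cohort across
    requests, so their experience stays consistent while rollout_pct
    ramps from 1 to 100.
    """
    bucket = int(hashlib.sha256(user_id.encode()).hexdigest(), 16) % 100
    return bucket < rollout_pct

# At 100% everyone is in; at 0% no one is
assert in_canary("user-123", 100)
assert not in_canary("user-123", 0)
```

A random per-request coin flip would flip users in and out of the new behavior between requests; hashing avoids that and makes canary incidents reproducible.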
Anti-Patterns
| Anti-Pattern | Problem | Fix |
|---|---|---|
| Shared test database | Tests interfere with each other | Testcontainers, per-test schemas |
| E2E for everything | Slow, flaky, expensive | Minimal E2E, maximize unit + contract |
| Mocking everything | Tests pass but integration fails | Test with real dependencies (testcontainers) |
| Production data in tests | Privacy risk, non-deterministic | Synthetic test data, anonymized snapshots |
| No contract tests | Breaking API changes surprise consumers | Pact/OpenAPI contract verification |
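One common fix for the last two rows (no production data, no contract surprises from drifting fixtures) is a seeded factory: synthetic records that are deterministic across runs. A sketch with hypothetical field names:

```python
import random

def make_order(seed: int, **overrides):
    """Synthetic order data: same seed, same record, no real PII."""
    rng = random.Random(seed)  # local RNG; doesn't disturb global state
    order = {
        "customer_id": rng.randint(1, 10_000),
        "product_id": f"PROD-{rng.randint(1, 999):03d}",
        "quantity": rng.randint(1, 5),
    }
    order.update(overrides)  # tests override only the fields they care about
    return order

# Reproducible across runs and machines
assert make_order(42) == make_order(42)
```

Because every field is derived from the seed, a failing test can be re-run with identical data, and no sanitization pipeline is needed.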
Checklist
- Test pyramid followed (50% unit, 30% integration, 15% contract, 5% E2E)
- Contract tests between all service boundaries
- Integration tests use testcontainers (not shared databases)
- Test data: synthetic, deterministic, no production PII
- CI runs all tests per PR (< 15 min)
- Staging environment for E2E and smoke tests
- Flaky test tracking and quarantine process
- Test coverage on critical business logic paths
:::note[Source]
This guide is derived from operational intelligence at Garnet Grid Consulting. For testing strategy consulting, visit garnetgrid.com.
:::