Testing Strategy for Distributed Systems
A testing strategy for distributed microservice architectures, covering contract testing, integration testing, chaos testing, test environments, test data management, and safely testing in production.
Testing distributed systems is fundamentally different from testing monoliths. You can’t spin up the entire system in a single process. Services communicate over unreliable networks. Databases have their own state. Third-party APIs are unpredictable. The testing strategy must account for all of this without requiring every team to run every service locally.
Testing Pyramid for Distributed Systems
```
┌──────────────┐
│  E2E / Smoke │  Minimal: critical user flows only
│     (5%)     │  Run in staging, not per-PR
├──────────────┤
│   Contract   │  Verify API contracts between services
│    Tests     │  Run per-PR, fast, no network
│    (15%)     │
├──────────────┤
│ Integration  │  Service + its direct dependencies
│    Tests     │  (DB, cache, queue via testcontainers)
│    (30%)     │
├──────────────┤
│  Unit Tests  │  Business logic, pure functions
│    (50%)     │  Fast, no I/O, deterministic
└──────────────┘
```
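The base of the pyramid is pure business logic with no I/O. A minimal sketch of what that looks like (the pricing helper `order_total` is hypothetical, not from the services above):

```python
def order_total(items):
    """Pure pricing logic: no I/O, deterministic, trivial to unit test."""
    return round(sum(i["unit_price"] * i["quantity"] for i in items), 2)

def test_order_total():
    # Two units at 24.99 each
    items = [{"unit_price": 24.99, "quantity": 2}]
    assert order_total(items) == 49.98

test_order_total()
```

Because tests like this run in microseconds with no containers or network, they can make up half the suite without hurting CI time.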
Contract Testing with Pact
```python
# Consumer side (Order Service expects Inventory API)
from pact import Consumer, Provider

pact = Consumer('OrderService').has_pact_with(Provider('InventoryService'))

pact.given(
    'product PROD-001 has 50 units in stock'
).upon_receiving(
    'a request to check inventory'
).with_request(
    method='GET',
    path='/inventory/PROD-001'
).will_respond_with(
    status=200,
    body={
        'product_id': 'PROD-001',
        'available': 50,
        'reserved': 5,
    },
)

# The contract (Pact file) is shared with the Inventory team,
# who verify their API satisfies it in their own CI.
```
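Conceptually, provider-side verification checks that the provider's real response still carries every field the consumer relies on, while extra fields are tolerated. A hand-rolled illustration of that idea (this is not the Pact API; the field names come from the example above):

```python
# The fields and types the consumer depends on
CONTRACT = {
    "product_id": str,
    "available": int,
    "reserved": int,
}

def satisfies_contract(response: dict, contract: dict) -> bool:
    """True if every promised field is present with the expected type.
    Extra fields are allowed: consumers must tolerate additions."""
    return all(
        field in response and isinstance(response[field], expected)
        for field, expected in contract.items()
    )

# A provider response with an extra field still satisfies the contract
actual = {"product_id": "PROD-001", "available": 50, "reserved": 5, "warehouse": "EU-1"}
assert satisfies_contract(actual, CONTRACT)

# A missing field breaks it -- this is the regression Pact would catch
assert not satisfies_contract({"product_id": "PROD-001"}, CONTRACT)
```

Pact automates exactly this check (plus provider-state setup) against the running provider, so breaking changes fail the provider's build before consumers ever see them.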
Integration Testing with Testcontainers
```python
from testcontainers.postgres import PostgresContainer

def test_order_creation_with_real_db():
    """Integration test against a real PostgreSQL instance."""
    with PostgresContainer("postgres:16") as postgres:
        db = connect(postgres.get_connection_url())

        # Run migrations so the schema matches production
        run_migrations(db)

        # Exercise business logic against the real database
        order = create_order(db, customer_id=1, items=[
            {"product_id": "PROD-001", "quantity": 2},
        ])
        assert order.status == "pending"
        assert order.total == 49.98

        # Verify the data persisted correctly
        saved = get_order(db, order.id)
        assert saved.items[0].quantity == 2
```
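Testcontainers requires a Docker daemon, which some CI sandboxes lack. Where that's the case, an in-memory SQLite database can still cover schema-plus-persistence checks, at the cost of missing Postgres-specific behavior. A self-contained sketch (the table and helpers are hypothetical stand-ins, not the service's real schema):

```python
import sqlite3

def run_migrations(db):
    # Minimal stand-in for the real migration runner
    db.execute("""CREATE TABLE orders (
        id INTEGER PRIMARY KEY,
        customer_id INTEGER NOT NULL,
        status TEXT NOT NULL DEFAULT 'pending')""")

def create_order(db, customer_id):
    cur = db.execute(
        "INSERT INTO orders (customer_id) VALUES (?)", (customer_id,))
    db.commit()
    return cur.lastrowid

def test_order_persists():
    db = sqlite3.connect(":memory:")  # fresh database per test, no sharing
    run_migrations(db)
    order_id = create_order(db, customer_id=1)
    status, = db.execute(
        "SELECT status FROM orders WHERE id = ?", (order_id,)).fetchone()
    assert status == "pending"

test_order_persists()
```

Treat this as a fallback tier, not a replacement: anything relying on Postgres types, constraints, or concurrency semantics still needs the real container.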
Test Environments
| Environment | Purpose | Data | Refresh |
|---|---|---|---|
| Local | Developer testing | Fixtures + testcontainers | Instant |
| CI | Automated tests per PR | Fixtures + testcontainers | Per build |
| Staging | Integration + E2E tests | Sanitized production snapshot | Weekly |
| Preview | Per-PR environments (optional) | Seed data | Per PR |
| Production | Canary testing, feature flags | Real data | N/A |
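The Production row depends on canary testing behind feature flags: a small, stable slice of users exercises the new code path first. A minimal sketch of deterministic hash-based cohort bucketing (the function name and percentages are illustrative):

```python
import hashlib

def in_canary(user_id: str, rollout_pct: int) -> bool:
    """Deterministically bucket a user into the canary cohort.

    Hashing the user ID keeps each user in the same cohort across
    requests, so their experience stays consistent while rollout_pct
    ramps from 1 to 100.
    """
    bucket = int(hashlib.sha256(user_id.encode()).hexdigest(), 16) % 100
    return bucket < rollout_pct

# At 100% everyone is in; at 0% no one is
assert in_canary("user-123", 100)
assert not in_canary("user-123", 0)
```

A random per-request coin flip would flip users in and out of the new behavior between requests; hashing avoids that and makes canary incidents reproducible.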
Anti-Patterns
| Anti-Pattern | Problem | Fix |
|---|---|---|
| Shared test database | Tests interfere with each other | Testcontainers, per-test schemas |
| E2E for everything | Slow, flaky, expensive | Minimal E2E, maximize unit + contract |
| Mocking everything | Tests pass but integration fails | Test with real dependencies (testcontainers) |
| Production data in tests | Privacy risk, non-deterministic | Synthetic test data, anonymized snapshots |
| No contract tests | Breaking API changes surprise consumers | Pact/OpenAPI contract verification |
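One common fix for the last two rows (no production data, no contract surprises from drifting fixtures) is a seeded factory: synthetic records that are deterministic across runs. A sketch with hypothetical field names:

```python
import random

def make_order(seed: int, **overrides):
    """Synthetic order data: same seed, same record, no real PII."""
    rng = random.Random(seed)  # local RNG; doesn't disturb global state
    order = {
        "customer_id": rng.randint(1, 10_000),
        "product_id": f"PROD-{rng.randint(1, 999):03d}",
        "quantity": rng.randint(1, 5),
    }
    order.update(overrides)  # tests override only the fields they care about
    return order

# Reproducible across runs and machines
assert make_order(42) == make_order(42)
```

Because every field is derived from the seed, a failing test can be re-run with identical data, and no sanitization pipeline is needed.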
Checklist
- Test pyramid followed (50% unit, 30% integration, 15% contract, 5% E2E)
- Contract tests between all service boundaries
- Integration tests use testcontainers (not shared databases)
- Test data: synthetic, deterministic, no production PII
- CI runs all tests per PR (< 15 min)
- Staging environment for E2E and smoke tests
- Flaky test tracking and quarantine process
- Test coverage on critical business logic paths
:::note[Source]
This guide is derived from operational intelligence at Garnet Grid Consulting. For testing strategy consulting, visit garnetgrid.com.
:::