# How to Implement Event-Driven Architecture
Design and build event-driven systems. Covers event sourcing, CQRS, message brokers, saga patterns, idempotency, and common pitfalls.
Event-driven architecture decouples services by communicating through events instead of direct calls. It enables scalability, resilience, and real-time processing — but only if you handle the complexity correctly. The difference between a well-implemented event-driven system and a poorly implemented one is the difference between a resilient, scalable platform and a debugging nightmare where events are lost, duplicated, or processed out of order.
This guide covers when to use event-driven architecture, how to choose a message broker, the patterns that make it work (event sourcing, CQRS, sagas), and the pitfalls that sink most implementations.
## When to Use Event-Driven Architecture
### ✅ Good Fit
| Scenario | Example | Why EDA Works |
|---|---|---|
| Decoupled microservices | Order placed → inventory, shipping, email all react independently | Services evolve independently |
| Real-time data processing | User action → analytics, recommendations update instantly | Low-latency, high-throughput |
| Audit trail / compliance | Every state change persisted as an immutable event | Complete history, regulatory compliance |
| Async workloads | File uploaded → resize, scan, optimize in background | Non-blocking, parallelizable |
| Multi-team systems | Team A produces events, Team B consumes without coordination | Loose coupling, team autonomy |
### ❌ Bad Fit
| Scenario | Why Not | Better Approach |
|---|---|---|
| Simple CRUD apps | Over-engineering for 10 endpoints | Direct API calls |
| Synchronous queries | User expects instant response, not “processing” | Request-response pattern |
| Strong consistency required | Events are eventually consistent by nature | Database transactions |
| Small team (< 5 devs) | Operational overhead exceeds value | Modular monolith |
| Prototyping / MVPs | Adds complexity when you need speed | Direct function calls |
## Core Patterns
### Event Sourcing
Store events as the source of truth instead of current state. Current state is derived by replaying events.
```python
# Store events, derive state
from datetime import datetime, timezone

class OrderEventStore:
    def __init__(self):
        self.events = []

    def append(self, event):
        event["timestamp"] = datetime.now(timezone.utc).isoformat()
        event["version"] = len(self.events) + 1
        self.events.append(event)

    def get_state(self, order_id):
        """Rebuild current state from events."""
        state = {"status": "unknown", "items": [], "total": 0}
        for event in self.events:
            if event["order_id"] != order_id:
                continue
            if event["type"] == "OrderCreated":
                state["status"] = "created"
                state["customer"] = event["customer_id"]
            elif event["type"] == "ItemAdded":
                state["items"].append(event["item"])
                state["total"] += event["price"]
            elif event["type"] == "OrderPaid":
                state["status"] = "paid"
            elif event["type"] == "OrderShipped":
                state["status"] = "shipped"
        return state
```
:::tip[When to Use Event Sourcing] Use event sourcing when you need a complete audit trail, the ability to rebuild state at any point in time, or debugging complex workflows by replaying events. Don’t use it for simple CRUD — the overhead isn’t worth it. :::
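Replaying the full log gets slower as events accumulate. A common optimization, sketched here under the same hypothetical event shapes as the store above, is to snapshot state every N events and replay only the tail:

```python
import copy

def apply_event(state, event):
    """Apply one event to an order-state dict (same event types as above)."""
    if event["type"] == "OrderCreated":
        state["status"] = "created"
    elif event["type"] == "ItemAdded":
        state["items"].append(event["item"])
        state["total"] += event["price"]
    elif event["type"] == "OrderPaid":
        state["status"] = "paid"
    return state

def rebuild(events, snapshot=None):
    """Rebuild state from a snapshot (if any) plus the events after it."""
    if snapshot:
        state = copy.deepcopy(snapshot["state"])
        start = snapshot["version"]  # number of events already folded in
    else:
        state = {"status": "unknown", "items": [], "total": 0}
        start = 0
    for event in events[start:]:
        state = apply_event(state, event)
    return state
```

A background job can persist `{"version": n, "state": rebuild(events[:n])}` periodically; rebuilds then start from the latest snapshot instead of event #1.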
### CQRS (Command Query Responsibility Segregation)
Separate the write path (commands) from the read path (queries). Each is optimized independently.
```
Commands (Write)         Events               Queries (Read)
┌──────────────┐     ┌────────────────┐     ┌──────────────┐
│ CreateOrder  │────▶│ OrderCreated   │────▶│ Order List   │
│ AddItem      │     │ ItemAdded      │     │ (Denormalized│
│ Pay          │     │ OrderPaid      │     │  read model) │
│ Ship         │     │ OrderShipped   │     │              │
└──────────────┘     └────────────────┘     └──────────────┘
  (Write DB)          (Event Store)          (Read DB)
  Normalized          Append-only            Optimized for
  for writes          immutable              queries
```
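The read side of the diagram above can be sketched as a projector that consumes events and maintains a denormalized view. This is a minimal in-memory sketch — the name `OrderListProjection` is illustrative, and a real system would persist the view in a read-optimized store:

```python
# Minimal CQRS read-model projector (in-memory sketch)
class OrderListProjection:
    def __init__(self):
        self.orders = {}  # order_id -> denormalized row

    def apply(self, event):
        data = event["data"]
        order_id = data["order_id"]
        if event["event_type"] == "OrderCreated":
            self.orders[order_id] = {
                "order_id": order_id,
                "customer_id": data["customer_id"],
                "status": "created",
                "total": 0,
            }
        elif event["event_type"] == "ItemAdded":
            self.orders[order_id]["total"] += data["price"]
        elif event["event_type"] == "OrderPaid":
            self.orders[order_id]["status"] = "paid"

    def list_orders(self):
        # Queries never touch the write model; they read the prebuilt view
        return list(self.orders.values())
```

The projector is the only code that bridges the two sides: commands produce events, the projector folds events into the read model, and queries hit the read model alone.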
### When to Combine CQRS + Event Sourcing
| Pattern | Complexity | Use When |
|---|---|---|
| Simple CRUD | Low | < 10K users, simple domain |
| CQRS only (no event sourcing) | Medium | Different read/write patterns, high read volume |
| Event sourcing only (no CQRS) | Medium | Audit trail needed, simple queries |
| CQRS + Event sourcing | High | Financial systems, complex domains, regulatory requirements |
## Step 1: Choose a Message Broker
| Broker | Throughput | Ordering | Retention | Best For | Managed Options |
|---|---|---|---|---|---|
| Apache Kafka | Very High | Partition-level | Days to forever | High-volume streaming, event sourcing | Confluent, MSK, Event Hubs |
| AWS SQS | High | FIFO available | 14 days max | Simple task queuing, decoupling | AWS-native |
| RabbitMQ | Medium-High | Per-queue | Until consumed | Complex routing (fanout, topic, headers) | CloudAMQP |
| AWS EventBridge | Medium | No guarantee | 24 hours | AWS event routing, serverless | AWS-native |
| Redis Streams | Very High | Per-stream | Configurable | Low-latency, simple streaming | ElastiCache |
| NATS | Very High | Per-subject | Configurable | Cloud-native, lightweight | Synadia |
### Selection Criteria
```
Need to replay events from days/weeks ago?
├── Yes → Kafka (configurable retention)
└── No
    ├── Need complex routing (fanout, topic exchanges)?
    │   └── RabbitMQ
    ├── Simple task queue with at-least-once delivery?
    │   └── SQS (simplest to operate)
    ├── Ultra-low latency (< 1 ms)?
    │   └── Redis Streams or NATS
    └── AWS-native serverless integration?
        └── EventBridge
```
## Step 2: Design Event Schemas
Events should be self-describing, versioned, and backward-compatible.
```json
{
  "event_id": "evt-550e8400-e29b",
  "event_type": "OrderCreated",
  "event_version": "1.2",
  "timestamp": "2025-02-15T14:30:00Z",
  "source": "order-service",
  "correlation_id": "req-abc-123",
  "data": {
    "order_id": "ord-789",
    "customer_id": "cust-456",
    "items": [
      {"product_id": "prod-1", "quantity": 2, "price": 29.99}
    ],
    "total": 59.98
  },
  "metadata": {
    "user_agent": "web",
    "ip_address": "192.168.1.1"
  }
}
```
### Schema Evolution Rules
| Change Type | Backward Compatible? | Example |
|---|---|---|
| Add optional field | ✅ Yes | Add discount_code to order event |
| Add required field | ❌ No | Add shipping_address (existing events don’t have it) |
| Remove field | ❌ No | Remove customer_email |
| Rename field | ❌ No | Rename total to order_total |
| Change field type | ❌ No | Change price from int to string |
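On the consumer side, these rules translate into a "tolerant reader": treat added optional fields as absent-by-default and check the event version before assuming a shape. A sketch, with field names following the example schema above:

```python
def parse_order_created(event):
    """Tolerant reader for OrderCreated events, versions 1.x."""
    major = int(str(event.get("event_version", "1.0")).split(".")[0])
    if major != 1:
        # Unknown major version: fail loudly (route to a DLQ) rather than guess
        raise ValueError(f"Unsupported OrderCreated version: {event['event_version']}")
    data = event["data"]
    return {
        "order_id": data["order_id"],        # required since 1.0
        "customer_id": data["customer_id"],  # required since 1.0
        # Optional field added in a later 1.x release:
        # default to None instead of raising KeyError on old events
        "discount_code": data.get("discount_code"),
    }
```

This is why "add optional field" is the only backward-compatible change in the table: a tolerant reader handles it with a default, while every other change breaks either old events or old consumers.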
## Step 3: Implement Idempotent Consumers
```python
# Every event consumer MUST be idempotent:
# processing the same event twice must produce the same result.
class IdempotentConsumer:
    def __init__(self, db):
        self.db = db

    def process(self, event):
        event_id = event["id"]
        # Check if already processed (deduplication). A unique constraint on
        # event_id guards against two workers passing this check concurrently.
        if self.db.execute(
            "SELECT 1 FROM processed_events WHERE event_id = %s",
            (event_id,)
        ).fetchone():
            print(f"Event {event_id} already processed, skipping")
            return
        # Process the event
        self._handle(event)
        # Mark as processed (same transaction as the business logic!)
        self.db.execute(
            "INSERT INTO processed_events (event_id, processed_at) VALUES (%s, NOW())",
            (event_id,)
        )
        self.db.commit()

    def _handle(self, event):
        if event["type"] == "OrderPaid":
            # Send confirmation email, update inventory, etc.
            pass
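The same deduplication idea in a runnable form, here using SQLite's unique primary key so the database, not application code, rejects duplicates. The consumer above assumes a Postgres-style driver; this sketch is illustrative:

```python
import sqlite3

def process_once(conn, event_id, handler):
    """Run handler at most once per event_id; return True if it ran."""
    try:
        # A PRIMARY KEY on event_id makes the duplicate insert fail atomically
        conn.execute(
            "INSERT INTO processed_events (event_id) VALUES (?)", (event_id,)
        )
    except sqlite3.IntegrityError:
        return False  # already processed, skip
    handler()         # business logic runs in the same transaction
    conn.commit()
    return True
```

Setup is one statement: `conn.execute("CREATE TABLE processed_events (event_id TEXT PRIMARY KEY)")`. Inserting the marker *before* the handler, inside one transaction, closes the race between the existence check and the insert.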
## Step 4: Saga Pattern for Distributed Transactions
When a business process spans multiple services, use sagas instead of distributed transactions.
```python
# Orchestration saga for order processing
class SagaFailed(Exception):
    pass

class OrderSaga:
    # Each forward action has a compensating action that undoes it
    steps = [
        {"action": "reserve_inventory", "compensate": "release_inventory"},
        {"action": "charge_payment", "compensate": "refund_payment"},
        {"action": "create_shipment", "compensate": "cancel_shipment"},
        {"action": "send_confirmation", "compensate": "send_cancellation"},
    ]

    async def execute(self, order):
        completed = []
        for step in self.steps:
            try:
                # Action/compensate names map to coroutine methods
                # (reserve_inventory, release_inventory, ...) on a subclass
                await getattr(self, step["action"])(order)
                completed.append(step)
            except Exception as e:
                # Compensate completed steps in reverse order
                for comp_step in reversed(completed):
                    await getattr(self, comp_step["compensate"])(order)
                raise SagaFailed(f"Step {step['action']} failed: {e}") from e
```
### Orchestration vs. Choreography
| Approach | How It Works | Pros | Cons |
|---|---|---|---|
| Orchestration | Central coordinator tells each service what to do | Easy to understand, single source of truth | Single point of failure, coupling |
| Choreography | Each service reacts to events and publishes its own | Decoupled, no central coordinator | Hard to debug, implicit flow |
Rule of thumb: Use orchestration for < 5 steps, choreography for highly decoupled, independently evolving services.
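Choreography can be sketched with a tiny in-memory event bus: each service subscribes to the events it cares about and publishes its own, with no central coordinator. Service names and event types here are illustrative:

```python
from collections import defaultdict

class EventBus:
    def __init__(self):
        self.handlers = defaultdict(list)

    def subscribe(self, event_type, handler):
        self.handlers[event_type].append(handler)

    def publish(self, event_type, payload):
        for handler in self.handlers[event_type]:
            handler(payload)

bus = EventBus()
log = []

# Inventory service: reacts to OrderCreated, publishes InventoryReserved
bus.subscribe("OrderCreated",
              lambda e: (log.append("reserve"), bus.publish("InventoryReserved", e)))
# Payment service: reacts to InventoryReserved, publishes PaymentCharged
bus.subscribe("InventoryReserved",
              lambda e: (log.append("charge"), bus.publish("PaymentCharged", e)))
# Shipping service: reacts to PaymentCharged
bus.subscribe("PaymentCharged", lambda e: log.append("ship"))

bus.publish("OrderCreated", {"order_id": "ord-1"})
```

Note that the order flow (reserve → charge → ship) exists only implicitly in the subscriptions — exactly the "hard to debug, implicit flow" con in the table above.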
## Common Pitfalls
| Pitfall | Problem | Solution |
|---|---|---|
| No idempotency | Duplicate charges, double emails | Deduplication by event ID (idempotency key) |
| No dead letter queue | Failed messages silently lost | Configure DLQ on every queue, alert on DLQ depth |
| No schema versioning | New event version breaks all consumers | Schema registry (Confluent, Protobuf, Avro) |
| Eventual consistency surprise | Users see stale data and file support tickets | UI shows “processing” state, polling for updates |
| Event ordering assumptions | Payment processed before order created | Use partition keys (same entity → same partition) |
| Unbounded event size | Large payloads slow down the broker | Store data in S3/DB, put reference in event |
| Missing correlation IDs | Can’t trace a request across services | Include correlation_id in every event |
| Chat-based debugging | “What events did order X produce?” is unanswerable | Build event catalog and search tooling |
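Correlation IDs only help if every consumer copies them onto the events it publishes in turn. A minimal propagation helper (a sketch; the `causation_id` field is an assumption beyond the schema shown earlier):

```python
import uuid

def make_event(event_type, data, cause=None):
    """Build an event, propagating correlation_id from the causing event."""
    return {
        "event_id": f"evt-{uuid.uuid4()}",
        "event_type": event_type,
        # New request: mint a correlation_id; downstream: copy it verbatim
        "correlation_id": cause["correlation_id"] if cause else f"req-{uuid.uuid4()}",
        # causation_id links each event to the one that directly triggered it
        "causation_id": cause["event_id"] if cause else None,
        "data": data,
    }
```

With both fields, `correlation_id` groups everything a single request produced, while the `causation_id` chain reconstructs the exact order of cause and effect.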
## Monitoring Event-Driven Systems
| Metric | What to Watch | Alert Threshold |
|---|---|---|
| Consumer lag | Messages behind real-time | > 1,000 messages or > 5 minutes |
| DLQ depth | Failed messages accumulating | > 0 (investigate every failure) |
| Processing latency | Time from event publish to consumer processing | > P95 SLA |
| Throughput | Events/second per topic | Sudden drop (> 50% from baseline) |
| Consumer group health | Active consumers per group | Fewer consumers than expected |
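Consumer lag, the first metric in the table, is just the gap between the newest offset in each partition and the consumer group's last committed offset. A broker-agnostic sketch of the computation and the alert check (function names are illustrative):

```python
def consumer_lag(end_offsets, committed_offsets):
    """Per-partition lag: log end offset minus committed offset."""
    return {
        partition: end - committed_offsets.get(partition, 0)
        for partition, end in end_offsets.items()
    }

def lag_alert(lag_by_partition, threshold=1000):
    """Return partitions whose lag exceeds the alert threshold."""
    return [p for p, lag in lag_by_partition.items() if lag > threshold]
```

In practice the two offset maps come from your broker's admin API (e.g. Kafka's end offsets vs. the consumer group's committed offsets); the arithmetic stays the same regardless of broker.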
## Event-Driven Architecture Checklist
- Events defined with clear schema (name, version, payload, metadata)
- Schema evolution strategy documented (backward compatibility rules)
- Message broker selected, deployed, and load-tested
- All consumers are idempotent (deduplication by event ID)
- Dead letter queues configured on every queue/topic
- Saga/compensation pattern for multi-step workflows
- Correlation IDs propagated across all events
- Monitoring: consumer lag, DLQ depth, processing latency
- Event replay capability tested (can you reprocess from offset?)
- Event catalog documented (producers, consumers, schemas)
- Load testing completed with 2x expected peak traffic
:::note[Source] This guide is derived from operational intelligence at Garnet Grid Consulting. For architecture consulting, visit garnetgrid.com. :::