Verified by Garnet Grid

Change Data Capture (CDC) Patterns

Implement CDC for real-time data synchronization. Covers Debezium, log-based CDC, query-based CDC, outbox pattern, event sourcing, and CDC pipeline architecture.

Change Data Capture (CDC) captures row-level changes (inserts, updates, deletes) from databases and streams them as events. Instead of nightly batch exports that are stale by morning, CDC delivers changes in near real-time — enabling live dashboards, immediate cache invalidation, event-driven microservices, and synchronized data stores.


CDC Methods

MethodHow It WorksLatencyImpact on Source
Log-basedRead database transaction log (WAL/binlog)SecondsMinimal (no queries)
Query-basedPoll tables for changes (timestamp/version)MinutesMedium (queries load)
Trigger-basedDatabase triggers write to change tableImmediateHigh (trigger overhead)
Application-levelApplication emits events on writeImmediateNone on DB (code change)

Debezium (Log-Based CDC)

// Debezium connector configuration for PostgreSQL
{
  "name": "orders-cdc-connector",
  "config": {
    "connector.class": "io.debezium.connector.postgresql.PostgresConnector",
    "database.hostname": "postgres",
    "database.port": "5432",
    "database.user": "cdc_user",
    "database.password": "${CDC_PASSWORD}",
    "database.dbname": "commerce",
    "topic.prefix": "cdc",
    "table.include.list": "public.orders,public.order_items,public.customers",
    "plugin.name": "pgoutput",
    "slot.name": "debezium_orders",
    "publication.name": "dbz_publication",
    "snapshot.mode": "initial",
    "transforms": "route",
    "transforms.route.type": "io.debezium.transforms.ByLogicalTableRouter",
    "transforms.route.topic.regex": "cdc\\.commerce\\.(.*)",
    "transforms.route.topic.replacement": "commerce.$1.events"
  }
}

CDC Event Structure

{
  "schema": { ... },
  "payload": {
    "before": {
      "order_id": "ORD-123",
      "status": "pending",
      "amount": 99.99
    },
    "after": {
      "order_id": "ORD-123",
      "status": "shipped",
      "amount": 99.99
    },
    "source": {
      "version": "2.5.0",
      "connector": "postgresql",
      "name": "cdc",
      "ts_ms": 1709318400000,
      "db": "commerce",
      "schema": "public",
      "table": "orders",
      "txId": 45678,
      "lsn": 123456789
    },
    "op": "u",
    "ts_ms": 1709318400100
  }
}

Transactional Outbox Pattern

For microservices that need to update a database AND publish an event atomically:

# Instead of:
#   1. Update database
#   2. Publish to Kafka  ← Can fail after DB commit!

# Use outbox:
#   1. Update database + insert into outbox table (same transaction)
#   2. CDC reads outbox table and publishes to Kafka

async def create_order(order):
    async with database.transaction():
        # Business operation
        await database.execute(
            "INSERT INTO orders (id, customer_id, amount, status) VALUES ($1, $2, $3, $4)",
            order.id, order.customer_id, order.amount, "pending"
        )
        
        # Outbox entry (same transaction — atomic!)
        await database.execute(
            """INSERT INTO outbox (id, aggregate_type, aggregate_id, event_type, payload)
               VALUES ($1, $2, $3, $4, $5)""",
            uuid4(), "Order", order.id, "OrderCreated",
            json.dumps({"order_id": order.id, "amount": order.amount})
        )
    
    # Debezium CDC picks up the outbox insert and publishes to Kafka
    # No dual-write problem!

CDC Pipeline Architecture

Source Databases
├── PostgreSQL (orders) ──┐
├── MySQL (inventory)  ───┼── Debezium ──▶ Kafka Topics
└── MongoDB (users)    ───┘                   │
                                              ├──▶ Data Warehouse (Snowflake/BQ)
                                              ├──▶ Elasticsearch (search index)
                                              ├──▶ Redis (cache invalidation)
                                              └──▶ Downstream microservices

CDC vs Alternatives

ApproachFreshnessComplexitySource ImpactDeletes Captured
Log-based CDCSecondsMediumMinimal
Batch ETLHoursLowHigh (full scan)❌ (without soft-delete)
Query pollingMinutesLowMedium
Application eventsImmediateHigh (code change)None

Anti-Patterns

Anti-PatternProblemFix
Dual writesUpdate DB + publish event = inconsistency riskOutbox pattern + CDC
Full table scans for syncExpensive, slow, misses deletesLog-based CDC
No schema evolutionCDC events break consumers on schema changeSchema registry + compatibility
No monitoringReplication slot bloat, lag undetectedMonitor replication lag, slot size
Tight couplingConsumers depend on exact DB schemaTransform CDC events into domain events

Checklist

  • CDC method selected (log-based recommended for production)
  • Debezium or equivalent configured with correct plugin
  • Replication slot monitored for lag and disk usage
  • Outbox pattern for services needing atomic DB + event publish
  • Schema evolution strategy for CDC events
  • Dead letter queue for failed event processing
  • Monitoring: CDC lag, connector status, error rates
  • Snapshot strategy: initial load for new consumers
  • Access control: CDC user has minimal required permissions

:::note[Source] This guide is derived from operational intelligence at Garnet Grid Consulting. For CDC consulting, visit garnetgrid.com. :::

Jakub Dimitri Rezayev
Jakub Dimitri Rezayev
Founder & Chief Architect • Garnet Grid Consulting

Jakub holds an M.S. in Customer Intelligence & Analytics and a B.S. in Finance & Computer Science from Pace University. With deep expertise spanning D365 F&O, Azure, Power BI, and AI/ML systems, he architects enterprise solutions that bridge legacy systems and modern technology — and has led multi-million dollar ERP implementations for Fortune 500 supply chains.

View Full Profile →