Database Replication Topologies
Design and operate database replication for high availability, disaster recovery, and read scaling. Covers single-leader, multi-leader, and leaderless replication, replication lag, conflict resolution, and the patterns that keep data consistent across replicas.
Database replication copies data from one server to others to achieve high availability, disaster recovery, and read scalability. But replication introduces complexity: lag, conflicts, split-brain scenarios, and consistency tradeoffs. Understanding replication topologies is essential for designing reliable data systems.
Replication Topologies
Single-Leader (Primary-Replica):
┌──────────┐ Replication ┌──────────┐
│ Primary │ ────────────────→ │ Replica 1│ (read-only)
│ (writes) │ ────────────────→ │ Replica 2│ (read-only)
└──────────┘ └──────────┘
Writes: Primary only
Reads: Primary + Replicas
Failover: Promote a replica to primary
Best for: Read-heavy workloads, simple architecture
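In a single-leader setup the routing decision usually lives in the application layer: writes go to the primary, reads are spread across replicas. A minimal sketch, assuming connection objects that expose an `execute` method (not any specific driver API):

```python
import random

class SingleLeaderRouter:
    """Sketch of statement-based routing for a primary/replica topology."""

    def __init__(self, primary, replicas):
        self.primary = primary
        self.replicas = replicas

    def execute(self, query: str):
        # Only the primary accepts writes; route by statement type.
        if query.lstrip().upper().startswith(("INSERT", "UPDATE", "DELETE")):
            return self.primary.execute(query)
        # Reads are load-balanced across the read-only replicas.
        return random.choice(self.replicas).execute(query)
```

Note this naive router ignores replication lag; the read-your-writes pattern below addresses that gap.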
Multi-Leader (Active-Active):
┌──────────┐ ←→ ┌──────────┐
│ Leader 1 │ │ Leader 2 │
│ (US-East)│ ←──────→ │ (EU-West)│
└──────────┘ └──────────┘
Writes: Any leader
Conflict: Same row modified on both leaders simultaneously
Resolution: Last-write-wins, custom merge, CRDT
Best for: Multi-region, offline-capable apps
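The resolution strategies above can be sketched in a few lines. Last-write-wins keeps the version with the newer timestamp and silently drops the loser, while a grow-only counter CRDT merges per-leader counts so concurrent increments on both leaders are never lost. The names and data shapes here are illustrative:

```python
def last_write_wins(local, remote):
    """Each version is a (value, timestamp) pair; keep the newer one.
    Note: the losing write is silently discarded."""
    return local if local[1] >= remote[1] else remote

class GCounter:
    """Grow-only counter CRDT: each leader increments only its own slot,
    and merge takes the element-wise max, so merging never conflicts."""

    def __init__(self):
        self.counts = {}

    def increment(self, leader_id, n=1):
        self.counts[leader_id] = self.counts.get(leader_id, 0) + n

    def merge(self, other):
        # Element-wise max is commutative, associative, and idempotent,
        # so leaders can exchange state in any order and converge.
        for leader_id, count in other.counts.items():
            self.counts[leader_id] = max(self.counts.get(leader_id, 0), count)

    def value(self):
        return sum(self.counts.values())
```

With last-write-wins, two concurrent updates lose one of the writes; with the CRDT, both leaders' increments survive the merge.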
Leaderless (Quorum):
┌──────────┐ ┌──────────┐ ┌──────────┐
│ Node 1 │ │ Node 2 │ │ Node 3 │
└──────────┘ └──────────┘ └──────────┘
Write: To W of N nodes (e.g., 2 of 3)
Read: From R of N nodes (e.g., 2 of 3)
Rule: W + R > N guarantees the read and write quorums overlap in at least one node, so every read contacts a node holding the latest acknowledged write
Best for: High availability, partition tolerance
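The W + R > N rule can be demonstrated with a toy in-memory store. This is a sketch only: real leaderless systems vary which nodes acknowledge each request and track per-key version vectors rather than a single global version counter.

```python
class QuorumStore:
    """Toy leaderless store: each node is a dict of key -> (value, version).
    With W + R > N the write and read sets must share at least one node."""

    def __init__(self, n=3, w=2, r=2):
        assert w + r > n, "quorum overlap requires W + R > N"
        self.nodes = [{} for _ in range(n)]
        self.w, self.r = w, r
        self.version = 0

    def write(self, key, value):
        self.version += 1
        # Illustration only: always write to the first W nodes.
        for node in self.nodes[:self.w]:
            node[key] = (value, self.version)

    def read(self, key):
        # Contact R nodes; the overlap guarantee means at least one of
        # them holds the latest version, so return the max-version reply.
        replies = [node[key] for node in self.nodes[-self.r:] if key in node]
        return max(replies, key=lambda pair: pair[1])[0]
```

Even though the read set (the last R nodes) is deliberately different from the write set (the first W nodes), the overlapping node always surfaces the newest value.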
Replication Lag
-- Monitoring replication lag in PostgreSQL.
-- Note: pg_stat_replication is populated on the primary, while
-- pg_last_xact_replay_timestamp() is meaningful only on a standby,
-- so these are two separate queries.

-- On the primary: per-replica byte lag
SELECT
    client_addr,
    state,
    sent_lsn,
    write_lsn,
    flush_lsn,
    replay_lsn,
    pg_wal_lsn_diff(sent_lsn, replay_lsn) AS replication_lag_bytes
FROM pg_stat_replication;

-- On a replica: seconds since the last replayed transaction
SELECT EXTRACT(EPOCH FROM (NOW() - pg_last_xact_replay_timestamp())) AS lag_seconds;
# Read-your-writes consistency pattern
import time

class ReadYourWritesRouter:
    """Ensure a user reads their own recent writes by routing to the
    primary until replicas have caught up past the user's last write."""

    def route_read(self, user_id: str, query: str):
        last_write = self.get_last_write_time(user_id)  # epoch seconds, or None
        replica_lag = self.get_replica_lag()            # seconds behind primary
        if last_write and replica_lag > 0:
            time_since_write = time.time() - last_write
            if time_since_write < replica_lag + 1:  # +1s safety margin
                # Recent write may not have replicated yet: read from primary
                return self.primary.execute(query)
        # Replicas have caught up past this user's last write
        return self.replica.execute(query)
Failover Strategies
Automatic Failover (Managed Services):
AWS RDS: Automatic failover in ~60-120 seconds
Azure SQL: Automatic failover groups with DNS redirect
Cloud SQL: Regional HA with automatic failover
Manual Failover (Self-Managed):
1. Detect primary failure (health checks, heartbeat)
2. Verify primary is truly down (avoid split-brain)
3. Select promotion candidate (most up-to-date replica)
4. Promote replica to primary
5. Redirect application connections
6. Reconfigure remaining replicas to follow new primary
Risk: Data loss if replication is asynchronous and the primary fails
before replicas have caught up.
Mitigation: Synchronous replication (zero data loss, but higher write
latency, and writes stall if the synchronous replica is unavailable)
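Step 3 of the manual procedure, selecting the most up-to-date replica, amounts to picking the healthy replica with the highest replayed WAL position. A sketch, where the dict shape and the integer `replay_lsn` stand-in are illustrative:

```python
def pick_promotion_candidate(replicas):
    """Choose the replica that has replayed the most WAL, minimizing
    data loss on promotion. Each replica is a dict with a numeric
    'replay_lsn' and an optional 'healthy' flag (assumed shape)."""
    healthy = [r for r in replicas if r.get("healthy", True)]
    if not healthy:
        raise RuntimeError("no healthy replica available for promotion")
    return max(healthy, key=lambda r: r["replay_lsn"])
```

In a real promotion the candidate must also be fenced from the old primary first, so a recovered old primary cannot accept writes and cause split-brain.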
Anti-Patterns
| Anti-Pattern | Consequence | Fix |
|---|---|---|
| Ignore replication lag | Stale reads, user confusion | Monitor lag, route critical reads to primary |
| No failover testing | First failover is during an outage | Regular failover drills |
| Multi-leader without conflict resolution | Data corruption, lost writes | Explicit conflict resolution strategy |
| Sync replication everywhere | High latency, reduced availability | Sync for critical data, async for the rest |
| No monitoring of replica health | Silent replica failure | Alert on lag, connection status, query errors |
Replication is the foundation of high availability, but it introduces complexity that must be actively managed. Choose the simplest topology that meets your requirements, and always monitor replication health.