
Database Backup & Recovery Engineering

Design database backup strategies. Covers backup types, point-in-time recovery, RPO/RTO, WAL archiving, cross-region replication, backup testing, and disaster recovery for databases.

Database backups are like insurance: everyone knows they need them, but nobody checks whether they work until it’s too late. The most dangerous assumption in database engineering is “our backups are good.” An untested backup is not a backup; it is hope. This guide covers how to design backup strategies that actually work when you need them.


Backup Types

| Type | Speed | Size | Recovery | Best For |
|---|---|---|---|---|
| Full backup | Slow | Large | Fast (self-contained) | Weekly baseline |
| Incremental | Fast | Small | Slow (needs full + all increments) | Daily between fulls |
| Differential | Medium | Medium | Medium (needs full + latest diff) | Daily alternative to incremental |
| Continuous (WAL/binlog) | Continuous | Small | Point-in-time recovery (needs a base backup + log replay) | Production databases |
| Logical (pg_dump) | Slow | Medium | Object-level restore | Schema migration, selective restore |
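The Recovery column follows from what each backup type captures. Below is a minimal sketch that picks the backups needed for a restore. It assumes an incremental holds the changes since the previous backup of any kind, and a differential holds the changes since the last full backup. `Backup` and `restore_chain` are hypothetical names, not any backup tool's API, and continuous WAL/binlog archiving is not modeled.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass(frozen=True)
class Backup:
    kind: str  # "full", "incremental", or "differential"
    taken_at: datetime


def restore_chain(backups: list[Backup], target: datetime) -> list[Backup]:
    """Return, in restore order, the backups needed to reach the newest
    backup taken at or before `target` (hypothetical helper).

    Assumed semantics:
      incremental  = changes since the previous backup of any kind
      differential = changes since the last full backup
    """
    usable = sorted((b for b in backups if b.taken_at <= target),
                    key=lambda b: b.taken_at)
    full_positions = [i for i, b in enumerate(usable) if b.kind == "full"]
    if not full_positions:
        raise ValueError("no full backup taken at or before target")
    base = full_positions[-1]
    chain = [usable[base]]
    for b in usable[base + 1:]:
        if b.kind == "differential":
            # A differential already holds every change since the full,
            # so it replaces anything collected after the full so far.
            chain = [usable[base], b]
        elif b.kind == "incremental":
            # An incremental depends on every earlier backup in the chain.
            chain.append(b)
    return chain


# Weekly full on Sunday, then one backup per day Monday to Saturday.
sunday = datetime(2024, 1, 7)
days = [datetime(2024, 1, d) for d in range(8, 14)]
incremental_week = [Backup("full", sunday)] + [Backup("incremental", d) for d in days]
differential_week = [Backup("full", sunday)] + [Backup("differential", d) for d in days]

thursday = datetime(2024, 1, 11, 23, 0)
print(len(restore_chain(incremental_week, thursday)))   # full + Mon..Thu -> 5
print(len(restore_chain(differential_week, thursday)))  # full + Thursday's diff -> 2
```

The Thursday restore needs five backups with incrementals but only two with differentials. That is the speed-versus-size trade-off the table describes.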

PostgreSQL Backup Architecture

                    Continuous WAL Archiving
PostgreSQL ──── WAL files ──────▶ S3/GCS
     │                              │
     │                              ▼
     │                    Point-in-time recovery
     │                    (restore to any second)
     │
     ├── pg_basebackup (weekly)
     │   └── Full physical backup → S3
     │
     └── pg_dump (daily, critical schemas)
         └── Logical backup → S3

WAL Archiving Configuration

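# postgresql.conf
wal_level = replica          # enough WAL detail for archiving and PITR (the default since PostgreSQL 10)
archive_mode = on            # changing this requires a server restart
archive_command = 'aws s3 cp %p s3://backups/wal/%f'   # must exit 0 only on success; PostgreSQL retries failures
archive_timeout = 60         # force a WAL segment switch after at most 60 s if there was any activity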
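Archiving only protects you if the archive is continuous: replay cannot proceed past a missing WAL segment. Below is a minimal sketch of a gap check over archived file names. It assumes the default 16 MB `wal_segment_size` and that you have already listed the object names, for example with `aws s3 ls s3://backups/wal/`. `parse_segment`, `segment_name`, and `find_wal_gaps` are hypothetical helpers.

```python
import re

# Assumes the default 16 MB wal_segment_size: each 4 GiB "log id" then holds
# 0x100000000 // (16 * 1024 * 1024) == 256 segments.
SEGMENTS_PER_LOG_ID = 0x100000000 // (16 * 1024 * 1024)

# Segment file name = timeline, log id, segment, each as 8 uppercase hex digits.
WAL_SEGMENT = re.compile(r"([0-9A-F]{8})([0-9A-F]{8})([0-9A-F]{8})")


def parse_segment(name: str) -> tuple[int, int]:
    """Return (timeline, absolute segment number) for a WAL segment name."""
    m = WAL_SEGMENT.fullmatch(name)
    if m is None:
        raise ValueError(f"not a WAL segment name: {name!r}")
    timeline, log_id, seg = (int(g, 16) for g in m.groups())
    return timeline, log_id * SEGMENTS_PER_LOG_ID + seg


def segment_name(timeline: int, segno: int) -> str:
    """Inverse of parse_segment()."""
    log_id, seg = divmod(segno, SEGMENTS_PER_LOG_ID)
    return f"{timeline:08X}{log_id:08X}{seg:08X}"


def find_wal_gaps(archived: list[str]) -> list[str]:
    """List segment names missing between the first and last archived
    segment of each timeline. Other archive files (.history, .backup,
    .partial) are ignored."""
    by_timeline: dict[int, set[int]] = {}
    for name in archived:
        if WAL_SEGMENT.fullmatch(name):
            timeline, segno = parse_segment(name)
            by_timeline.setdefault(timeline, set()).add(segno)
    missing = []
    for timeline, segnos in sorted(by_timeline.items()):
        for segno in range(min(segnos), max(segnos) + 1):
            if segno not in segnos:
                missing.append(segment_name(timeline, segno))
    return missing


archived = [
    "000000010000000000000001",
    "000000010000000000000002",
    "000000010000000000000004",
    "00000002.history",  # skipped: not a segment file
]
print(find_wal_gaps(archived))  # ['000000010000000000000003']
```

The check works on absolute segment numbers rather than raw names. This is why a jump from `...000000FF` to `...0000000100000000` (the next log id) is correctly treated as contiguous.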

RPO / RTO Planning

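RPO (recovery point objective) is the maximum acceptable data loss, measured as time. RTO (recovery time objective) is the maximum acceptable time to restore service.

| Tier | Data | RPO | RTO | Strategy |
|---|---|---|---|---|
| Critical | Financial, orders | 0 (zero data loss) | < 1 hour | Synchronous replication + WAL archiving |
| Important | User data, content | < 1 hour | < 4 hours | Async replication + hourly backups |
| Standard | Analytics, logs | < 24 hours | < 24 hours | Daily backups |
| Archival | Historical data | < 1 week | < 1 week | Weekly backups to cold storage |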
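RTO has to cover the whole restore path, not just the copy. For a physical restore, that is roughly the base-backup copy plus WAL replay from the backup's start up to the recovery target. With weekly `pg_basebackup` runs, the replay can span up to a week of WAL. Below is a back-of-the-envelope sketch. `estimate_restore_seconds` is a hypothetical helper, and every size and throughput figure is a made-up input to replace with numbers measured in your own restore tests.

```python
GIB = 1024 ** 3
MIB = 1024 ** 2


def estimate_restore_seconds(base_backup_bytes: float, wal_bytes: float,
                             copy_bytes_per_s: float,
                             replay_bytes_per_s: float) -> float:
    """Rough time for a physical restore: copy the base backup, then replay
    the WAL written between the backup and the recovery target."""
    copy_s = base_backup_bytes / copy_bytes_per_s
    replay_s = wal_bytes / replay_bytes_per_s
    return copy_s + replay_s


# Hypothetical inputs: a 500 GiB base backup copied at 250 MiB/s, and six
# days of WAL at 20 GiB/day replayed at 50 MiB/s (target near the end of a
# weekly pg_basebackup cycle).
seconds = estimate_restore_seconds(500 * GIB, 6 * 20 * GIB, 250 * MIB, 50 * MIB)
print(f"{seconds / 60:.0f} min")  # 75 min: 2048 s copy + 2457.6 s replay
```

Under these assumed figures, a restore alone would miss a 1-hour RTO. The replay term grows with the time since the last base backup, so taking base backups more often shortens it.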

Backup Testing

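#!/bin/bash
# Monthly backup restore test
set -euo pipefail

# 1. Restore latest backup into a fresh test database
#    (pg_restore reads custom-, directory-, or tar-format dumps, e.g. from pg_dump -Fc)
dropdb --if-exists restore_test
createdb restore_test
pg_restore --dbname=restore_test latest_backup.dump

# 2. Run consistency checks: exact row count for every base table in "public"
#    (query_to_xml needs tableforest = true so each result is a bare <row> element)
psql restore_test -c "
  SELECT table_name,
         (xpath('/row/cnt/text()',
          query_to_xml(format('SELECT count(*) AS cnt FROM %I.%I', table_schema, table_name),
                       false, true, ''))
         )[1]::text::int AS row_count
  FROM information_schema.tables
  WHERE table_schema = 'public'
    AND table_type = 'BASE TABLE'
  ORDER BY table_name;
"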

# 3. Compare row counts with production
# 4. Run application smoke tests against restored database
# 5. Log results and alert on discrepancies
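Steps 3 and 5 are left as comments in the script. Below is a minimal sketch of the comparison. It assumes the step 2 query has been run against both production and the restored copy, with each output loaded into a `{table_name: row_count}` map. `compare_row_counts` and its `tolerance` parameter are hypothetical. Exact equality is only expected when both counts describe the same point in time, so the sketch allows a relative tolerance.

```python
def compare_row_counts(prod: dict[str, int], restored: dict[str, int],
                       tolerance: float = 0.0) -> list[str]:
    """Describe every difference between two {table_name: row_count} maps.

    tolerance is the allowed relative difference (0.01 means 1%), because
    production keeps changing after the backup was taken.
    """
    problems = []
    for table in sorted(prod.keys() - restored.keys()):
        problems.append(f"{table}: missing from restore")
    for table in sorted(restored.keys() - prod.keys()):
        problems.append(f"{table}: not in production")
    for table in sorted(prod.keys() & restored.keys()):
        expected, actual = prod[table], restored[table]
        if abs(expected - actual) > expected * tolerance:
            problems.append(f"{table}: production={expected}, restored={actual}")
    return problems


prod = {"orders": 1000, "users": 50, "invoices": 300}
restored = {"orders": 995, "users": 50}
for problem in compare_row_counts(prod, restored, tolerance=0.01):
    print(problem)  # prints: invoices: missing from restore
```

An empty result is the pass condition. Any returned line is what step 5 would log and alert on.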

Anti-Patterns

| Anti-Pattern | Problem | Fix |
|---|---|---|
| Never testing restores | Corrupted backups are discovered only during a disaster | Monthly restore tests, automated validation |
| Backups on the same server | One server failure loses both the data and the backups | Off-site backups (different region/cloud) |
| No point-in-time recovery | Can only restore to the last backup (hours ago) | WAL archiving for continuous recovery |
| No backup monitoring | Backups silently fail for weeks | Alert on backup failures and missing backups |
| Logical backups only | Slow restore for large databases | Physical backups (pg_basebackup) for speed |
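The monitoring fix is most robust when it alerts on the absence of a recent success rather than only on failure events. A job that never runs (a deleted cron entry, a hung process) emits no failure at all. Below is a minimal sketch. It assumes each job records the timestamp of its last successful run somewhere you can query. `backup_alerts`, the job names, the intervals, and the grace period are hypothetical.

```python
from datetime import datetime, timedelta, timezone


def backup_alerts(last_success: dict[str, datetime],
                  expected_every: dict[str, timedelta],
                  now: datetime,
                  grace: timedelta = timedelta(hours=1)) -> list[str]:
    """Flag jobs whose latest successful run is older than their schedule allows.

    Jobs with no recorded success at all are flagged too, which catches
    backups that silently stopped running instead of failing loudly.
    """
    alerts = []
    for job, interval in sorted(expected_every.items()):
        last = last_success.get(job)
        if last is None:
            alerts.append(f"{job}: no successful run recorded")
        elif now - last > interval + grace:
            alerts.append(f"{job}: last success {now - last} ago, expected every {interval}")
    return alerts


# Hypothetical schedule: weekly base backup, daily dump, monthly restore test.
now = datetime(2024, 1, 10, 12, 0, tzinfo=timezone.utc)
schedule = {
    "pg_basebackup": timedelta(days=7),
    "pg_dump": timedelta(days=1),
    "restore_test": timedelta(days=31),
}
last = {
    "pg_basebackup": now - timedelta(days=8),  # overdue
    "pg_dump": now - timedelta(hours=2),       # on schedule
}
for alert in backup_alerts(last, schedule, now):
    print(alert)
```

The schedule dictionary, not the success log, drives the loop. A job that has never reported in still produces an alert.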

Checklist

  • Backup strategy defined per data tier (RPO/RTO)
  • WAL archiving configured for point-in-time recovery
  • Physical backups (pg_basebackup) weekly, offsite
  • Logical backups (pg_dump) for critical schemas
  • Cross-region backup replication
  • Backup encryption at rest
  • Restore testing: monthly, automated
  • Backup monitoring: alerts on failure or missed schedule
  • Retention policy: how long to keep backups
  • Recovery runbook documented and tested
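The checklist leaves the retention policy open. One common scheme, shown here as an example rather than as this guide's recommendation, keeps every backup for a short window plus one backup per week for longer. Below is a minimal sketch. `backups_to_keep` and both window lengths are hypothetical.

```python
from datetime import date, timedelta


def backups_to_keep(backup_dates: list[date], today: date,
                    keep_all_days: int = 14,
                    keep_weekly_weeks: int = 12) -> list[date]:
    """Return the backup dates to retain; anything not returned may be pruned.

    Keeps every backup younger than keep_all_days days, plus the newest
    backup of each ISO week younger than keep_weekly_weeks weeks.
    """
    keep: set[date] = set()
    newest_per_week: dict[tuple[int, int], date] = {}
    for d in backup_dates:
        age_days = (today - d).days
        if age_days < keep_all_days:
            keep.add(d)
        if age_days < keep_weekly_weeks * 7:
            week = tuple(d.isocalendar())[:2]  # (ISO year, ISO week number)
            if week not in newest_per_week or d > newest_per_week[week]:
                newest_per_week[week] = d
    keep.update(newest_per_week.values())
    return sorted(keep)


today = date(2024, 3, 31)
daily = [today - timedelta(days=i) for i in range(122)]  # Dec 1, 2023 to Mar 31, 2024
kept = backups_to_keep(daily, today)
print(len(kept), kept[0], kept[-1])  # 24 2024-01-14 2024-03-31
```

With WAL archiving, archived WAL older than the start of the oldest retained base backup is no longer needed, since no kept backup can replay it. For archives stored in a local directory, PostgreSQL ships `pg_archivecleanup` for that step.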

:::note[Source] This guide is derived from operational intelligence at Garnet Grid Consulting. For database consulting, visit garnetgrid.com. :::

Jakub Dimitri Rezayev
Founder & Chief Architect • Garnet Grid Consulting

Jakub holds an M.S. in Customer Intelligence & Analytics and a B.S. in Finance & Computer Science from Pace University. With deep expertise spanning D365 F&O, Azure, Power BI, and AI/ML systems, he architects enterprise solutions that bridge legacy systems and modern technology — and has led multi-million dollar ERP implementations for Fortune 500 supply chains.
