Database Migration Strategies

Execute zero-downtime database migrations. Covers schema evolution, expand-contract pattern, online DDL, data migration, rollback strategies, and migration testing.

Database migrations are among the highest-risk operations in software engineering. A bad migration can lock tables, corrupt data, or take down production. The safe approach is the expand-contract pattern: add new structures alongside the old ones, migrate data gradually, then remove the old structures, all without downtime.


Migration Risk Levels

| Operation | Risk | Downtime? | Strategy |
| --- | --- | --- | --- |
| Add nullable column | Low | No | Simple `ALTER TABLE` |
| Add column with default | Low | No (Postgres 11+) | `ALTER TABLE ... ADD COLUMN ... DEFAULT` |
| Add NOT NULL column | Medium | No (with steps) | Add nullable → backfill → add constraint |
| Rename column | High | Potential | Expand-contract (add new, copy, drop old) |
| Change column type | High | Potential | Expand-contract |
| Drop column | Medium | No | Ensure no application code references it |
| Add index | Medium | No | `CREATE INDEX CONCURRENTLY` |
| Drop table | High | No | Verify zero references first |

Expand-Contract Pattern

Phase 1: EXPAND (deploy new schema alongside old)
┌────────────┐     ┌────────────┐
│ Old Schema │     │ New Schema │
│ user_name  │ + → │ first_name │
│            │     │ last_name  │
└────────────┘     └────────────┘
Application writes to BOTH old and new columns

Phase 2: MIGRATE (copy data, switch reads)
Application reads from new columns
Backfill script copies old data to new columns
Verify data consistency

Phase 3: CONTRACT (remove old schema)
Application no longer references old column
DROP old column in separate migration
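During the expand phase the application must keep both schemas in sync on every write. A minimal dual-write sketch in Python; the `db` handle and the `save_user`/`split_name` names are illustrative, not from a specific framework, and the split policy must match whatever rule the backfill script uses:

```python
def split_name(full_name: str) -> tuple[str, str]:
    """Split a full name into (first, last). Everything after the
    first space goes to last_name."""
    first, _, last = full_name.partition(" ")
    return first, last

def save_user(db, user_id: int, full_name: str) -> None:
    """Expand-phase write: update old and new columns together so
    either read path sees consistent data."""
    first, last = split_name(full_name)
    db.execute(
        "UPDATE users SET user_name = %s, first_name = %s, last_name = %s"
        " WHERE id = %s",
        (full_name, first, last, user_id),
    )
```

Once reads have switched to the new columns and the backfill is verified, the dual write is removed and the contract phase can begin.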

Implementation

-- Phase 1: Expand (Migration 1)
ALTER TABLE users ADD COLUMN first_name TEXT;
ALTER TABLE users ADD COLUMN last_name TEXT;

-- Phase 2: Backfill (Migration 2 - run repeatedly in batches;
-- Postgres UPDATE has no LIMIT clause, so constrain via a subquery)
UPDATE users
SET first_name = split_part(user_name, ' ', 1),
    last_name  = split_part(user_name, ' ', 2)
WHERE id IN (
    SELECT id FROM users
    WHERE first_name IS NULL
    ORDER BY id
    LIMIT 10000
);

-- Phase 3: Contract (Migration 3 - after app fully migrated)
ALTER TABLE users DROP COLUMN user_name;
-- Note: SET NOT NULL scans the whole table under an exclusive lock.
-- On Postgres 12+, add CHECK (... IS NOT NULL) NOT VALID, VALIDATE it,
-- then SET NOT NULL to skip the scan.
ALTER TABLE users ALTER COLUMN first_name SET NOT NULL;
ALTER TABLE users ALTER COLUMN last_name SET NOT NULL;
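The batched UPDATE above must be re-run until no rows remain. A sketch of a driver loop, assuming a DB-API-style connection (psycopg2 or similar); the `backfill` name and batch/pause defaults are illustrative:

```python
import time

BATCH_SQL = """
UPDATE users
SET first_name = split_part(user_name, ' ', 1),
    last_name  = split_part(user_name, ' ', 2)
WHERE id IN (
    SELECT id FROM users
    WHERE first_name IS NULL
    ORDER BY id
    LIMIT %(batch)s
)
"""

def backfill(conn, batch_size: int = 10_000, pause: float = 0.5) -> int:
    """Repeat the batched UPDATE until it touches zero rows.
    Committing per batch keeps row locks short; the pause gives
    replicas and autovacuum room to keep up. Returns rows updated."""
    total = 0
    while True:
        with conn.cursor() as cur:
            cur.execute(BATCH_SQL, {"batch": batch_size})
            updated = cur.rowcount
        conn.commit()
        if updated == 0:
            return total
        total += updated
        time.sleep(pause)
```

Because the dual write keeps new rows consistent, the loop only ever shrinks the `first_name IS NULL` set and terminates naturally.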

Zero-Downtime Index Creation

-- WRONG: blocks writes to the table for the entire build
CREATE INDEX idx_orders_customer ON orders(customer_id);

-- RIGHT: non-blocking (takes longer, but writes continue)
-- Note: if the build fails, the index is left INVALID; drop it and retry
CREATE INDEX CONCURRENTLY idx_orders_customer ON orders(customer_id);

-- For large tables, monitor progress (Postgres 12+)
SELECT phase, blocks_total, blocks_done,
       round(100.0 * blocks_done / nullif(blocks_total, 0), 1) AS pct
FROM pg_stat_progress_create_index;

Migration Testing

def test_migration_up_down():
    """Verify migration is reversible."""
    # Apply migration
    run_migration("up", version="20250315_add_first_last_name")
    
    # Verify schema
    columns = get_table_columns("users")
    assert "first_name" in columns
    assert "last_name" in columns
    
    # Verify data preserved
    user = query_one("SELECT * FROM users WHERE id = 1")
    assert user.first_name is not None
    
    # Roll back
    run_migration("down", version="20250315_add_first_last_name")
    
    # Verify rollback
    columns = get_table_columns("users")
    assert "first_name" not in columns

Anti-Patterns

| Anti-Pattern | Problem | Fix |
| --- | --- | --- |
| Rename column directly | Breaks all application code referencing the old name | Expand-contract pattern |
| Long-running migration in a transaction | Locks the table for minutes or hours | Batch updates, `CREATE INDEX CONCURRENTLY` |
| No rollback plan | Migration fails, stuck in a broken state | Every migration has an "undo" script |
| Schema and data in one migration | Too much risk in one step | Separate DDL and data migrations |
| Testing in production only | Breaking changes discovered in production | Test against production-sized data in staging |
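The "no rollback plan" fix is concrete when every version carries both directions. A minimal registry sketch; the `MIGRATIONS` structure and `run_migration` signature are illustrative, not a specific framework's API:

```python
MIGRATIONS = {
    "20250315_add_first_last_name": {
        "up": [
            "ALTER TABLE users ADD COLUMN first_name TEXT",
            "ALTER TABLE users ADD COLUMN last_name TEXT",
        ],
        # The inverse is written and reviewed with the migration itself,
        # not improvised during an incident.
        "down": [
            "ALTER TABLE users DROP COLUMN last_name",
            "ALTER TABLE users DROP COLUMN first_name",
        ],
    },
}

def run_migration(conn, direction: str, version: str) -> list[str]:
    """Apply every statement for one migration in one direction.
    Unknown versions or directions raise KeyError and fail fast."""
    statements = MIGRATIONS[version][direction]
    for stmt in statements:
        conn.execute(stmt)
    return statements
```

A test harness can then drive `up` followed by `down` on every new migration, which is exactly what the reversibility test above exercises.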

Checklist

  • Migration reviewed for lock duration and table impact
  • Expand-contract for breaking schema changes
  • No table-locking DDL during business hours
  • CREATE INDEX CONCURRENTLY for all index additions
  • Batch processing for large data backfills
  • Rollback script tested for every migration
  • Staging test with production-sized data before production
  • Application compatibility with both old and new schema

:::note[Source] This guide is derived from operational intelligence at Garnet Grid Consulting. For database migration consulting, visit garnetgrid.com. :::

Jakub Dimitri Rezayev
Founder & Chief Architect • Garnet Grid Consulting

Jakub holds an M.S. in Customer Intelligence & Analytics and a B.S. in Finance & Computer Science from Pace University. With deep expertise spanning D365 F&O, Azure, Power BI, and AI/ML systems, he architects enterprise solutions that bridge legacy systems and modern technology — and has led multi-million dollar ERP implementations for Fortune 500 supply chains.
