Database Migration Strategies
Execute zero-downtime database migrations. Covers schema evolution, expand-contract pattern, online DDL, data migration, rollback strategies, and migration testing.
Database migrations are the highest-risk operations in software engineering. A bad migration can lock tables, corrupt data, or take down production. The safe approach uses the expand-contract pattern: add new structures alongside old ones, migrate data gradually, then remove the old structures — all without downtime.
Migration Risk Levels
| Operation | Risk | Downtime? | Strategy |
|---|---|---|---|
| Add nullable column | Low | No | Simple ALTER TABLE |
| Add column with default | Low | No (Postgres 11+) | ALTER TABLE ADD COLUMN DEFAULT |
| Add NOT NULL column | Medium | No (with steps) | Add nullable → backfill → add constraint |
| Rename column | High | Potential | Expand-contract (add new, copy, drop old) |
| Change column type | High | Potential | Expand-contract |
| Drop column | Medium | No | Ensure no application code references it |
| Add index | Medium | No | CREATE INDEX CONCURRENTLY |
| Drop table | High | No | Verify zero references first |
Expand-Contract Pattern
Phase 1: EXPAND (deploy new schema alongside old)
┌────────────┐ ┌────────────┐
│ Old Schema │ │ New Column │
│ user_name │ + → │ first_name │
│ │ │ last_name │
└────────────┘ └────────────┘
Application writes to BOTH old and new columns
Phase 2: MIGRATE (copy data, switch reads)
Application reads from new columns
Backfill script copies old data to new columns
Verify data consistency
Phase 3: CONTRACT (remove old schema)
Application no longer references old column
DROP old column in separate migration
Implementation
-- Phase 1: Expand (Migration 1)
ALTER TABLE users ADD COLUMN first_name TEXT;
ALTER TABLE users ADD COLUMN last_name TEXT;
-- Phase 2: Backfill (Migration 2 - run in batches)
UPDATE users
SET first_name = split_part(user_name, ' ', 1),
last_name = split_part(user_name, ' ', 2)
WHERE first_name IS NULL
LIMIT 10000; -- Process in batches
-- Phase 3: Contract (Migration 3 - after app fully migrated)
ALTER TABLE users DROP COLUMN user_name;
ALTER TABLE users ALTER COLUMN first_name SET NOT NULL;
ALTER TABLE users ALTER COLUMN last_name SET NOT NULL;
Zero-Downtime Index Creation
-- WRONG: Locks table for entire duration
CREATE INDEX idx_orders_customer ON orders(customer_id);
-- RIGHT: Non-blocking (takes longer but no lock)
CREATE INDEX CONCURRENTLY idx_orders_customer ON orders(customer_id);
-- For large tables, monitor progress
SELECT phase, blocks_total, blocks_done,
round(100.0 * blocks_done / nullif(blocks_total, 0), 1) as pct
FROM pg_stat_progress_create_index;
Migration Testing
def test_migration_up_down():
"""Verify migration is reversible."""
# Apply migration
run_migration("up", version="20250315_add_first_last_name")
# Verify schema
columns = get_table_columns("users")
assert "first_name" in columns
assert "last_name" in columns
# Verify data preserved
user = query_one("SELECT * FROM users WHERE id = 1")
assert user.first_name is not None
# Roll back
run_migration("down", version="20250315_add_first_last_name")
# Verify rollback
columns = get_table_columns("users")
assert "first_name" not in columns
Anti-Patterns
| Anti-Pattern | Problem | Fix |
|---|---|---|
| Rename column directly | Breaks all application code referencing old name | Expand-contract pattern |
| Long-running migration in transaction | Locks table for minutes/hours | Batch updates, CREATE INDEX CONCURRENTLY |
| No rollback plan | Migration fails, stuck in broken state | Every migration has an “undo” script |
| Schema and data in one migration | Too much risk in one step | Separate DDL and data migrations |
| Testing in production only | Breaking changes discovered in production | Test against production-sized data in staging |
Checklist
- Migration reviewed for lock duration and table impact
- Expand-contract for breaking schema changes
- No table-locking DDL during business hours
-
CREATE INDEX CONCURRENTLYfor all index additions - Batch processing for large data backfills
- Rollback script tested for every migration
- Staging test with production-sized data before production
- Application compatibility with both old and new schema
:::note[Source] This guide is derived from operational intelligence at Garnet Grid Consulting. For database migration consulting, visit garnetgrid.com. :::