Database Migration Strategies: Changing Schemas Without Breaking Production
Migrate database schemas safely in production without downtime, data loss, or broken deployments. Covers expand-and-contract, backward-compatible migrations, zero-downtime techniques, rollback strategies, data migration patterns, and the deployment workflow that lets you change your schema while traffic is flowing.
Database migrations are the most dangerous part of a deployment. Code changes can be rolled back in seconds. Schema changes can take hours to undo, and some — like dropping a column that had data — cannot be undone at all. The challenge is that your database schema needs to evolve as your application evolves, but you cannot take your application offline every time you need to add a column.
The solution is a set of patterns that make every schema change backward-compatible, so the old code and new code both work with the database at every step of the migration.
The Fundamental Problem
The dangerous moment:
        v1 code                Database                v2 code
─────────────────────────────────────────────────────────────────
SELECT name,               users table             SELECT name,
       email                 id                            email,
FROM users                   name                          phone   ← NEW
                             email                  FROM users
If you deploy v2 code BEFORE adding the phone column → errors
If you add phone column BEFORE deploying v2 code → safe (column exists but unused)
If you drop email column BEFORE deploying v1 rollback code → disaster
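The deploy-order hazard above can be reproduced in miniature with an in-memory SQLite database (SQLite stands in for the production database; table and data are illustrative):

```python
import sqlite3

# Simulate the schema as it exists BEFORE the migration runs.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER, name TEXT, email TEXT)")
conn.execute("INSERT INTO users VALUES (1, 'Ada', 'ada@example.com')")

# v2 code deployed BEFORE the migration: query references a missing column.
try:
    conn.execute("SELECT name, email, phone FROM users")
except sqlite3.OperationalError as e:
    print(f"v2 before migration: {e}")  # no such column: phone

# Migration first (expand), then v2 code: the column exists, old rows read NULL.
conn.execute("ALTER TABLE users ADD COLUMN phone TEXT")
rows = conn.execute("SELECT name, email, phone FROM users").fetchall()
print(rows)  # [('Ada', 'ada@example.com', None)]
```

The same statement that crashes in the first ordering is harmless in the second, which is why migrations always ship before the code that depends on them.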
Expand-and-Contract Pattern
The core technique for zero-downtime migrations:
Phase 1: EXPAND (add new structure)
┌──────────────────────────────────────┐
│ Add new column (nullable) │
│ Add new table │
│ Add new index (concurrently) │
│ │
│ Old code: still works (ignores new) │
│ New code: can use new structure │
└──────────────────────────────────────┘
│
▼
Phase 2: MIGRATE (move data)
┌──────────────────────────────────────┐
│ Backfill new columns │
│ Dual-write to old and new │
│ Verify data consistency │
└──────────────────────────────────────┘
│
▼
Phase 3: CONTRACT (remove old structure)
┌──────────────────────────────────────┐
│ Remove old code paths │
│ Drop old columns/tables │
│ Remove dual-write logic │
│ │
│ Only after all code uses new path │
└──────────────────────────────────────┘
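The dual-write step in Phase 2 can be sketched as a function that keeps both columns in sync (the `save_user` helper, the dict-as-row representation, and the column names are illustrative, not a real ORM API):

```python
def save_user(row, display_name):
    """Dual-write: keep old and new columns in sync during MIGRATE.

    Old code still reads `name`; new code reads `full_name`. Writing
    both means either code path sees consistent data until CONTRACT.
    """
    row["name"] = display_name       # old column, still read by v1 code
    row["full_name"] = display_name  # new column, read by v2 code
    return row

user = save_user({"id": 1}, "Ada Lovelace")
```

Once every reader has switched to `full_name` and a verification pass confirms the columns agree, the `row["name"]` write is the line you delete in Phase 3.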
Common Migration Patterns
Adding a Column
-- ✅ Safe: add nullable column (no lock, no rewrite)
ALTER TABLE users ADD COLUMN phone VARCHAR(20);
-- ⚠️ Unsafe before PostgreSQL 11: adding a column with a DEFAULT
-- rewrote the entire table under an exclusive lock.
-- In PostgreSQL 11+, non-volatile defaults are stored as metadata
-- (no rewrite), so this is safe.
ALTER TABLE users ADD COLUMN phone VARCHAR(20) DEFAULT '';
-- ❌ Unsafe: add NOT NULL column without default
ALTER TABLE users ADD COLUMN phone VARCHAR(20) NOT NULL;
-- Fails if table has existing rows
Renaming a Column
-- Never rename directly. Use expand-and-contract:
-- Step 1: Add new column
ALTER TABLE users ADD COLUMN full_name VARCHAR(255);
-- Step 2: Backfill data
UPDATE users SET full_name = name WHERE full_name IS NULL;
-- (do in batches for large tables)
-- Step 3: Deploy code using full_name (dual-read from both)
-- Step 4: Stop writing to old column
-- Step 5: Drop old column (weeks later, after verification)
ALTER TABLE users DROP COLUMN name;
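Step 3's dual-read can be sketched as a fallback: prefer the new column, fall back to the old one until the backfill completes (the function name is illustrative; `row` is a plain dict standing in for a database row):

```python
def display_name(row):
    """Dual-read during the rename: new column first, legacy fallback.

    Rows the backfill has not reached yet still have full_name = NULL,
    so the old `name` column keeps serving them.
    """
    return row.get("full_name") or row.get("name")

display_name({"name": "Ada", "full_name": None})            # → 'Ada'
display_name({"name": "Ada", "full_name": "Ada Lovelace"})  # → 'Ada Lovelace'
```

After Step 5 drops the old column, the fallback becomes dead code and can be removed along with it.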
Adding an Index
-- ❌ Unsafe: blocks writes to the table for the duration of the build
CREATE INDEX idx_users_email ON users (email);
-- ✅ Safe: build without blocking (PostgreSQL)
CREATE INDEX CONCURRENTLY idx_users_email ON users (email);
-- Takes longer but does not block reads or writes
-- Cannot run inside a transaction
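Because `CONCURRENTLY` refuses to run inside a transaction, migration code typically builds the statement separately and issues it on an autocommit connection. A minimal sketch (the helper name and `idx_<table>_<column>` naming convention are assumptions):

```python
def concurrent_index_sql(table: str, column: str) -> str:
    """Build a non-blocking index statement (PostgreSQL syntax).

    The caller must execute this on an autocommit connection:
    CREATE INDEX CONCURRENTLY fails inside a transaction block,
    which is why many migration tools need a special flag or a
    dedicated non-transactional migration for it.
    """
    index_name = f"idx_{table}_{column}"
    return f"CREATE INDEX CONCURRENTLY {index_name} ON {table} ({column})"

concurrent_index_sql("users", "email")
# → 'CREATE INDEX CONCURRENTLY idx_users_email ON users (email)'
```

One operational caveat: if a concurrent build fails partway, PostgreSQL leaves behind an INVALID index that must be dropped before retrying.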
Data Migration Best Practices
import logging
import time

logger = logging.getLogger(__name__)

# Batch processing for large tables
# (db is an open connection whose execute() returns a result with
#  rowcount and fetchall(), e.g. a SQLAlchemy connection)
def backfill_phone_numbers(db, batch_size=1000):
    """Backfill new phone column from legacy contact_info JSON."""
    last_id = 0
    total_updated = 0
    while True:
        # Walk the table in id order; short transactions avoid long locks
        rows = db.execute(
            "SELECT id FROM users WHERE id > %s ORDER BY id LIMIT %s",
            (last_id, batch_size),
        ).fetchall()
        if not rows:
            break
        batch_ids = [r[0] for r in rows]
        result = db.execute("""
            UPDATE users
            SET phone = contact_info->>'phone'
            WHERE id = ANY(%s)
              AND phone IS NULL
              AND contact_info->>'phone' IS NOT NULL
        """, (batch_ids,))
        total_updated += result.rowcount
        # Advance by the highest id actually seen, not by batch_size:
        # id sequences have gaps, and a batch where every row is filtered
        # out (rowcount == 0) must not end the loop early
        last_id = batch_ids[-1]
        # Log progress
        logger.info("Backfilled %d rows so far", total_updated)
        # Small sleep to reduce database load
        time.sleep(0.1)
    logger.info("Backfill complete: %d rows updated", total_updated)
Migration Safety Checklist
| Check | Why |
|---|---|
| Migration is backward-compatible | Old code must still work after migration runs |
| Large table operations are batched | Avoid locking the table for minutes/hours |
| Indexes created CONCURRENTLY | Prevent table locks during index creation |
| Migration has been tested against production-size data | 1M rows behaves differently than 100 |
| Rollback plan exists | What happens if this migration fails halfway? |
| Monitoring in place | Watch for lock waits, slow queries after migration |
Migration Tools
| Tool | Language | Key Feature |
|---|---|---|
| Alembic | Python (SQLAlchemy) | Auto-generation from models |
| Flyway | Java / Any | SQL-based, versioned migrations |
| Liquibase | Java / Any | XML/YAML/SQL, database-agnostic |
| golang-migrate | Go | Simple CLI, SQL files |
| Prisma Migrate | TypeScript/Node | Schema-first, preview changes |
| Django Migrations | Python (Django) | Auto-detection from models |
Anti-Patterns
| Anti-Pattern | Problem | Fix |
|---|---|---|
| Migration and code in same deploy | If migration fails, code is also broken | Deploy migration first, code second |
| Manual schema changes | Untracked, unreproducible | All changes through migration files |
| Irreversible migrations without backup | Cannot undo if something goes wrong | Always test rollback, backup before destructive changes |
| One big migration | Hours to run, hard to debug failures | Break into small, incremental steps |
| NOT NULL without default | Fails on existing data | Add nullable first, backfill, then add constraint |
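The fix for the last anti-pattern is itself a small expand-and-contract, which can be written down as an ordered sequence of statements (table and column names are illustrative; the backfill would run in batches as shown earlier):

```python
# Safely introducing a NOT NULL column, in three deployable steps.
SAFE_NOT_NULL_STEPS = [
    # 1. Expand: nullable column, no table rewrite, old code unaffected.
    "ALTER TABLE users ADD COLUMN phone VARCHAR(20)",
    # 2. Migrate: backfill existing rows (batched on a real table).
    "UPDATE users SET phone = '' WHERE phone IS NULL",
    # 3. Contract: enforce the constraint only once no NULLs remain.
    "ALTER TABLE users ALTER COLUMN phone SET NOT NULL",
]
```

Each step is independently deployable and independently reversible, which is exactly what the one-big-migration anti-pattern gives up.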
Implementation Checklist
- Use expand-and-contract for all schema changes in production
- Deploy migrations separately from application code
- Add columns as nullable first, backfill data, then add constraints
- Create indexes with CONCURRENTLY to avoid table locks
- Batch large data migrations (1000-10000 rows at a time)
- Test migrations against production-size datasets before deploying
- Maintain a rollback script for every migration
- Never rename or drop columns in the same deploy as the code change
- Monitor database lock waits and query performance during migrations
- Version-control all migration files alongside application code