
Database Migration Strategies: Changing Schemas Without Breaking Production

Migrate database schemas safely in production without downtime, data loss, or broken deployments. Covers expand-and-contract, backward-compatible migrations, zero-downtime techniques, rollback strategies, data migration patterns, and the deployment workflow that lets you change your schema while traffic is flowing.

Database migrations are the most dangerous part of a deployment. Code changes can be rolled back in seconds. Schema changes can take hours to undo, and some — like dropping a column that had data — cannot be undone at all. The challenge is that your database schema needs to evolve as your application evolves, but you cannot take your application offline every time you need to add a column.

The solution is a set of patterns that make every schema change backward-compatible, so the old code and new code both work with the database at every step of the migration.


The Fundamental Problem

The dangerous moment:

  v1 code           Database                v2 code
  ─────────────────────────────────────────────────
  SELECT name,      users table             SELECT name,
    email             id                      email,
  FROM users          name                    phone    ← NEW
                      email               FROM users
                                          
  If you deploy v2 code BEFORE adding the phone column → errors
  If you add phone column BEFORE deploying v2 code → safe (column exists but unused)
  If you drop email column BEFORE deploying v1 rollback code → disaster

Expand-and-Contract Pattern

The core technique for zero-downtime migrations:

Phase 1: EXPAND (add new structure)
  ┌──────────────────────────────────────┐
  │  Add new column (nullable)           │
  │  Add new table                       │
  │  Add new index (concurrently)        │
  │                                      │
  │  Old code: still works (ignores new) │
  │  New code: can use new structure     │
  └──────────────────────────────────────┘


Phase 2: MIGRATE (move data)
  ┌──────────────────────────────────────┐
  │  Backfill new columns                │
  │  Dual-write to old and new           │
  │  Verify data consistency             │
  └──────────────────────────────────────┘


Phase 3: CONTRACT (remove old structure)
  ┌──────────────────────────────────────┐
  │  Remove old code paths               │
  │  Drop old columns/tables             │
  │  Remove dual-write logic             │
  │                                      │
  │  Only after all code uses new path   │
  └──────────────────────────────────────┘
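
Applied to a concrete case, the three phases might look like the following sequence of PostgreSQL statements, each phase shipped as its own deploy. This is a minimal sketch using the phone-number example from the backfill code later in this guide:

-- Phase 1: EXPAND (deploy 1): add the new column; old code ignores it
ALTER TABLE users ADD COLUMN phone VARCHAR(20);

-- Phase 2: MIGRATE (deploy 2): new code writes both phone and the legacy
-- contact_info field; existing rows are backfilled out of band
UPDATE users
SET phone = contact_info->>'phone'
WHERE phone IS NULL
  AND contact_info->>'phone' IS NOT NULL;
-- (batch this on large tables; see Data Migration Best Practices below)

-- Phase 3: CONTRACT (deploy 3, after verification): remove the dual-write
-- code path; if the legacy value had its own column, drop it now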

Common Migration Patterns

Adding a Column

-- ✅ Safe: add nullable column (no lock, no rewrite)
ALTER TABLE users ADD COLUMN phone VARCHAR(20);

-- ⚠️ Version-dependent: add column with a DEFAULT
-- (rewrites the entire table in PostgreSQL 10 and earlier;
-- safe for non-volatile defaults in PostgreSQL 11+)
ALTER TABLE users ADD COLUMN phone VARCHAR(20) DEFAULT '';

-- ❌ Unsafe: add NOT NULL column without default
ALTER TABLE users ADD COLUMN phone VARCHAR(20) NOT NULL;
-- Fails if table has existing rows
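
The safe route to a NOT NULL column on a populated table is to stage the constraint. A sketch of the staged sequence (the constraint name is illustrative; the scan-skipping behavior in the final steps applies to PostgreSQL 12+):

ALTER TABLE users ADD COLUMN phone VARCHAR(20);          -- 1. nullable, instant
UPDATE users SET phone = '' WHERE phone IS NULL;         -- 2. backfill (batched)
ALTER TABLE users ADD CONSTRAINT phone_not_null
  CHECK (phone IS NOT NULL) NOT VALID;                   -- 3. brief lock only
ALTER TABLE users VALIDATE CONSTRAINT phone_not_null;    -- 4. no blocking scan
ALTER TABLE users ALTER COLUMN phone SET NOT NULL;       -- 5. skips the table scan
ALTER TABLE users DROP CONSTRAINT phone_not_null;        -- 6. tidy up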

Renaming a Column

-- Never rename directly. Use expand-and-contract:

-- Step 1: Add new column
ALTER TABLE users ADD COLUMN full_name VARCHAR(255);

-- Step 2: Backfill data
UPDATE users SET full_name = name WHERE full_name IS NULL;
-- (do in batches for large tables)

-- Step 3: Deploy code that writes to both columns and reads full_name,
--         falling back to name (or keep the columns in sync with a
--         trigger; see the sketch below)
-- Step 4: Stop writing to the old column
-- Step 5: Drop old column (weeks later, after verification)
ALTER TABLE users DROP COLUMN name;
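
Step 3's dual-write can live in application code, but one way to guarantee the two columns stay in sync during the migration window is a database trigger. A minimal PostgreSQL sketch (PostgreSQL 11+ syntax; the function and trigger names are illustrative):

CREATE OR REPLACE FUNCTION sync_users_name() RETURNS trigger AS $$
BEGIN
  -- Copy whichever column the writer filled in to the other one
  NEW.full_name := COALESCE(NEW.full_name, NEW.name);
  NEW.name      := COALESCE(NEW.name, NEW.full_name);
  RETURN NEW;
END;
$$ LANGUAGE plpgsql;

CREATE TRIGGER users_name_sync
  BEFORE INSERT OR UPDATE ON users
  FOR EACH ROW EXECUTE FUNCTION sync_users_name();

-- Drop the trigger during the contract phase, before dropping the old column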

Adding an Index

-- ❌ Unsafe: locks table for duration of index build
CREATE INDEX idx_users_email ON users (email);

-- ✅ Safe: build without locking
CREATE INDEX CONCURRENTLY idx_users_email ON users (email);
-- Takes longer but does not block reads or writes
-- Cannot run inside a transaction
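
One caveat: if a CONCURRENTLY build fails or is cancelled, it leaves behind an INVALID index that still adds write overhead. You can check for and clean up leftovers like this:

-- Find indexes left INVALID by a failed CONCURRENTLY build
SELECT indexrelid::regclass AS index_name
FROM pg_index
WHERE NOT indisvalid;

-- Drop the broken index (also without locking) and retry the build
DROP INDEX CONCURRENTLY idx_users_email;
CREATE INDEX CONCURRENTLY idx_users_email ON users (email);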

Data Migration Best Practices

# Batch processing for large tables.
# Assumes `db` is a psycopg 3-style connection whose execute() returns
# a cursor exposing .fetchone() and .rowcount.
import logging
import time

logger = logging.getLogger(__name__)

def backfill_phone_numbers(db, batch_size=1000):
    """Backfill new phone column from legacy contact_info JSON."""

    last_id = 0
    total_updated = 0

    while True:
        # Find the upper id bound of the next batch. Walking id ranges
        # (instead of counting updated rows) means a stretch of rows that
        # need no update cannot end the loop early.
        row = db.execute("""
            SELECT MAX(id) FROM (
                SELECT id FROM users
                WHERE id > %s
                ORDER BY id
                LIMIT %s
            ) AS batch
        """, (last_id, batch_size)).fetchone()

        if row is None or row[0] is None:
            break  # no rows left beyond last_id

        batch_max = row[0]

        # Process in batches to avoid long-running transactions
        result = db.execute("""
            UPDATE users
            SET phone = contact_info->>'phone'
            WHERE id > %s
              AND id <= %s
              AND phone IS NULL
              AND contact_info->>'phone' IS NOT NULL
        """, (last_id, batch_max))

        total_updated += result.rowcount
        last_id = batch_max

        # Log progress
        logger.info(f"Backfilled {total_updated} rows so far")

        # Small sleep to reduce database load
        time.sleep(0.1)

    logger.info(f"Backfill complete: {total_updated} rows updated")

Migration Safety Checklist

  Check                                                      Why
  ──────────────────────────────────────────────────────────────────────────────────────────────
  Migration is backward-compatible                           Old code must still work after migration runs
  Large table operations are batched                         Avoid locking the table for minutes/hours
  Indexes created CONCURRENTLY                               Prevent table locks during index creation
  Migration has been tested against production-size data    1M rows behaves differently than 100
  Rollback plan exists                                       What happens if this migration fails halfway?
  Monitoring in place                                        Watch for lock waits, slow queries (see query below)
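
For the monitoring item, one quick check in PostgreSQL is to look for sessions stuck waiting on locks while the migration runs:

-- Sessions currently blocked on a lock
SELECT pid, state, wait_event_type, wait_event, query
FROM pg_stat_activity
WHERE wait_event_type = 'Lock';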

Migration Tools

  Tool                 Language               Key Feature
  ─────────────────────────────────────────────────────────────────────
  Alembic              Python (SQLAlchemy)    Auto-generation from models
  Flyway               Java / Any             SQL-based, versioned migrations
  Liquibase            Java / Any             XML/YAML/SQL, database-agnostic
  golang-migrate       Go                     Simple CLI, SQL files
  Prisma Migrate       TypeScript/Node        Schema-first, preview changes
  Django Migrations    Python (Django)        Auto-detection from models

Anti-Patterns

  Anti-Pattern                              Problem                                    Fix
  ─────────────────────────────────────────────────────────────────────────────────────────────────────────────
  Migration and code in same deploy         If migration fails, code is also broken    Deploy migration first, code second
  Manual schema changes                     Untracked, unreproducible                  All changes through migration files
  Irreversible migrations without backup    Cannot undo if something goes wrong        Always test rollback; back up before destructive changes
  One big migration                         Hours to run, hard to debug failures       Break into small, incremental steps
  NOT NULL without default                  Fails on existing data                     Add nullable first, backfill, then add constraint

Implementation Checklist

  • Use expand-and-contract for all schema changes in production
  • Deploy migrations separately from application code
  • Add columns as nullable first, backfill data, then add constraints
  • Create indexes with CONCURRENTLY to avoid table locks
  • Batch large data migrations (1000-10000 rows at a time)
  • Test migrations against production-size datasets before deploying
  • Maintain a rollback script for every migration (see the sketch after this list)
  • Never rename or drop columns in the same deploy as the code change
  • Monitor database lock waits and query performance during migrations
  • Version-control all migration files alongside application code
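
For the rollback item, most of the migration tools above pair every forward migration with a reverse one. A minimal sketch in golang-migrate's paired-file convention (the filenames and change are illustrative):

-- 0042_add_users_phone.up.sql
ALTER TABLE users ADD COLUMN phone VARCHAR(20);

-- 0042_add_users_phone.down.sql
ALTER TABLE users DROP COLUMN phone;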

Jakub Dimitri Rezayev
Founder & Chief Architect • Garnet Grid Consulting

Jakub holds an M.S. in Customer Intelligence & Analytics and a B.S. in Finance & Computer Science from Pace University. With deep expertise spanning D365 F&O, Azure, Power BI, and AI/ML systems, he architects enterprise solutions that bridge legacy systems and modern technology — and has led multi-million dollar ERP implementations for Fortune 500 supply chains.
