
How to Manage Database Schema Migrations Safely

Deploy database changes without downtime. Covers blue-green migrations, backward compatibility, rollback strategies, and migration testing.

Schema migrations are the scariest deploys. A bad migration can take down production, corrupt data, or lock tables for hours. Application deployments can be rolled back in seconds; schema changes cannot. Ship a bad one and you’re looking at a multi-hour incident, potential data loss, and a post-mortem that nobody wants to write.

This guide covers how to make schema migrations boring and safe: the golden rules, the expand-and-contract pattern, migration testing, lock-free operations, and the rollback strategies that keep production running.


The Golden Rules

  1. Every migration must be backward compatible — old code must work with the new schema, because during deployment both versions run simultaneously
  2. Never rename or drop columns in one step — use the expand-and-contract pattern (three deploys, not one)
  3. Always test migrations against a production-size dataset — a migration that takes 200ms on your dev laptop can lock tables for 45 minutes on 50 million rows
  4. Have a rollback plan before you start — write the rollback migration and test it before deploying forward
  5. Never mix schema changes and data changes in one migration — separate DDL (structure) from DML (data) for cleaner rollbacks

Step 1: Expand and Contract Pattern

The expand-and-contract pattern is the safest way to make schema changes. It separates the change into three deploys, ensuring backward compatibility at every step.

Adding a Column (Safe — Single Deploy)

-- Step 1: Add column as nullable (catalog-only change, no table rewrite;
-- PostgreSQL 11+ makes this instant even WITH a DEFAULT)
ALTER TABLE users ADD COLUMN display_name VARCHAR(255);

-- Step 2: Backfill data in batches (not a single UPDATE on millions of rows)
UPDATE users SET display_name = first_name || ' ' || last_name
WHERE display_name IS NULL
  AND id BETWEEN 1 AND 100000;
-- Repeat for next batches...

-- Step 3: Once backfill complete and app writes to new column, add constraint
ALTER TABLE users ALTER COLUMN display_name SET NOT NULL;
ALTER TABLE users ALTER COLUMN display_name SET DEFAULT '';
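The "repeat for next batches" step above is usually driven from application code so each batch commits separately and releases its locks. A minimal sketch (the `batch_ranges` helper and the driver loop are illustrative, assuming a dense integer primary key and a psycopg2 connection):

```python
def batch_ranges(max_id, batch_size):
    """Yield inclusive (start, end) id ranges covering 1..max_id."""
    start = 1
    while start <= max_id:
        yield start, min(start + batch_size - 1, max_id)
        start += batch_size

# Hypothetical driver loop -- commit per batch so locks are released:
# for lo, hi in batch_ranges(max_id, 100_000):
#     cur.execute(
#         "UPDATE users SET display_name = first_name || ' ' || last_name "
#         "WHERE display_name IS NULL AND id BETWEEN %s AND %s", (lo, hi))
#     conn.commit()
```

Keyset batching by id range avoids rescanning the table for `display_name IS NULL` on every pass, at the cost of some empty batches where ids are sparse.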

Renaming a Column (Dangerous — Requires Three Deploys)

-- ❌ WRONG: This breaks all existing code INSTANTLY
ALTER TABLE users RENAME COLUMN name TO full_name;

-- ✅ RIGHT: Expand-and-contract over 3 deploys

-- Deploy 1 (Schema): Add new column, start dual-write
ALTER TABLE users ADD COLUMN full_name VARCHAR(255);
UPDATE users SET full_name = name;  -- Backfill (batch this on large tables, as in Step 1)
-- App code: Write to BOTH 'name' and 'full_name', read from 'full_name'
-- Create trigger to keep columns in sync during transition:
CREATE OR REPLACE FUNCTION sync_name_columns() RETURNS TRIGGER AS $$
BEGIN
    IF NEW.name IS DISTINCT FROM OLD.name THEN
        NEW.full_name := NEW.name;
    END IF;
    IF NEW.full_name IS DISTINCT FROM OLD.full_name THEN
        NEW.name := NEW.full_name;
    END IF;
    RETURN NEW;
END;
$$ LANGUAGE plpgsql;

-- Attach the trigger (UPDATE only -- OLD is not defined for INSERTs;
-- new rows get both columns from the app's dual-write)
CREATE TRIGGER users_sync_name
BEFORE UPDATE ON users
FOR EACH ROW EXECUTE FUNCTION sync_name_columns();

-- Deploy 2 (Code): Stop reading 'name', read/write only 'full_name'
-- Remove trigger, stop writing to 'name'

-- Deploy 3 (Cleanup): Drop old column after verification period (1-2 weeks)
ALTER TABLE users DROP COLUMN name;
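The app-side half of the three deploys can be made explicit with a small phase table. A sketch, not a prescribed API; the phase numbers mirror the deploys above, and the helper name is hypothetical:

```python
# Which columns the app writes, per deploy phase (0 = before migration).
WRITE_COLUMNS = {
    0: ["name"],               # pre-migration
    1: ["name", "full_name"],  # Deploy 1: dual-write
    2: ["full_name"],          # Deploy 2: new column only
}

def build_update(phase: int, value: str, user_id: int):
    """Build a parameterized UPDATE for the current deploy phase."""
    cols = WRITE_COLUMNS[phase]
    assignments = ", ".join(f"{c} = %s" for c in cols)
    sql = f"UPDATE users SET {assignments} WHERE id = %s"
    return sql, [value] * len(cols) + [user_id]
```

Driving the column list from one table keeps the dual-write window easy to audit: flipping a single config value moves the whole app between phases.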

Dropping a Table or Column (Requires Soft-Delete First)

| Step | Deploy | What Happens |
|---|---|---|
| 1. Stop writing | Code deploy | App no longer writes to column/table |
| 2. Stop reading | Code deploy | App no longer reads from column/table |
| 3. Wait | 1-2 weeks | Verify nothing breaks, check logs for access |
| 4. Drop | Migration deploy | Remove the column/table |
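Step 3's "check logs for access" can be automated against a captured query log (e.g. queries exported from pg_stat_statements). A naive sketch; the substring match over-reports, which is the safe direction, since we only drop when there are zero hits:

```python
def column_still_referenced(query_log, table, column):
    """Return queries that still mention the column alongside the table.
    Naive substring scan: false positives are acceptable, false negatives
    are not, because the column is only dropped when this list is empty."""
    t, c = table.lower(), column.lower()
    return [q for q in query_log if t in q.lower() and c in q.lower()]
```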

Step 2: Migration Framework

Alembic (Python / SQLAlchemy)

# alembic/versions/001_add_display_name.py
"""Add display_name column"""
revision = '001'
down_revision = None

from alembic import op
import sqlalchemy as sa

def upgrade():
    op.add_column('users',
        sa.Column('display_name', sa.String(255), nullable=True)
    )

def downgrade():
    op.drop_column('users', 'display_name')

# Migration lifecycle commands (shell)
alembic upgrade head          # Apply all pending migrations
alembic downgrade -1          # Rollback last migration
alembic history               # Show migration history
alembic current               # Show current version
alembic revision --autogenerate -m "add display_name"  # Generate from model diff

Other Migration Frameworks

| Framework | Language | Key Features |
|---|---|---|
| Alembic | Python | Auto-generate from SQLAlchemy models, branching |
| Flyway | Java/JVM | SQL-based, versioned, no ORM dependency |
| Liquibase | Java/JVM | XML/YAML/SQL, rollback support, diff |
| Knex.js | Node.js | JavaScript migrations, Promise-based |
| golang-migrate | Go | CLI + library, database-agnostic |
| Prisma Migrate | TypeScript | Schema-first, shadow database for diffing |
| ActiveRecord | Ruby | Mature, reversible migrations, Rails native |

Step 3: Testing Migrations

Never deploy a migration without testing it against production-size data. What takes milliseconds on 1,000 rows can take hours on 100 million.

# Estimate migration time on production-size data
import time
import psycopg2

def test_migration_timing(conn, migration_sql, table_name):
    """Run migration on a sample and extrapolate to full dataset"""
    cur = conn.cursor()

    # Get row count
    cur.execute(f"SELECT COUNT(*) FROM {table_name}")
    row_count = cur.fetchone()[0]

    # Time the migration on a 100K sample
    sample_size = min(100000, row_count)
    start = time.time()
    cur.execute(f"""
        CREATE TEMP TABLE test_migration AS
        SELECT * FROM {table_name} LIMIT {sample_size}
    """)
    cur.execute(migration_sql.replace(table_name, "test_migration"))
    elapsed = time.time() - start

    # Extrapolate
    estimated_total = (elapsed / sample_size) * row_count
    print(f"Rows: {row_count:,}")
    print(f"Sample time: {elapsed:.2f}s ({sample_size:,} rows)")
    print(f"Estimated full: {estimated_total:.0f}s ({estimated_total/60:.1f} min)")

    if estimated_total > 300:  # > 5 minutes
        print("⚠️  WARNING: Consider batched migration or online DDL tool")
    if estimated_total > 3600:  # > 1 hour
        print("🚨 CRITICAL: Must use background migration with batching")

    conn.rollback()  # Don't actually apply

What to Test

| Test | Why | How |
|---|---|---|
| Timing on production-size data | Catch 45-minute table locks | Sample + extrapolate |
| Backward compatibility | Old code must work | Run old test suite against new schema |
| Forward compatibility | Rollback must work | Apply, then rollback, verify data integrity |
| Lock contention | Exclusive locks block all queries | Monitor pg_stat_activity during test |
| Data integrity | Backfills must be correct | COUNT, SUM, spot-check after migration |
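The data-integrity check can be scripted rather than eyeballed. A minimal spot-check sketch, assuming rows are fetched as dicts (e.g. via psycopg2's RealDictCursor) with the column names from the backfill example:

```python
def verify_backfill(rows):
    """Return rows whose display_name doesn't match the backfill formula."""
    return [
        r for r in rows
        if r["display_name"] != f"{r['first_name']} {r['last_name']}"
    ]
```

Run it against a random sample post-migration; a non-empty result means the backfill missed rows or raced with concurrent writes.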

Step 4: Lock-Free Migrations

Large tables require special techniques to avoid blocking production queries.

-- PostgreSQL: CREATE INDEX CONCURRENTLY (no table lock)
-- Regular CREATE INDEX locks the table for writes!
CREATE INDEX CONCURRENTLY idx_orders_customer
ON orders (customer_id);
-- Takes longer, but allows reads AND writes during creation

-- Add column with default (PostgreSQL 11+ is INSTANT — no rewrite)
ALTER TABLE orders ADD COLUMN priority INTEGER DEFAULT 0;
-- Pre-PG11 this would rewrite the entire table!

-- Batch updates instead of single UPDATE on millions of rows
DO $$
DECLARE
    batch_size INT := 10000;
    rows_updated INT;
    total_updated INT := 0;
BEGIN
    LOOP
        UPDATE users
        SET display_name = first_name || ' ' || last_name
        WHERE display_name IS NULL
          AND id IN (
            SELECT id FROM users
            WHERE display_name IS NULL
            LIMIT batch_size
            FOR UPDATE SKIP LOCKED  -- Skip rows being modified by others
          );
        GET DIAGNOSTICS rows_updated = ROW_COUNT;
        total_updated := total_updated + rows_updated;
        EXIT WHEN rows_updated = 0;
        RAISE NOTICE 'Updated % rows (total: %)', rows_updated, total_updated;
        COMMIT;  -- Release locks after each batch (PG11+; run the DO block outside an explicit transaction)
        PERFORM pg_sleep(0.1);  -- Brief pause to reduce CPU/IO pressure
    END LOOP;
END $$;

Operations That Lock Tables

| Operation | Lock Type | Duration | Safe Alternative |
|---|---|---|---|
| CREATE INDEX | ShareLock (blocks writes) | Minutes-hours | CREATE INDEX CONCURRENTLY |
| ALTER TABLE ADD COLUMN (with default, pre-PG11) | AccessExclusiveLock | Minutes-hours | Upgrade to PG11+ or add nullable first |
| ALTER TABLE ADD NOT NULL | AccessExclusiveLock | Fast if column exists | Add CHECK constraint first, then validate |
| ALTER TABLE SET DATA TYPE | AccessExclusiveLock | Rewrites table | Add new column, dual-write, drop old |
| UPDATE on full table | RowExclusiveLock | Hours on large tables | Batch updates with LIMIT + loop |

Step 5: Online DDL Tools for MySQL

MySQL has more table-locking operations than PostgreSQL. Use online DDL tools for large tables:

# pt-online-schema-change (Percona Toolkit)
pt-online-schema-change \
  --alter "ADD COLUMN display_name VARCHAR(255)" \
  --execute \
  D=mydb,t=users

# gh-ost (GitHub's Online Schema Migration)
gh-ost \
  --alter "ADD COLUMN display_name VARCHAR(255)" \
  --database=mydb \
  --table=users \
  --execute

Migration Safety Checklist

  • Migration is backward compatible with currently deployed code
  • Tested on production-size dataset (timing verified, < 5 min goal)
  • Rollback migration written AND tested before deploying forward
  • No table-level locks on tables > 100K rows
  • CREATE INDEX CONCURRENTLY used (not CREATE INDEX)
  • Batch updates used for large backfills (not single UPDATE)
  • Schema and data changes in separate migrations
  • Migration reviewed by DBA or senior engineer
  • Deployed during low-traffic window (if lock-free isn’t possible)
  • Monitoring in place during and after migration (query latency, lock waits)
  • Verification queries confirm data integrity post-migration

:::note[Source] This guide is derived from operational intelligence at Garnet Grid Consulting. For database consulting, visit garnetgrid.com. :::

Jakub Dimitri Rezayev
Founder & Chief Architect • Garnet Grid Consulting

Jakub holds an M.S. in Customer Intelligence & Analytics and a B.S. in Finance & Computer Science from Pace University. With deep expertise spanning D365 F&O, Azure, Power BI, and AI/ML systems, he architects enterprise solutions that bridge legacy systems and modern technology — and has led multi-million dollar ERP implementations for Fortune 500 supply chains.
