80% of ERP project delays are caused by data migration. Not because the technical migration is hard, but because nobody profiled the source data, nobody documented the mapping rules, and nobody validated the results until go-live week. This playbook prevents that by giving you an explicit phase-gate process with validation criteria at every step.
The golden rule of ERP migration: data cleansing happens in the source system, not during migration. If you try to fix data quality issues in your ETL scripts, they’ll grow in complexity until they become unmaintainable. Clean the source first, then migrate clean data.
## Migration Phases
- Phase 1: Data Profiling → Understand what you have
- Phase 2: Field Mapping → Define source → target transformations
- Phase 3: Data Cleansing → Fix quality issues before migration
- Phase 4: Migration Development → Build ETL scripts/tools
- Phase 5: Mock Migration → Run full migration in test environment
- Phase 6: Validation → Verify accuracy and completeness
- Phase 7: Cutover → Execute production migration
## Phase Timeline

| Phase | Duration | Key Deliverable | Gate Criteria |
|---|---|---|---|
| Profiling | 2-3 weeks | Data quality report | Business sign-off on scope |
| Mapping | 3-4 weeks | Signed mapping document | All fields mapped or excluded |
| Cleansing | 3-6 weeks | Clean source data | Quality KPIs met |
| Development | 4-6 weeks | ETL scripts + unit tests | All mappings implemented |
| Mock Migration | 2-3 weeks (×3 mocks) | Reconciliation report | Row counts + financials match |
| Validation | 1-2 weeks | Business sign-off | SME spot-checks pass |
| Cutover | 1 weekend | Production system live | Post-live validation passes |
## Phase 1: Data Profiling
Before mapping anything, understand your source data.
```sql
-- Profile a source table
SELECT
    'customers' AS table_name,
    COUNT(*) AS total_rows,
    COUNT(DISTINCT customer_id) AS unique_ids,
    SUM(CASE WHEN email IS NULL THEN 1 ELSE 0 END) AS null_emails,
    SUM(CASE WHEN phone IS NULL THEN 1 ELSE 0 END) AS null_phones,
    MIN(created_date) AS earliest_record,
    MAX(created_date) AS latest_record,
    SUM(CASE WHEN status = 'inactive' THEN 1 ELSE 0 END) AS inactive_count
FROM source_customers;
```
### Profiling Checklist

| Check | What to Look For | Red Flag |
|---|---|---|
| Row count | Baseline for validation | Millions more than expected |
| NULL rates per column | Mandatory fields that are empty | > 50% NULL on required fields |
| Duplicate keys | IDs that aren’t actually unique | Any duplicates on PK |
| Date ranges | Historical data you may not need | Records from 1900 or future dates |
| Orphan records | Child records without parent | > 1% orphan rate |
| Encoding issues | Unicode, special characters | Mojibake (garbled text) |
| Status codes | What values exist vs what target accepts | Unmapped status values |
| Data volume per entity | How many records per category | One category with 90% of data |
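Several of the checks above can be scripted once and rerun against every source table. A minimal sketch in Python, assuming rows have been loaded as dicts (the `customer_id` and `email` names are illustrative, not from any specific ERP):

```python
from collections import Counter

def profile_rows(rows, key_field, required_fields):
    """Run basic profiling checks: row count, duplicate keys, NULL rates."""
    total = len(rows)
    # Duplicate keys: any key appearing more than once is a red flag.
    key_counts = Counter(r[key_field] for r in rows)
    duplicates = {k: c for k, c in key_counts.items() if c > 1}
    # NULL rate per required field (None or empty string counts as missing).
    null_rates = {
        f: (sum(1 for r in rows if not r.get(f)) / total if total else 0.0)
        for f in required_fields
    }
    return {"total_rows": total, "duplicate_keys": duplicates, "null_rates": null_rates}

rows = [
    {"customer_id": 1, "email": "a@x.com"},
    {"customer_id": 2, "email": None},
    {"customer_id": 2, "email": "b@x.com"},  # duplicate key
]
report = profile_rows(rows, "customer_id", ["email"])
```

Running this per entity gives you the baseline numbers the validation phase will reconcile against later.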
### Entity Prioritization

| Entity | Dependencies | Migration Order | Risk Level |
|---|---|---|---|
| Chart of accounts | None (foundation) | 1st | Low |
| Customers | None | 2nd | Medium |
| Vendors/Suppliers | None | 3rd | Medium |
| Products/Items | Categories, UoM | 4th | Medium |
| Open purchase orders | Vendors, Products | 5th | High |
| Open sales orders | Customers, Products | 6th | High |
| Open invoices (AR/AP) | Customers, Vendors | 7th | Critical |
| General ledger balances | Chart of accounts | 8th | Critical |
| Transaction history | All of the above | Last (if at all) | High |
## Phase 2: Field Mapping

### Mapping Document Template

| Source Table | Source Field | Target Table | Target Field | Transform | Validation | Notes |
|---|---|---|---|---|---|---|
| customers | cust_id | Customer | CustomerAccount | Direct map | NOT NULL, unique | PK |
| customers | cust_name | Customer | Name | TRIM + PROPER CASE | Max 100 chars | |
| customers | email | Customer | PrimaryEmail | LOWER + TRIM | Validate format | |
| customers | phone | Customer | Phone | Format to E.164 | Strip non-digits first | |
| customers | status | Customer | Status | CASE: A→Active, I→Inactive | Valid values only | Default: Active |
| customers | credit_limit | Customer | CreditLimit | Convert to target currency | NULL → 0.00, ≥ 0 | |
| orders | order_date | SalesOrder | OrderDate | Convert timezone to UTC | Not in future | |
| orders | total | SalesOrder | TotalAmount | 2 decimal places | Validate > 0 | |
```python
# Common transformation functions
import re

def transform_customer_name(name: str) -> str:
    """Standardize customer names."""
    if not name:
        return "UNNAMED CUSTOMER"
    return name.strip().title()[:100]  # Trim, proper case, max length

def transform_phone(phone: str) -> str:
    """Convert to E.164 format."""
    digits = re.sub(r'\D', '', phone or '')
    if len(digits) == 10:
        return f"+1{digits}"  # US default for bare 10-digit numbers
    return f"+{digits}"  # 11-digit US and international numbers pass through

def map_status(source_status: str) -> str:
    """Map source status codes to target."""
    mapping = {
        'A': 'Active', 'ACTIVE': 'Active',
        'I': 'Inactive', 'INACTIVE': 'Inactive',
        'S': 'Suspended', 'D': 'Deleted',
    }
    return mapping.get((source_status or '').upper(), 'Active')
```
### Handling Unmapped Data

| Scenario | Strategy | Example |
|---|---|---|
| Source field has no target | Park in _legacy table | misc_field_3 → legacy reference table |
| Target field has no source | Set default value | CurrencyCode → ‘USD’ |
| Values don’t match target enum | Map or reject | Source ‘X’ status has no target equivalent |
| Source has multiple values, target has one | Merge with priority rules | 3 phone fields → pick most recent |
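The merge-with-priority-rules strategy can be expressed as a small function. A sketch, assuming each candidate value carries a last-updated date (the tuple shape is an assumption for illustration, not a real source schema):

```python
from datetime import date

def pick_primary_phone(phones):
    """Merge multiple source phone fields into one target field.

    Priority rule: the most recently updated non-empty value wins.
    `phones` is a list of (value, last_updated) tuples.
    """
    candidates = [(value, updated) for value, updated in phones if value]
    if not candidates:
        return None  # no usable value; target field stays empty
    return max(candidates, key=lambda p: p[1])[0]

primary = pick_primary_phone([
    ("555-0100", date(2019, 3, 1)),
    ("", date(2024, 1, 1)),        # empty values never win, however recent
    ("555-0199", date(2023, 6, 15)),
])
```

Whatever rule you choose, record it in the mapping document so SMEs can verify the winner during spot-checks.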
## Phase 3: Data Cleansing

### Common Cleansing Tasks

| Issue | Frequency | Fix Strategy | Where to Fix |
|---|---|---|---|
| Duplicate customer records | 5-15% | Deduplicate merge rules | Source system |
| Invalid email formats | 10-20% | Regex validation + manual review | Source system |
| Missing mandatory fields | Varies | Default values or manual entry | Source system |
| Inconsistent addresses | 20-40% | Address standardization service | ETL or source |
| Orphaned transactions | 2-5% | Re-link or exclude | Source system |
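For email validation, the goal is to flag bad records for manual cleanup in the source system, not to repair them in ETL. A sketch using a deliberately simple pattern (the regex and field names are assumptions; this is not a full RFC 5322 validator):

```python
import re

# Simple pattern: something@something.something, no spaces or extra @.
# Anything it rejects goes to a manual-review list in the source system.
EMAIL_RE = re.compile(r"^[^@\s]+@[^@\s]+\.[^@\s]+$")

def flag_invalid_emails(rows):
    """Return (row_id, email) pairs that need manual cleanup at the source."""
    return [
        (r["customer_id"], r["email"])
        for r in rows
        if r["email"] and not EMAIL_RE.match(r["email"])
    ]

bad = flag_invalid_emails([
    {"customer_id": 1, "email": "ok@example.com"},
    {"customer_id": 2, "email": "missing-at-sign.com"},
    {"customer_id": 3, "email": None},  # NULLs are tracked as a separate issue
])
```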
## Phase 5: Mock Migration

Run at least 3 full mock migrations before cutover.

| Mock | Purpose | Timing | Success Criteria |
|---|---|---|---|
| Mock 1 | Test ETL scripts, identify mapping errors | 8 weeks before go-live | Scripts execute without errors |
| Mock 2 | Fix issues from Mock 1, validate with business | 4 weeks before go-live | Row counts within 0.5% |
| Mock 3 | Dress rehearsal — full cutover simulation | 2 weeks before go-live | Row counts exact, financials to the penny |
### Mock Migration Runbook

**Pre-Mock:**

- [ ] Refresh test environment to clean state
- [ ] Take snapshot of source data (point-in-time)
- [ ] Notify stakeholders of mock schedule

**Execute:**

- [ ] Run ETL scripts in sequence (master data → transactions)
- [ ] Record execution time for each entity
- [ ] Log all errors and warnings to tracking spreadsheet

**Validate:**

- [ ] Row count reconciliation (source vs target)
- [ ] Financial reconciliation (AR, AP, GL balances)
- [ ] Business spot-checks (50 random records per entity)
- [ ] Integration tests (can users create orders against migrated data?)

**Post-Mock:**

- [ ] Document all issues found
- [ ] Estimate remediation effort
- [ ] Update migration runbook with lessons learned
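The 50-record business spot-check works best with a seeded sampler: consecutive mocks then re-check the same records, so a fix made after Mock 1 can be verified in Mock 2. A possible sketch (function and parameter names are illustrative):

```python
import random

def sample_for_spot_check(record_ids, n=50, seed=42):
    """Pick a reproducible random sample of record IDs for SME review.

    A fixed seed means every mock run checks the same records,
    so remediation from one mock can be verified in the next.
    """
    rng = random.Random(seed)
    pool = sorted(record_ids)  # sort for determinism across runs
    return rng.sample(pool, min(n, len(pool)))

sample = sample_for_spot_check(range(1, 1001), n=5)
```

Change the seed only when you deliberately want a fresh sample, and note the change in the runbook.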
## Phase 6: Validation

### Reconciliation Report
```sql
-- Row count reconciliation
SELECT
    'Customers' AS entity,
    (SELECT COUNT(*) FROM source_customers WHERE status != 'D') AS source_count,
    (SELECT COUNT(*) FROM target.customers) AS target_count,
    (SELECT COUNT(*) FROM source_customers WHERE status != 'D') -
    (SELECT COUNT(*) FROM target.customers) AS difference;

-- Financial reconciliation
SELECT
    'Open AR Balance' AS metric,
    (SELECT SUM(balance) FROM source_ar WHERE status = 'open') AS source_total,
    (SELECT SUM(amount) FROM target.ar_transactions WHERE open = 1) AS target_total;
```
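The counts returned by queries like those above can be fed into a small pass/fail helper that applies the 0.1% row-count gate automatically, so every mock produces the same reconciliation report. A sketch (the helper and its field names are assumptions for illustration):

```python
def reconcile(entity, source_count, target_count, tolerance=0.001):
    """Compare source vs target row counts against a relative tolerance.

    tolerance=0.001 corresponds to the 0.1% row-count gate.
    """
    diff = abs(source_count - target_count)
    rate = diff / source_count if source_count else 0.0
    return {
        "entity": entity,
        "source": source_count,
        "target": target_count,
        "difference": diff,
        "pass": rate <= tolerance,
    }

result = reconcile("Customers", source_count=10_000, target_count=9_995)
```

For financial totals, swap the relative tolerance for an absolute one (for example, a maximum difference of $0.01), since pennies matter regardless of volume.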
### Validation Gates

| Gate | Criteria | Action if Failed |
|---|---|---|
| Row count | Difference < 0.1% of source rows | Investigate, re-migrate |
| Financial totals | Difference < $0.01 | Trace every cent |
| Referential integrity | 0 orphan records | Fix parent records first |
| Business spot-check | 50 random records validated by SME | Fix mapping rules |
| Integration test | Users can create transactions on migrated data | Fix configuration |
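The referential-integrity gate can be checked in application code as well as SQL, which is handy when source and target live in different databases. A sketch, assuming child rows and parent IDs have already been extracted (field names are illustrative):

```python
def find_orphans(child_rows, parent_ids, fk_field):
    """Return child records whose foreign key has no matching parent.

    The gate is strict: any orphan at all fails validation.
    """
    parents = set(parent_ids)  # set lookup keeps this O(children + parents)
    return [r for r in child_rows if r[fk_field] not in parents]

orphans = find_orphans(
    child_rows=[
        {"order_id": 1, "customer_id": 10},
        {"order_id": 2, "customer_id": 99},  # 99 has no parent customer
    ],
    parent_ids=[10, 20],
    fk_field="customer_id",
)
```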
## Phase 7: Cutover Planning

### Cutover Timeline (Example Weekend)

| Time | Activity | Owner | Rollback Point |
|---|---|---|---|
| Friday 6 PM | Freeze source system (read-only) | IT Operations | N/A |
| Friday 7 PM | Final data extract from source | Migration Team | ✅ (restore read-write) |
| Friday 9 PM | Run production migration scripts | Migration Team | ✅ (discard and retry) |
| Saturday 6 AM | Automated validation suite runs | Migration Team | ✅ (restore source) |
| Saturday 9 AM | Business validation (SME spot-checks) | Business Team | ✅ (restore source) |
| Saturday 3 PM | Go/No-Go decision | Steering Committee | ✅ (last rollback point) |
| Sunday | Training day, resolve issues | All | No rollback after this |
| Monday 8 AM | Production go-live | All | Incident response mode |
## Common Pitfalls

| Pitfall | Impact | Prevention |
|---|---|---|
| Not profiling source data | Discover bad data at go-live | Profile in Phase 1, no exceptions |
| Migrating ALL history | 10x migration time | Define cutoff (2-3 years typical) |
| No mock migrations | Untested scripts in production | Minimum 3 mocks |
| Skipping financial reconciliation | Balance discrepancies post go-live | Reconcile to the penny |
| One person owns migration | Single point of failure | Cross-train, document everything |
| Data cleansing during migration | ETL complexity explodes | Clean in source system first |
| No rollback plan | Stuck with bad migration | Document rollback at every step |
:::note[Source]
This guide is derived from operational intelligence at Garnet Grid Consulting. For ERP migration consulting, visit garnetgrid.com.
:::