Reverse ETL Patterns | The Garnet Wiki

Reverse ETL takes data from your data warehouse and pushes it back into operational systems — CRMs, marketing platforms, customer support tools, and product applications. It closes the gap between analytical insights and operational action.

Why Reverse ETL

Traditional ETL:
  Operational Systems → Data Warehouse → Dashboards
  Result: Insights trapped in dashboards

Reverse ETL:
  Operational Systems → Data Warehouse → Operational Systems
  Result: Insights drive automated actions

Examples:
  - Warehouse identifies high-value customers → Sync to CRM
  - ML model scores churn risk → Push to customer success tool
  - Product analytics segments → Sync to marketing platform
  - Usage data → Billing system for usage-based pricing

Architecture

Data Warehouse (source of truth for analytics)
  ├── Customer 360 table
  ├── Product usage aggregates
  ├── ML model predictions
  └── Revenue analytics
     ↓
Reverse ETL Tool (Census, Hightouch, Polytoml)
  ├── Model: SQL query or dbt model defining the data
  ├── Mapping: Warehouse column → Destination field
  ├── Sync mode: Full refresh, incremental, mirror
  └── Schedule: Real-time, hourly, daily
     ↓
Destinations
  ├── Salesforce (lead scoring, customer data)
  ├── HubSpot (marketing segments, lifecycle stage)
  ├── Intercom (product usage, support context)
  ├── Stripe (billing metadata)
  └── Slack (alerts, notifications)

Sync Strategies

Incremental Sync

-- Only sync records changed since last sync
SELECT
    customer_id,
    name,
    email,
    lifetime_value,
    churn_score,
    product_tier,
    last_activity_at
FROM analytics.customer_360
WHERE updated_at > '{{ last_sync_timestamp }}'

Mirror Sync

-- Full state of the dataset, handles deletes
-- Reverse ETL tool compares against destination
-- Adds new records, updates changed, removes deleted
SELECT
    segment_id,
    customer_id,
    segment_name,
    entered_at
FROM analytics.customer_segments
WHERE segment_name = 'high_value_at_risk'

Data Activation Workflows

# Lead scoring workflow
model:
  name: "enriched_leads"
  query: |
    SELECT
      l.email,
      l.company,
      CASE
        WHEN u.page_views > 100 AND u.trial_days_remaining < 7 THEN 'hot'
        WHEN u.page_views > 50 THEN 'warm'
        ELSE 'cold'
      END as lead_score,
      u.features_used,
      u.last_active_at
    FROM warehouse.leads l
    JOIN warehouse.product_usage u ON l.user_id = u.user_id

destination: salesforce
mapping:
  email: Email
  company: Company
  lead_score: Lead_Score__c
  features_used: Features_Used__c
  last_active_at: Last_Product_Activity__c
  
schedule: every_hour
sync_mode: incremental

Anti-Patterns

Anti-Pattern	Consequence	Fix
Syncing raw data	Destination polluted with noise	Sync curated, purpose-built models
No deduplication	Duplicate records in CRM	Deduplicate in warehouse before sync
Real-time sync for batch use cases	Unnecessary complexity and cost	Match sync frequency to business need
No monitoring	Silent sync failures	Alert on sync errors, row count anomalies
Bi-directional sync without conflict resolution	Data ping-pong between systems	One system is source of truth per field

Reverse ETL turns the warehouse from a reporting tool into an operational engine. The insight is only valuable if it reaches the person or system that can act on it.

Why Reverse ETL

Architecture

Sync Strategies

Incremental Sync

Mirror Sync

Data Activation Workflows

Anti-Patterns

More in Data Engineering

CDC Pipeline Architecture

Change Data Capture (CDC) Patterns

Batch Processing at Scale