ETL (Extract, Transform, Load) transforms data before loading it into the target. ELT (Extract, Load, Transform) loads raw data first, then transforms it inside the target system. The shift from ETL to ELT happened because cloud data warehouses became powerful enough to run the transformations that used to require dedicated ETL servers.
## ETL vs ELT
| Aspect | ETL | ELT |
|---|---|---|
| Transform where? | Separate transform server | Inside the warehouse |
| Raw data preserved? | No (transformed before loading) | Yes (raw data in bronze layer) |
| Schema flexibility | Schema-on-write (define first) | Schema-on-read (define later) |
| Scalability | Limited by transform server | Scales with warehouse compute |
| Cost model | Transform server + warehouse | Warehouse compute (pay per query) |
| Best for | Regulated environments, small data | Modern analytics, large data, ML |
| Tools | Informatica, Talend, SSIS | Fivetran + dbt, Airbyte + dbt |
## Modern ELT Stack
```
Sources             Extract & Load            Transform
┌───────────┐       ┌──────────────┐          ┌───────────┐
│ SaaS APIs ├──┐    │   Fivetran   │          │    dbt    │
│ Databases ├──┼───▶│   Airbyte    ├─────────▶│           │
│ Files     ├──┘    │   Stitch     │   raw    │ SQL-based │
│ Events    │       └──────────────┘   data   │ transform │
└───────────┘                                 └─────┬─────┘
                                                    │
                                          ┌─────────▼──────┐
                                          │ Data Warehouse │
                                          │ (Snowflake/BQ/ │
                                          │   Redshift)    │
                                          └────────────────┘
```
```sql
-- models/staging/stg_orders.sql
WITH source AS (
    SELECT * FROM {{ source('commerce', 'orders') }}
),

renamed AS (
    SELECT
        id AS order_id,
        user_id AS customer_id,
        total_amount_cents / 100.0 AS amount,
        status,
        CAST(created_at AS TIMESTAMP) AS order_date,
        CAST(_fivetran_synced AS TIMESTAMP) AS loaded_at
    FROM source
    WHERE _fivetran_deleted = FALSE
)

SELECT * FROM renamed
```
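The `{{ source('commerce', 'orders') }}` reference above must be declared in a dbt sources file. A minimal sketch, assuming Fivetran lands the data in a `fivetran_commerce` schema (the schema name and freshness thresholds are illustrative assumptions):

```yaml
# models/staging/sources.yml -- hypothetical source declaration
version: 2

sources:
  - name: commerce
    schema: fivetran_commerce   # assumed schema created by the EL tool
    tables:
      - name: orders
        loaded_at_field: _fivetran_synced
        freshness:
          warn_after: {count: 12, period: hour}
```

Declaring `loaded_at_field` lets `dbt source freshness` alert when the EL pipeline falls behind.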
```sql
-- models/marts/daily_revenue.sql
WITH orders AS (
    SELECT * FROM {{ ref('stg_orders') }}
    WHERE status NOT IN ('cancelled', 'refunded')
),

daily AS (
    SELECT
        DATE_TRUNC('day', order_date) AS date,
        COUNT(*) AS order_count,
        SUM(amount) AS revenue,
        AVG(amount) AS avg_order_value,
        COUNT(DISTINCT customer_id) AS unique_customers
    FROM orders
    GROUP BY 1
)

SELECT * FROM daily
```
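Mart models like this are normally guarded with dbt tests so broken assumptions fail the build rather than the dashboard. A minimal sketch (file name and test choices are assumptions):

```yaml
# models/marts/marts.yml -- hypothetical tests for daily_revenue
version: 2

models:
  - name: daily_revenue
    columns:
      - name: date
        tests: [unique, not_null]   # one row per day
      - name: revenue
        tests: [not_null]
```

Running `dbt test` after `dbt run` then validates every materialized model.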
## Incremental Processing
| Strategy | Description | Best For |
|---|---|---|
| Full refresh | Drop and rebuild entire table | Small tables, infrequent changes |
| Incremental append | Only process new rows | Event logs, time-series |
| Incremental merge | Upsert changed rows | Slowly changing dimensions |
| Snapshot | Track history of all changes | Auditing, SCD Type 2 |
```sql
-- dbt incremental model
{{ config(
    materialized='incremental',
    unique_key='order_id',
    incremental_strategy='merge',
    on_schema_change='append_new_columns'
) }}

SELECT
    order_id,
    customer_id,
    amount,
    status,
    updated_at
FROM {{ ref('stg_orders') }}

{% if is_incremental() %}
WHERE updated_at > (SELECT MAX(updated_at) FROM {{ this }})
{% endif %}
```
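The snapshot strategy from the table above (SCD Type 2) has its own dbt construct: a snapshot block that records a new row each time a tracked record changes. A minimal sketch, reusing the `commerce.orders` source from earlier (target schema and strategy settings are illustrative assumptions):

```sql
-- snapshots/orders_snapshot.sql -- hypothetical SCD Type 2 snapshot
{% snapshot orders_snapshot %}

{{ config(
    target_schema='snapshots',
    unique_key='order_id',
    strategy='timestamp',
    updated_at='updated_at'
) }}

SELECT * FROM {{ source('commerce', 'orders') }}

{% endsnapshot %}
```

Each `dbt snapshot` run compares `updated_at` against the stored history and closes out superseded rows with validity timestamps.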
| Tool | Type | Best For | Pricing |
|---|---|---|---|
| Fivetran | Managed EL | Teams wanting zero-maintenance connectors | Per-row |
| Airbyte | Open-source EL | Budget-conscious, custom connectors | Free (self-hosted) |
| dbt | Transform (T) | SQL-based transformations | Free (Core) / Paid (Cloud) |
| Informatica | Enterprise ETL | Complex transformations, governance | Enterprise license |
| Azure Data Factory | Cloud ETL/ELT | Azure-heavy environments | Per-activity |
| AWS Glue | Serverless ETL | AWS-native, Spark-based | Per-DPU-hour |
## Anti-Patterns
| Anti-Pattern | Problem | Fix |
|---|---|---|
| Transform in extraction tool | Business logic scattered across tools | Use dbt for all transformation logic |
| Full refresh everything | Expensive, slow for large tables | Incremental processing for tables > 1M rows |
| No staging layer | Business logic applied directly to raw data | Stage raw → clean → transform (medallion) |
| Hardcoded connections | Environment-specific configs in code | Environment variables, config management |
| No lineage | Can’t trace data from dashboard to source | dbt docs + data catalog integration |
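For the "hardcoded connections" fix, dbt reads connection details from `profiles.yml`, which can pull credentials from the environment via `env_var()`. A minimal sketch, assuming a Snowflake target (profile name, database, and warehouse names are assumptions):

```yaml
# profiles.yml -- hypothetical profile; credentials come from the environment
analytics:
  target: dev
  outputs:
    dev:
      type: snowflake
      account: "{{ env_var('SNOWFLAKE_ACCOUNT') }}"
      user: "{{ env_var('SNOWFLAKE_USER') }}"
      password: "{{ env_var('SNOWFLAKE_PASSWORD') }}"
      database: analytics_dev
      warehouse: transforming
      schema: dbt_dev
```

The same project can then target prod by adding a second output and switching with `dbt run --target prod`, with no secrets committed to the repo.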
## Checklist

- [ ] Raw data lands untransformed in a staging/bronze layer
- [ ] All business logic lives in dbt models, not in extraction tools
- [ ] Large tables (> 1M rows) use incremental materialization, not full refresh
- [ ] Connections are configured via environment variables, not hardcoded
- [ ] Lineage is traceable from dashboard to source (dbt docs + data catalog)
:::note[Source]
This guide is derived from operational intelligence at Garnet Grid Consulting. For data integration consulting, visit garnetgrid.com.
:::