
ETL vs ELT: Modern Data Integration

A guide to choosing between ETL and ELT patterns, covering transformation strategy, tool comparison, data loading patterns, incremental processing, and building scalable integration pipelines.

ETL (Extract, Transform, Load) transforms data before loading it into the target. ELT (Extract, Load, Transform) loads raw data first, then transforms it inside the target system. The shift from ETL to ELT happened because cloud data warehouses became powerful enough to run the transformations that used to require dedicated ETL servers.


ETL vs ELT

| Aspect | ETL | ELT |
| --- | --- | --- |
| Transform where? | Separate transform server | Inside the warehouse |
| Raw data preserved? | No (transformed before loading) | Yes (raw data in bronze layer) |
| Schema flexibility | Schema-on-write (define first) | Schema-on-read (define later) |
| Scalability | Limited by transform server | Scales with warehouse compute |
| Cost model | Transform server + warehouse | Warehouse compute (pay per query) |
| Best for | Regulated environments, small data | Modern analytics, large data, ML |
| Tools | Informatica, Talend, SSIS | Fivetran + dbt, Airbyte + dbt |
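
The schema-on-read flexibility in the table above can be sketched as a query over raw JSON that was loaded untouched. This is a minimal illustration using Snowflake's VARIANT path syntax; the `raw_events` table and its field names are hypothetical, not part of this guide's pipeline:

```sql
-- Schema-on-read (ELT): raw JSON lands as-is, and columns are defined
-- at query time. Snowflake VARIANT syntax; names are illustrative.
SELECT
    payload:order_id::NUMBER            AS order_id,
    payload:status::STRING              AS status,
    payload:total_cents::NUMBER / 100.0 AS amount
FROM raw_events
WHERE payload:event_type::STRING = 'order_created';
```

Under ETL, this parsing would happen on a transform server before load; under ELT it is just another query the warehouse runs, and can be rewritten at any time because the raw payload is still there.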

Modern ELT Stack

```
Sources                   Extract & Load          Transform
┌───────────┐             ┌──────────────┐        ┌───────────┐
│ SaaS APIs ├──┐          │ Fivetran     │        │ dbt       │
│ Databases ├──┼─────────▶│ Airbyte      ├───────▶│           │
│ Files     ├──┘          │ Stitch       │  raw   │ SQL-based │
│ Events    │             └──────────────┘  data  │ transform │
└───────────┘                                     └─────┬─────┘
                                                        │
                                            ┌───────────▼──────┐
                                            │ Data Warehouse   │
                                            │ (Snowflake/BQ/   │
                                            │  Redshift)       │
                                            └──────────────────┘
```

dbt Transformation Layer

```sql
-- models/staging/stg_orders.sql
WITH source AS (
    SELECT * FROM {{ source('commerce', 'orders') }}
),

renamed AS (
    SELECT
        id AS order_id,
        user_id AS customer_id,
        total_amount_cents / 100.0 AS amount,
        status,
        CAST(created_at AS TIMESTAMP) AS order_date,
        CAST(updated_at AS TIMESTAMP) AS updated_at,
        CAST(_fivetran_synced AS TIMESTAMP) AS loaded_at
    FROM source
    WHERE _fivetran_deleted = FALSE
)

SELECT * FROM renamed
```

```sql
-- models/marts/daily_revenue.sql
WITH orders AS (
    SELECT * FROM {{ ref('stg_orders') }}
    WHERE status NOT IN ('cancelled', 'refunded')
),

daily AS (
    SELECT
        DATE_TRUNC('day', order_date) AS date,
        COUNT(*) AS order_count,
        SUM(amount) AS revenue,
        AVG(amount) AS avg_order_value,
        COUNT(DISTINCT customer_id) AS unique_customers
    FROM orders
    GROUP BY 1
)

SELECT * FROM daily
```

Incremental Processing

| Strategy | Description | Best For |
| --- | --- | --- |
| Full refresh | Drop and rebuild entire table | Small tables, infrequent changes |
| Incremental append | Only process new rows | Event logs, time-series |
| Incremental merge | Upsert changed rows | Slowly changing dimensions |
| Snapshot | Track history of all changes | Auditing, SCD Type 2 |
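
The append strategy suits immutable event logs: only rows newer than the target's current high-water mark are processed, with no upsert needed. A minimal sketch, assuming a hypothetical `events` source table in the same `commerce` schema:

```sql
-- Append-only incremental model for an immutable event log.
-- The 'events' source and its columns are illustrative.
{{ config(
    materialized='incremental',
    incremental_strategy='append'
) }}

SELECT
    event_id,
    event_type,
    occurred_at
FROM {{ source('commerce', 'events') }}

{% if is_incremental() %}
    -- Only process rows newer than what the target already holds
    WHERE occurred_at > (SELECT MAX(occurred_at) FROM {{ this }})
{% endif %}
```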
```sql
-- dbt incremental model (merge strategy)
{{ config(
    materialized='incremental',
    unique_key='order_id',
    incremental_strategy='merge',
    on_schema_change='append_new_columns'
) }}

SELECT
    order_id,
    customer_id,
    amount,
    status,
    updated_at
FROM {{ ref('stg_orders') }}

{% if is_incremental() %}
    WHERE updated_at > (SELECT MAX(updated_at) FROM {{ this }})
{% endif %}
```
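
The snapshot strategy from the table above maps to dbt snapshots, which keep a new row each time a tracked record changes, giving SCD Type 2 history with `dbt_valid_from` / `dbt_valid_to` columns. A minimal sketch against the staging model defined earlier:

```sql
-- snapshots/orders_snapshot.sql
-- dbt snapshot: records a new version whenever updated_at changes
-- for an order_id (SCD Type 2).
{% snapshot orders_snapshot %}

{{ config(
    target_schema='snapshots',
    unique_key='order_id',
    strategy='timestamp',
    updated_at='updated_at'
) }}

SELECT * FROM {{ ref('stg_orders') }}

{% endsnapshot %}
```

Run with `dbt snapshot` on a schedule; each run compares `updated_at` against the stored versions and closes out superseded rows.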

Tool Comparison

| Tool | Type | Best For | Pricing |
| --- | --- | --- | --- |
| Fivetran | Managed EL | Teams wanting zero-maintenance connectors | Per-row |
| Airbyte | Open-source EL | Budget-conscious, custom connectors | Free (self-hosted) |
| dbt | Transform (T) | SQL-based transformations | Free (Core) / Paid (Cloud) |
| Informatica | Enterprise ETL | Complex transformations, governance | Enterprise license |
| Azure Data Factory | Cloud ETL/ELT | Azure-heavy environments | Per-activity |
| AWS Glue | Serverless ETL | AWS-native, Spark-based | Per-DPU-hour |

Anti-Patterns

| Anti-Pattern | Problem | Fix |
| --- | --- | --- |
| Transform in extraction tool | Business logic scattered across tools | Use dbt for all transformation logic |
| Full refresh everything | Expensive, slow for large tables | Incremental processing for tables > 1M rows |
| No staging layer | Business logic applied directly to raw data | Stage raw → clean → transform (medallion) |
| Hardcoded connections | Environment-specific configs in code | Environment variables, config management |
| No lineage | Can't trace data from dashboard to source | dbt docs + data catalog integration |

Checklist

  • ELT or ETL pattern selected based on data volume and warehouse capability
  • Extract tools configured (Fivetran, Airbyte) with appropriate sync frequency
  • Staging layer: raw data preserved before transformation
  • dbt project: organized models with staging → intermediate → marts layers
  • Incremental processing for large tables (> 1M rows)
  • Testing: schema tests, data tests, freshness tests in dbt
  • Documentation: model descriptions, column descriptions
  • Monitoring: sync failures, transformation errors, freshness SLAs
  • Version control: all transformation logic in Git
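
The testing item above can be made concrete with a dbt singular test: a SQL file under `tests/` that fails if it returns any rows. A minimal sketch against the `daily_revenue` mart; the test name is illustrative:

```sql
-- tests/assert_no_negative_revenue.sql
-- dbt singular test: fails when this query returns one or more rows.
SELECT
    date,
    revenue
FROM {{ ref('daily_revenue') }}
WHERE revenue < 0
```

`dbt test` runs this alongside any schema tests (unique, not_null, relationships) declared in the models' YAML files.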

:::note[Source]
This guide is derived from operational intelligence at Garnet Grid Consulting. For data integration consulting, visit garnetgrid.com.
:::

Jakub Dimitri Rezayev
Founder & Chief Architect • Garnet Grid Consulting

Jakub holds an M.S. in Customer Intelligence & Analytics and a B.S. in Finance & Computer Science from Pace University. With deep expertise spanning D365 F&O, Azure, Power BI, and AI/ML systems, he architects enterprise solutions that bridge legacy systems and modern technology — and has led multi-million dollar ERP implementations for Fortune 500 supply chains.
