ETL (Extract, Transform, Load) transforms data before loading it into the target. ELT (Extract, Load, Transform) loads raw data first, then transforms it inside the target system. The shift from ETL to ELT happened because cloud data warehouses became powerful enough to run the transformations that used to require dedicated ETL servers.
## ETL vs ELT
| Aspect | ETL | ELT |
|---|---|---|
| Transform where? | Separate transform server | Inside the warehouse |
| Raw data preserved? | No (transformed before loading) | Yes (raw data in bronze layer) |
| Schema flexibility | Schema-on-write (define first) | Schema-on-read (define later) |
| Scalability | Limited by transform server | Scales with warehouse compute |
| Cost model | Transform server + warehouse | Warehouse compute (pay per query) |
| Best for | Regulated environments, small data | Modern analytics, large data, ML |
| Tools | Informatica, Talend, SSIS | Fivetran + dbt, Airbyte + dbt |
## Modern ELT Stack
```
Sources             Extract & Load            Transform
┌───────────┐       ┌──────────────┐          ┌───────────┐
│ SaaS APIs ├──┐    │   Fivetran   │          │    dbt    │
│ Databases ├──┼───▶│   Airbyte    ├─────────▶│           │
│ Files     ├──┘    │   Stitch     │   raw    │ SQL-based │
│ Events    │       └──────────────┘   data   │ transform │
└───────────┘                                 └─────┬─────┘
                                                    │
                                          ┌─────────▼──────┐
                                          │ Data Warehouse │
                                          │ (Snowflake/BQ/ │
                                          │   Redshift)    │
                                          └────────────────┘
```
```sql
-- models/staging/stg_orders.sql
WITH source AS (
    SELECT * FROM {{ source('commerce', 'orders') }}
),

renamed AS (
    SELECT
        id AS order_id,
        user_id AS customer_id,
        total_amount_cents / 100.0 AS amount,
        status,
        CAST(created_at AS TIMESTAMP) AS order_date,
        CAST(_fivetran_synced AS TIMESTAMP) AS loaded_at
    FROM source
    WHERE _fivetran_deleted = FALSE
)

SELECT * FROM renamed
```
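The `{{ source('commerce', 'orders') }}` reference above must be declared in a dbt sources file. A minimal sketch, assuming Fivetran lands the data in a `fivetran_commerce` schema (the schema name and freshness thresholds are illustrative assumptions):

```yaml
# models/staging/sources.yml -- hypothetical source declaration
version: 2

sources:
  - name: commerce
    schema: fivetran_commerce   # assumed schema created by the EL tool
    tables:
      - name: orders
        loaded_at_field: _fivetran_synced
        freshness:
          warn_after: {count: 12, period: hour}
```

Declaring `loaded_at_field` lets `dbt source freshness` alert when the EL pipeline falls behind.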
```sql
-- models/marts/daily_revenue.sql
WITH orders AS (
    SELECT * FROM {{ ref('stg_orders') }}
    WHERE status NOT IN ('cancelled', 'refunded')
),

daily AS (
    SELECT
        DATE_TRUNC('day', order_date) AS date,
        COUNT(*) AS order_count,
        SUM(amount) AS revenue,
        AVG(amount) AS avg_order_value,
        COUNT(DISTINCT customer_id) AS unique_customers
    FROM orders
    GROUP BY 1
)

SELECT * FROM daily
```
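Mart models like this are normally guarded with dbt tests so broken assumptions fail the build rather than the dashboard. A minimal sketch (file name and test choices are assumptions):

```yaml
# models/marts/marts.yml -- hypothetical tests for daily_revenue
version: 2

models:
  - name: daily_revenue
    columns:
      - name: date
        tests: [unique, not_null]   # one row per day
      - name: revenue
        tests: [not_null]
```

Running `dbt test` after `dbt run` then validates every materialized model.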
## Incremental Processing
| Strategy | Description | Best For |
|---|---|---|
| Full refresh | Drop and rebuild entire table | Small tables, infrequent changes |
| Incremental append | Only process new rows | Event logs, time-series |
| Incremental merge | Upsert changed rows | Slowly changing dimensions |
| Snapshot | Track history of all changes | Auditing, SCD Type 2 |
```sql
-- dbt incremental model
{{ config(
    materialized='incremental',
    unique_key='order_id',
    incremental_strategy='merge',
    on_schema_change='append_new_columns'
) }}

SELECT
    order_id,
    customer_id,
    amount,
    status,
    updated_at
FROM {{ ref('stg_orders') }}

{% if is_incremental() %}
WHERE updated_at > (SELECT MAX(updated_at) FROM {{ this }})
{% endif %}
```
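The snapshot strategy from the table above (SCD Type 2) has its own dbt construct: a snapshot block that records a new row each time a tracked record changes. A minimal sketch, reusing the `commerce.orders` source from earlier (target schema and strategy settings are illustrative assumptions):

```sql
-- snapshots/orders_snapshot.sql -- hypothetical SCD Type 2 snapshot
{% snapshot orders_snapshot %}

{{ config(
    target_schema='snapshots',
    unique_key='order_id',
    strategy='timestamp',
    updated_at='updated_at'
) }}

SELECT * FROM {{ source('commerce', 'orders') }}

{% endsnapshot %}
```

Each `dbt snapshot` run compares `updated_at` against the stored history and closes out superseded rows with validity timestamps.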
| Tool | Type | Best For | Pricing |
|---|---|---|---|
| Fivetran | Managed EL | Teams wanting zero-maintenance connectors | Per-row |
| Airbyte | Open-source EL | Budget-conscious, custom connectors | Free (self-hosted) |
| dbt | Transform (T) | SQL-based transformations | Free (Core) / Paid (Cloud) |
| Informatica | Enterprise ETL | Complex transformations, governance | Enterprise license |
| Azure Data Factory | Cloud ETL/ELT | Azure-heavy environments | Per-activity |
| AWS Glue | Serverless ETL | AWS-native, Spark-based | Per-DPU-hour |
## Anti-Patterns
| Anti-Pattern | Problem | Fix |
|---|---|---|
| Transform in extraction tool | Business logic scattered across tools | Use dbt for all transformation logic |
| Full refresh everything | Expensive, slow for large tables | Incremental processing for tables > 1M rows |
| No staging layer | Business logic applied directly to raw data | Stage raw → clean → transform (medallion) |
| Hardcoded connections | Environment-specific configs in code | Environment variables, config management |
| No lineage | Can’t trace data from dashboard to source | dbt docs + data catalog integration |
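For the "hardcoded connections" fix, dbt reads connection details from `profiles.yml`, which can pull credentials from the environment via `env_var()`. A minimal sketch, assuming a Snowflake target (profile name, database, and warehouse names are assumptions):

```yaml
# profiles.yml -- hypothetical profile; credentials come from the environment
analytics:
  target: dev
  outputs:
    dev:
      type: snowflake
      account: "{{ env_var('SNOWFLAKE_ACCOUNT') }}"
      user: "{{ env_var('SNOWFLAKE_USER') }}"
      password: "{{ env_var('SNOWFLAKE_PASSWORD') }}"
      database: analytics_dev
      warehouse: transforming
      schema: dbt_dev
```

The same project can then target prod by adding a second output and switching with `dbt run --target prod`, with no secrets committed to the repo.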
## Checklist

- [ ] Raw data lands untransformed in a staging/bronze layer
- [ ] All business logic lives in dbt models, not in extraction tools
- [ ] Large tables (> 1M rows) use incremental materialization, not full refresh
- [ ] Connections are configured via environment variables, not hardcoded
- [ ] Lineage is traceable from dashboard to source (dbt docs + data catalog)
:::note[Source]
This guide is derived from operational intelligence at Garnet Grid Consulting. For data integration consulting, visit garnetgrid.com.
:::