Data Warehouse Modeling with Kimball
Design dimensional data warehouses. Covers star schema, snowflake schema, fact and dimension tables, slowly changing dimensions, conformed dimensions, and ETL design patterns.
Dimensional modeling is the foundation of every analytics warehouse. The Kimball methodology organizes data into facts (things that happened) and dimensions (context about what happened). This structure makes queries intuitive — business users can slice and dice facts by any dimension without understanding joins or database internals.
Star Schema
┌──────────────┐
│ dim_customer │
│ customer_key │
│ name │
│ segment │
│ region │
└──────┬───────┘
│
┌──────────────┐ │ ┌──────────────┐
│ dim_product │ │ │ dim_date │
│ product_key │ │ │ date_key │
│ name ├─────┤ │ date │
│ category │ │ │ month │
│ brand │ │ │ quarter │
└──────────────┘ │ │ year │
│ │ is_holiday │
┌──────┴──────┴──────┐
│ fct_orders │
│ order_key │
│ customer_key (FK) │
│ product_key (FK) │
│ date_key (FK) │
│ quantity │
│ unit_price │
│ discount │
│ total_amount │
└─────────────────────┘
Fact Table Types
| Type | What It Stores | Grain | Example |
|---|---|---|---|
| Transaction | Individual events | One row per event | Each order line item |
| Periodic snapshot | State at regular intervals | One row per period | Monthly account balance |
| Accumulating snapshot | Lifecycle milestones | One row per entity | Order: created → shipped → delivered |
| Factless fact | Event occurrence (no measures) | One row per event | Student attendance, page views |
Slowly Changing Dimensions (SCD)
| SCD Type | How It Works | When to Use |
|---|---|---|
| Type 1 | Overwrite old value | Don’t need history (fix typos) |
| Type 2 | New row with version dates | Need full history (address changes) |
| Type 3 | Add column for previous value | Need current + one previous |
| Type 6 | Combines Types 1, 2, 3 | Need history + current flag |
SCD Type 2 Implementation
-- dim_customer with SCD Type 2
CREATE TABLE dim_customer (
customer_key SERIAL PRIMARY KEY, -- Surrogate key
customer_id INT NOT NULL, -- Natural key
name TEXT NOT NULL,
segment TEXT NOT NULL,
region TEXT NOT NULL,
effective_date DATE NOT NULL,
expiry_date DATE DEFAULT '9999-12-31',
is_current BOOLEAN DEFAULT TRUE
);
-- Customer moves from "SMB" to "Enterprise"
-- Step 1: Expire current record
UPDATE dim_customer
SET expiry_date = CURRENT_DATE - 1, is_current = FALSE
WHERE customer_id = 123 AND is_current = TRUE;
-- Step 2: Insert new record
INSERT INTO dim_customer (customer_id, name, segment, region, effective_date)
VALUES (123, 'Acme Corp', 'Enterprise', 'US-West', CURRENT_DATE);
dbt Implementation
-- models/marts/core/fct_orders.sql
{{
config(
materialized='incremental',
unique_key='order_key',
on_schema_change='append_new_columns'
)
}}
WITH orders AS (
SELECT * FROM {{ ref('stg_orders') }}
{% if is_incremental() %}
WHERE updated_at > (SELECT MAX(updated_at) FROM {{ this }})
{% endif %}
),
customers AS (
SELECT * FROM {{ ref('dim_customer') }}
WHERE is_current = TRUE
),
products AS (
SELECT * FROM {{ ref('dim_product') }}
),
date_dim AS (
SELECT * FROM {{ ref('dim_date') }}
)
SELECT
{{ dbt_utils.generate_surrogate_key(['orders.order_id', 'orders.line_number']) }} AS order_key,
customers.customer_key,
products.product_key,
date_dim.date_key,
orders.quantity,
orders.unit_price,
orders.discount,
orders.quantity * orders.unit_price * (1 - orders.discount) AS total_amount,
orders.updated_at
FROM orders
LEFT JOIN customers ON orders.customer_id = customers.customer_id
LEFT JOIN products ON orders.product_id = products.product_id
LEFT JOIN date_dim ON orders.order_date = date_dim.date
Anti-Patterns
| Anti-Pattern | Problem | Fix |
|---|---|---|
| Natural keys as fact table keys | Slow joins, data quality issues | Surrogate integer keys for all dimensions |
| Snowflaking everything | Complex queries, poor performance | Star schema (denormalize dimensions) |
| No date dimension | Can’t slice by fiscal year, holiday, etc. | Pre-built date dimension with business attributes |
| One Big Table (OBT) | Redundant data, hard to maintain | Proper star schema with fact + dimensions |
| No grain documentation | Ambiguous what each row represents | Document grain for every fact table |
Checklist
- Grain defined for every fact table
- Surrogate keys on all dimension tables
- Date dimension built with fiscal/business attributes
- SCD strategy chosen per dimension (Type 1, 2, or 3)
- Conformed dimensions shared across fact tables
- Incremental loading for large fact tables
- Data quality tests on all models
- Documentation: grain, business definitions, lineage
:::note[Source] This guide is derived from operational intelligence at Garnet Grid Consulting. For data modeling consulting, visit garnetgrid.com. :::