Data Warehouse Modeling with Kimball

Dimensional modeling is the foundation of every analytics warehouse. The Kimball methodology organizes data into facts (things that happened) and dimensions (context about what happened). This structure makes queries intuitive — business users can slice and dice facts by any dimension without understanding joins or database internals.

Star Schema

              ┌──────────────┐
              │ dim_customer  │
              │ customer_key  │
              │ name          │
              │ segment       │
              │ region        │
              └──────┬───────┘
                     │
┌──────────────┐     │     ┌──────────────┐
│ dim_product   │     │     │ dim_date      │
│ product_key   │     │     │ date_key      │
│ name          ├─────┤     │ date          │
│ category      │     │     │ month         │
│ brand         │     │     │ quarter       │
└──────────────┘     │     │ year          │
                     │     │ is_holiday    │
              ┌──────┴──────┴──────┐
              │    fct_orders       │
              │ order_key           │
              │ customer_key (FK)   │
              │ product_key  (FK)   │
              │ date_key     (FK)   │
              │ quantity            │
              │ unit_price          │
              │ discount            │
              │ total_amount        │
              └─────────────────────┘

Fact Table Types

Type	What It Stores	Grain	Example
Transaction	Individual events	One row per event	Each order line item
Periodic snapshot	State at regular intervals	One row per period	Monthly account balance
Accumulating snapshot	Lifecycle milestones	One row per entity	Order: created → shipped → delivered
Factless fact	Event occurrence (no measures)	One row per event	Student attendance, page views

Slowly Changing Dimensions (SCD)

SCD Type	How It Works	When to Use
Type 1	Overwrite old value	Don’t need history (fix typos)
Type 2	New row with version dates	Need full history (address changes)
Type 3	Add column for previous value	Need current + one previous
Type 6	Combines Types 1, 2, 3	Need history + current flag

SCD Type 2 Implementation

-- dim_customer with SCD Type 2
CREATE TABLE dim_customer (
    customer_key    SERIAL PRIMARY KEY,  -- Surrogate key
    customer_id     INT NOT NULL,        -- Natural key
    name            TEXT NOT NULL,
    segment         TEXT NOT NULL,
    region          TEXT NOT NULL,
    effective_date  DATE NOT NULL,
    expiry_date     DATE DEFAULT '9999-12-31',
    is_current      BOOLEAN DEFAULT TRUE
);

-- Customer moves from "SMB" to "Enterprise"
-- Step 1: Expire current record
UPDATE dim_customer 
SET expiry_date = CURRENT_DATE - 1, is_current = FALSE
WHERE customer_id = 123 AND is_current = TRUE;

-- Step 2: Insert new record
INSERT INTO dim_customer (customer_id, name, segment, region, effective_date)
VALUES (123, 'Acme Corp', 'Enterprise', 'US-West', CURRENT_DATE);

dbt Implementation

-- models/marts/core/fct_orders.sql
{{
  config(
    materialized='incremental',
    unique_key='order_key',
    on_schema_change='append_new_columns'
  )
}}

WITH orders AS (
    SELECT * FROM {{ ref('stg_orders') }}
    {% if is_incremental() %}
    WHERE updated_at > (SELECT MAX(updated_at) FROM {{ this }})
    {% endif %}
),

customers AS (
    SELECT * FROM {{ ref('dim_customer') }}
    WHERE is_current = TRUE
),

products AS (
    SELECT * FROM {{ ref('dim_product') }}
),

date_dim AS (
    SELECT * FROM {{ ref('dim_date') }}
)

SELECT
    {{ dbt_utils.generate_surrogate_key(['orders.order_id', 'orders.line_number']) }} AS order_key,
    customers.customer_key,
    products.product_key,
    date_dim.date_key,
    orders.quantity,
    orders.unit_price,
    orders.discount,
    orders.quantity * orders.unit_price * (1 - orders.discount) AS total_amount,
    orders.updated_at

FROM orders
LEFT JOIN customers ON orders.customer_id = customers.customer_id
LEFT JOIN products ON orders.product_id = products.product_id
LEFT JOIN date_dim ON orders.order_date = date_dim.date

Anti-Patterns

Anti-Pattern	Problem	Fix
Natural keys as fact table keys	Slow joins, data quality issues	Surrogate integer keys for all dimensions
Snowflaking everything	Complex queries, poor performance	Star schema (denormalize dimensions)
No date dimension	Can’t slice by fiscal year, holiday, etc.	Pre-built date dimension with business attributes
One Big Table (OBT)	Redundant data, hard to maintain	Proper star schema with fact + dimensions
No grain documentation	Ambiguous what each row represents	Document grain for every fact table

Checklist

Grain defined for every fact table
Surrogate keys on all dimension tables
Date dimension built with fiscal/business attributes
SCD strategy chosen per dimension (Type 1, 2, or 3)
Conformed dimensions shared across fact tables
Incremental loading for large fact tables
Data quality tests on all models
Documentation: grain, business definitions, lineage

:::note[Source] This guide is derived from operational intelligence at Garnet Grid Consulting. For data modeling consulting, visit garnetgrid.com. :::

Star Schema

Fact Table Types

Slowly Changing Dimensions (SCD)

SCD Type 2 Implementation

dbt Implementation

Anti-Patterns

Checklist

More in Data Engineering

CDC Pipeline Architecture

Change Data Capture (CDC) Patterns

Batch Processing at Scale