
Data Testing & Data Quality Frameworks

Test data pipelines systematically. Covers Great Expectations, dbt tests, data profiling, anomaly detection, schema validation, and building a data quality SLA.

Data bugs are worse than code bugs. A code bug crashes the application — you know immediately. A data bug silently produces wrong numbers for weeks. Revenue reports are off, decisions are made on bad data, and nobody notices until the CFO asks “why do these numbers disagree?” Data testing catches these bugs before they become business disasters.


Data Quality Dimensions

| Dimension | What to Test | Example Check |
|---|---|---|
| Completeness | No missing required values | `NOT NULL` on `customer_id` |
| Uniqueness | No duplicates | Primary key uniqueness |
| Accuracy | Values match reality | Price > 0, age between 0 and 150 |
| Consistency | No contradictions | `order_date <= ship_date` |
| Timeliness | Data is fresh enough | Source updated within last 24h |
| Validity | Values match expected formats | Email matches pattern, ISO country codes |
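
The same dimensions can be spot-checked directly in code before reaching for a framework. A minimal stdlib sketch over a hypothetical in-memory orders dataset (the field names and values are illustrative, not from any real pipeline):

```python
from datetime import date

orders = [
    {"order_id": 1, "customer_id": 10, "price": 19.99,
     "order_date": date(2024, 1, 1), "ship_date": date(2024, 1, 2)},
    {"order_id": 2, "customer_id": 11, "price": 5.00,
     "order_date": date(2024, 1, 2), "ship_date": date(2024, 1, 3)},
]

# Completeness: required fields are never missing
completeness_ok = all(o["customer_id"] is not None for o in orders)

# Uniqueness: order_id behaves as a primary key
ids = [o["order_id"] for o in orders]
uniqueness_ok = len(ids) == len(set(ids))

# Accuracy: values are plausible
accuracy_ok = all(o["price"] > 0 for o in orders)

# Consistency: related fields do not contradict each other
consistency_ok = all(o["order_date"] <= o["ship_date"] for o in orders)

assert completeness_ok and uniqueness_ok and accuracy_ok and consistency_ok
```

Frameworks like dbt and Great Expectations encode exactly these checks declaratively, so they run on every pipeline execution instead of ad hoc.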

dbt Tests

```yaml
# schema.yml
version: 2

models:
  - name: fct_orders
    description: "One row per order line item"
    columns:
      - name: order_id
        tests:
          - not_null
          - unique

      - name: customer_id
        tests:
          - not_null
          - relationships:
              to: ref('dim_customer')
              field: customer_id

      - name: total_amount
        tests:
          - not_null
          - dbt_utils.accepted_range:
              min_value: 0
              max_value: 1000000
              inclusive: true

      - name: order_date
        tests:
          - not_null
          - dbt_utils.expression_is_true:
              expression: "<= current_date"
              config:
                severity: error
```
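
Under the hood, each generic test compiles to a SQL query that selects the *violating* rows; the test passes when that query returns zero rows. A rough stdlib sketch of that semantics using SQLite (table contents are illustrative):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE fct_orders (order_id INTEGER, total_amount REAL)")
conn.executemany("INSERT INTO fct_orders VALUES (?, ?)",
                 [(1, 10.0), (2, 25.5), (3, 7.0)])

# not_null compiles to roughly: select rows where the column IS NULL
not_null_failures = conn.execute(
    "SELECT * FROM fct_orders WHERE order_id IS NULL"
).fetchall()

# unique compiles to roughly: select key values appearing more than once
unique_failures = conn.execute(
    "SELECT order_id FROM fct_orders GROUP BY order_id HAVING COUNT(*) > 1"
).fetchall()

# A dbt test passes when its compiled query returns zero rows
assert not_null_failures == [] and unique_failures == []
```

This "failing rows or nothing" convention is also what makes custom singular tests, like the one below, work with no extra machinery.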

Custom dbt Test

```sql
-- tests/assert_orders_revenue_matches_payments.sql
-- Verify total orders revenue matches total payments

WITH order_total AS (
    SELECT SUM(total_amount) AS total FROM {{ ref('fct_orders') }}
    WHERE order_date >= CURRENT_DATE - INTERVAL '1 day'
),
payment_total AS (
    SELECT SUM(amount) AS total FROM {{ ref('fct_payments') }}
    WHERE payment_date >= CURRENT_DATE - INTERVAL '1 day'
)

SELECT
    order_total.total AS order_revenue,
    payment_total.total AS payment_revenue,
    ABS(order_total.total - payment_total.total) AS difference
FROM order_total, payment_total
WHERE ABS(order_total.total - payment_total.total) > 0.01
-- Returns rows only if there's a mismatch (test fails)
```

Great Expectations

```python
import great_expectations as gx

context = gx.get_context()

# Define an expectation suite for the orders table (GX 1.x API)
suite = context.suites.add(gx.ExpectationSuite(name="orders_quality"))

# Column expectations
suite.add_expectation(
    gx.expectations.ExpectColumnValuesToNotBeNull(column="order_id")
)
suite.add_expectation(
    gx.expectations.ExpectColumnValuesToBeUnique(column="order_id")
)
suite.add_expectation(
    gx.expectations.ExpectColumnValuesToBeBetween(
        column="total_amount",
        min_value=0,
        max_value=1000000,
    )
)

# Row count expectations (catch data loss)
suite.add_expectation(
    gx.expectations.ExpectTableRowCountToBeBetween(
        min_value=1000,    # At least 1000 orders per day
        max_value=100000,  # Sanity check upper bound
    )
)
```
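
Hard-coded row count bounds go stale as volume grows. A common alternative, sketched here in plain Python rather than any Great Expectations API, is to derive the bounds from recent history, e.g. mean ± 3 standard deviations of daily row counts (all numbers below are illustrative):

```python
from statistics import mean, stdev

# Daily row counts for the last two weeks (illustrative numbers)
daily_counts = [4120, 3980, 4250, 4050, 4300, 4190, 4010,
                4230, 4100, 3950, 4280, 4060, 4150, 4200]

mu, sigma = mean(daily_counts), stdev(daily_counts)
min_value = int(mu - 3 * sigma)  # lower bound of the "normal" band
max_value = int(mu + 3 * sigma)  # upper bound of the "normal" band

# Simulate a source that silently dropped most of its rows today
today_count = 120
anomalous = not (min_value <= today_count <= max_value)
assert anomalous  # 120 is far below the historical band, so the check fires
```

Recomputing the band on each run turns a static sanity check into lightweight anomaly detection that adapts as the business grows.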

Anti-Patterns

| Anti-Pattern | Problem | Fix |
|---|---|---|
| No data tests | Bad data discovered by business users | dbt tests + Great Expectations on every model |
| Testing only schema | Data type correct but values wrong | Content-level tests (ranges, relationships, aggregates) |
| No row count checks | Source stopped sending data, nobody noticed | Min/max row count expectations per table |
| Tests only in prod | Bugs reach production | Test in CI before deploying to production |
| No data freshness monitoring | Stale data served in dashboards | Freshness tests with SLA alerts |
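
The freshness check in the last row reduces to comparing the latest load timestamp against an SLA window. A minimal stdlib sketch (the 24-hour SLA and the timestamps are illustrative assumptions):

```python
from datetime import datetime, timedelta, timezone
from typing import Optional

FRESHNESS_SLA = timedelta(hours=24)

def is_fresh(last_loaded_at: datetime, now: Optional[datetime] = None) -> bool:
    """Return True if the source was updated within the SLA window."""
    now = now or datetime.now(timezone.utc)
    return now - last_loaded_at <= FRESHNESS_SLA

now = datetime(2024, 6, 2, 12, 0, tzinfo=timezone.utc)

# Updated 9 hours ago: within the 24h SLA
fresh = is_fresh(datetime(2024, 6, 2, 3, 0, tzinfo=timezone.utc), now)

# Updated 3 days ago: breaches the SLA and should page someone
stale = is_fresh(datetime(2024, 5, 30, 12, 0, tzinfo=timezone.utc), now)

assert fresh and not stale
```

In practice this comparison runs inside the warehouse (e.g. `dbt source freshness` against a `loaded_at` column), with the breach wired to an alert rather than an assertion.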

Checklist

  • Data quality dimensions defined per dataset
  • dbt tests on every model (not_null, unique, relationships)
  • Custom tests for business rules (cross-table validation)
  • Row count tests to detect data loss or duplication
  • Freshness monitoring with SLA alerts
  • Great Expectations (or equivalent) for source data validation
  • Data quality dashboards (pass rate, failure trends)
  • Incident response for data quality failures

:::note[Source]
This guide is derived from operational intelligence at Garnet Grid Consulting. For data quality consulting, visit garnetgrid.com.
:::

Jakub Dimitri Rezayev
Founder & Chief Architect • Garnet Grid Consulting

Jakub holds an M.S. in Customer Intelligence & Analytics and a B.S. in Finance & Computer Science from Pace University. With deep expertise spanning D365 F&O, Azure, Power BI, and AI/ML systems, he architects enterprise solutions that bridge legacy systems and modern technology — and has led multi-million dollar ERP implementations for Fortune 500 supply chains.
