Data Testing & Data Quality Frameworks
Test data pipelines systematically. Covers Great Expectations, dbt tests, data profiling, anomaly detection, schema validation, and building a data quality SLA.
Data bugs are worse than code bugs. A code bug crashes the application — you know immediately. A data bug silently produces wrong numbers for weeks. Revenue reports are off, decisions are made on bad data, and nobody notices until the CFO asks “why do these numbers disagree?” Data testing catches these bugs before they become business disasters.
Data Quality Dimensions
| Dimension | What to Test | Example Check |
|---|---|---|
| Completeness | No missing required values | NOT NULL on customer_id |
| Uniqueness | No duplicates | Primary key uniqueness |
| Accuracy | Values match reality | Price > 0, age between 0-150 |
| Consistency | No contradictions | order_date <= ship_date |
| Timeliness | Data is fresh enough | Source updated within last 24h |
| Validity | Values match expected formats | Email matches pattern, ISO country codes |
dbt Tests
# schema.yml
models:
- name: fct_orders
description: "One row per order line item"
columns:
- name: order_id
tests:
- not_null
- unique
- name: customer_id
tests:
- not_null
- relationships:
to: ref('dim_customer')
field: customer_id
- name: total_amount
tests:
- not_null
- dbt_utils.accepted_range:
min_value: 0
max_value: 1000000
inclusive: true
- name: order_date
tests:
- not_null
- dbt_utils.expression_is_true:
expression: "<= current_date"
config:
severity: error
Custom dbt Test
-- tests/assert_orders_revenue_matches_payments.sql
-- Verify total orders revenue matches total payments
WITH order_total AS (
SELECT SUM(total_amount) AS total FROM {{ ref('fct_orders') }}
WHERE order_date >= CURRENT_DATE - INTERVAL '1 day'
),
payment_total AS (
SELECT SUM(amount) AS total FROM {{ ref('fct_payments') }}
WHERE payment_date >= CURRENT_DATE - INTERVAL '1 day'
)
SELECT
order_total.total AS order_revenue,
payment_total.total AS payment_revenue,
ABS(order_total.total - payment_total.total) AS difference
FROM order_total, payment_total
WHERE ABS(order_total.total - payment_total.total) > 0.01
-- Returns rows only if there's a mismatch (test fails)
Great Expectations
import great_expectations as gx
context = gx.get_context()
# Define expectations for orders table
suite = context.add_expectation_suite("orders_quality")
# Column expectations
suite.add_expectation(
gx.expectations.ExpectColumnValuesToNotBeNull(column="order_id")
)
suite.add_expectation(
gx.expectations.ExpectColumnValuesToBeUnique(column="order_id")
)
suite.add_expectation(
gx.expectations.ExpectColumnValuesToBeBetween(
column="total_amount",
min_value=0,
max_value=1000000,
)
)
# Row count expectations (catch data loss)
suite.add_expectation(
gx.expectations.ExpectTableRowCountToBeBetween(
min_value=1000, # At least 1000 orders per day
max_value=100000, # Sanity check upper bound
)
)
Anti-Patterns
| Anti-Pattern | Problem | Fix |
|---|---|---|
| No data tests | Bad data discovered by business users | dbt tests + Great Expectations on every model |
| Testing only schema | Data type correct but values wrong | Content-level tests (ranges, relationships, aggregates) |
| No row count checks | Source stopped sending data, nobody noticed | Min/max row count expectations per table |
| Tests only in prod | Bugs reach production | Test in CI before deploying to production |
| No data freshness monitoring | Stale data served in dashboards | Freshness tests with SLA alerts |
Checklist
- Data quality dimensions defined per dataset
- dbt tests on every model (not_null, unique, relationships)
- Custom tests for business rules (cross-table validation)
- Row count tests to detect data loss or duplication
- Freshness monitoring with SLA alerts
- Great Expectations (or equivalent) for source data validation
- Data quality dashboards (pass rate, failure trends)
- Incident response for data quality failures
:::note[Source] This guide is derived from operational intelligence at Garnet Grid Consulting. For data quality consulting, visit garnetgrid.com. :::