Data Mesh: Decentralized Data Architecture

Data mesh flips the centralized data team model on its head. Instead of one data engineering team building pipelines for the entire company, domain teams own their data end-to-end — from production databases through to analytical data products. The central team provides self-serve infrastructure, not data.

This guide covers how to implement data mesh in practice, including where teams commonly fail and how to avoid those failures.

Four Principles

┌──────────────────────────────────────────────┐
│                Data Mesh                     │
│                                              │
│  ┌───────────┐        ┌──────────────────┐   │
│  │ Domain    │        │ Data as a        │   │
│  │ Ownership │        │ Product          │   │
│  │           │        │                  │   │
│  │ Teams own │        │ Discoverable,    │   │
│  │ their     │        │ addressable,     │   │
│  │ data      │        │ trustworthy      │   │
│  └───────────┘        └──────────────────┘   │
│                                              │
│  ┌───────────┐        ┌──────────────────┐   │
│  │ Self-Serve│        │ Federated        │   │
│  │ Data      │        │ Computational    │   │
│  │ Platform  │        │ Governance       │   │
│  │           │        │                  │   │
│  │ Infra as  │        │ Global standards,│   │
│  │ platform  │        │ local autonomy   │   │
│  └───────────┘        └──────────────────┘   │
└──────────────────────────────────────────────┘

Domain Ownership

Before Data Mesh: Centralized

Commerce Team → raw data → Central Data Team → dashboards
Product Team  → raw data → Central Data Team → ML features
Finance Team  → raw data → Central Data Team → reports
                              ↑
                           Bottleneck

After Data Mesh: Decentralized

Commerce Domain
  ├── Source: orders database
  ├── Data Product: "order_events" (streaming)
  ├── Data Product: "daily_order_metrics" (batch)
  └── Owner: Commerce Data Engineer

Product Domain
  ├── Source: product catalog, clickstream
  ├── Data Product: "product_engagement" (batch)
  └── Owner: Product Analytics Engineer

Finance Domain
  ├── Consumes: order_events, product_engagement
  ├── Data Product: "revenue_reports" (batch)
  └── Owner: Finance Data Analyst
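
The ownership layout above can be modeled as a small dependency graph. The sketch below uses only the Python standard library. It checks that each data product is published by exactly one domain and that every consumed product actually exists, then derives an order in which domains can publish using `graphlib.TopologicalSorter` (Python 3.9+). The `Domain` class and its fields are hypothetical, chosen only to mirror the tree above.

```python
from dataclasses import dataclass, field
from graphlib import TopologicalSorter


@dataclass
class Domain:
    # Hypothetical model of one domain in the layout above.
    name: str
    owner: str
    produces: list[str]
    consumes: list[str] = field(default_factory=list)


DOMAINS = [
    Domain("commerce", "Commerce Data Engineer",
           produces=["order_events", "daily_order_metrics"]),
    Domain("product", "Product Analytics Engineer",
           produces=["product_engagement"]),
    Domain("finance", "Finance Data Analyst",
           produces=["revenue_reports"],
           consumes=["order_events", "product_engagement"]),
]


def product_owners(domains: list[Domain]) -> dict[str, str]:
    """Map each data product to the single domain that publishes it."""
    owners: dict[str, str] = {}
    for d in domains:
        for product in d.produces:
            if product in owners:
                raise ValueError(
                    f"{product} is claimed by both {owners[product]} and {d.name}")
            owners[product] = d.name
    return owners


def build_order(domains: list[Domain]) -> list[str]:
    """Order domains so every producer comes before its consumers."""
    owners = product_owners(domains)
    graph: dict[str, set[str]] = {}
    for d in domains:
        upstream: set[str] = set()
        for product in d.consumes:
            if product not in owners:
                raise ValueError(f"{d.name} consumes unpublished product {product}")
            if owners[product] != d.name:
                upstream.add(owners[product])
        graph[d.name] = upstream  # node -> its predecessors
    # static_order() raises graphlib.CycleError if domains depend on each other circularly.
    return list(TopologicalSorter(graph).static_order())


order = build_order(DOMAINS)  # finance is listed after commerce and product
```

Shared ownership and dangling dependencies both raise `ValueError`, so a check like this could run in CI whenever a domain changes what it publishes or consumes.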

Data as a Product

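A data product is held to the same quality expectations as a software product: it has an owner, documentation, a versioned schema, SLAs, automated tests, and defined interfaces for consumers. A descriptor for one product might look like this: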

data_product:
  name: "daily_order_metrics"
  domain: "commerce"
  owner: "commerce-analytics"
  
  metadata:
    description: "Daily aggregated order metrics by category and region"
    documentation: "https://data-catalog.internal/commerce/daily_order_metrics"
    schema_version: "2.3.0"
    update_frequency: "daily at 06:00 UTC"
    
  quality:
    sla_availability: "99.5%"
    sla_freshness: "< 6 hours after midnight UTC"
    tests:
      - row_count_minimum: 100
      - no_null_primary_keys: true
      - revenue_matches_source_within: "2%"
    
  access:
    discovery: "data-catalog"        # How to find it
    access_method: "BigQuery view"   # How to consume it
    output_ports:
      - type: "sql"
        location: "analytics.commerce.daily_order_metrics"
      - type: "api"
        endpoint: "https://data-api.internal/v1/order-metrics"
    
  lineage:
    sources:
      - "commerce.orders"
      - "commerce.order_items"
      - "product.products"
    consumers:
      - "finance.revenue_reports"
      - "executive.kpi_dashboard"
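
The `quality.tests` entries above are declarative, so something has to execute them. A minimal runner in plain Python might look like the following. It assumes the product's rows arrive as a list of dicts with a `revenue` field, that `order_date`, `category`, and `region` form the primary key (the descriptor does not name its keys), and that the source revenue total is computed separately. In practice a framework such as Great Expectations or dbt tests would play this role; `run_quality_tests` is a hypothetical name.

```python
from typing import Any

Row = dict[str, Any]

# Test parameters copied from the descriptor's quality.tests block.
QUALITY_TESTS = {
    "row_count_minimum": 100,
    "no_null_primary_keys": True,
    "revenue_matches_source_within": 0.02,  # "2%" expressed as a fraction
}

# Assumed primary key columns; the descriptor does not specify them.
PRIMARY_KEY = ("order_date", "category", "region")


def run_quality_tests(rows: list[Row], source_revenue: float) -> dict[str, bool]:
    """Evaluate each declared test and return a pass/fail map."""
    results: dict[str, bool] = {}

    results["row_count_minimum"] = len(rows) >= QUALITY_TESTS["row_count_minimum"]

    if QUALITY_TESTS["no_null_primary_keys"]:
        results["no_null_primary_keys"] = all(
            row.get(col) is not None for row in rows for col in PRIMARY_KEY
        )

    # Relative difference between the product's revenue total and the source total.
    product_revenue = sum(row["revenue"] for row in rows)
    tolerance = QUALITY_TESTS["revenue_matches_source_within"]
    if source_revenue == 0:
        results["revenue_matches_source_within"] = product_revenue == 0
    else:
        drift = abs(product_revenue - source_revenue) / abs(source_revenue)
        results["revenue_matches_source_within"] = drift <= tolerance

    return results
```

Returning a pass/fail map instead of raising on the first failure lets the platform report every broken guarantee for a run at once.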

Self-Serve Data Platform

What the platform team provides, and what domain teams do on top of it:

Capability	Platform Provides	Domain Teams Do
Storage	Data lake/warehouse infrastructure	Define schemas, manage data
Pipelines	Orchestration framework (Airflow, Dagster)	Write transformation logic
Quality	Validation framework (Great Expectations, dbt)	Define quality rules
Catalog	Data catalog (DataHub, Atlan)	Register data products
Access	Access management infrastructure	Define access policies
Monitoring	Observability platform	Set alerts and SLAs
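
Monitoring shows this split of responsibilities well: the platform team writes one generic check, and each domain supplies only its own SLA values. A minimal sketch, assuming the platform can read each product's last successful load time as a timezone-aware `datetime`; `FreshnessSLA`, `check_freshness`, and `alerts` are hypothetical names, not the API of any particular observability tool, and the 30-hour limit is an illustrative value.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone
from typing import Optional


@dataclass(frozen=True)
class FreshnessSLA:
    # Declared by the domain team (hypothetical structure).
    product: str
    max_age: timedelta


@dataclass(frozen=True)
class FreshnessResult:
    product: str
    age: timedelta
    breached: bool


def check_freshness(sla: FreshnessSLA, last_loaded: datetime,
                    now: Optional[datetime] = None) -> FreshnessResult:
    """Platform-owned check: compare the age of the data with the domain's SLA.

    Both timestamps must be timezone-aware; subtracting a naive datetime
    from an aware one raises TypeError.
    """
    if now is None:
        now = datetime.now(timezone.utc)
    age = now - last_loaded
    return FreshnessResult(sla.product, age, age > sla.max_age)


def alerts(results: list[FreshnessResult]) -> list[str]:
    """Turn breached checks into alert messages for the owning domain."""
    return [f"{r.product} is stale: {r.age} old" for r in results if r.breached]


# Example: a daily product with a hypothetical 30-hour limit.
sla = FreshnessSLA("daily_order_metrics", max_age=timedelta(hours=30))
now = datetime(2024, 1, 2, 9, 0, tzinfo=timezone.utc)
result = check_freshness(sla, datetime(2024, 1, 1, 6, 0, tzinfo=timezone.utc), now)
# result.age is 27 hours, so result.breached is False
```

A limit slightly above 24 hours tolerates normal run-to-run jitter for a daily product while still catching a missed run.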

Federated Governance

Global Standards (Central)	Local Autonomy (Domains)
Naming conventions	Table and column names (within standards)
PII classification rules	Which columns in their data are PII
Quality testing framework	Specific quality thresholds
Schema registry	Schema design and evolution
Access control framework	Who gets access to their data
Data retention policies	How long to retain their specific data
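
The "computational" part of federated governance means the global standards are enforced as code by the platform rather than by manual review. In the sketch below the central team owns the rules (naming pattern, PII levels, retention caps) while each domain owns the values it declares (column names, which columns are PII, retention period). The snake_case pattern, the three PII levels, and the day limits are hypothetical examples, not recommendations.

```python
import re
from dataclasses import dataclass

# Global standards, owned by the central governance group (hypothetical values).
NAME_PATTERN = re.compile(r"[a-z][a-z0-9_]*")  # snake_case names, used with fullmatch
PII_LEVELS = {"none", "pseudonymous", "direct"}
MAX_RETENTION_DAYS = {"none": 3650, "pseudonymous": 730, "direct": 365}


@dataclass
class Column:
    name: str
    pii: str  # the domain classifies each column using PII_LEVELS


@dataclass
class TableSpec:
    name: str
    columns: list[Column]
    retention_days: int  # chosen by the domain, capped by the global policy


def violations(spec: TableSpec) -> list[str]:
    """Return every way a domain's table breaks the global standards."""
    problems: list[str] = []
    for name in [spec.name] + [c.name for c in spec.columns]:
        if not NAME_PATTERN.fullmatch(name):
            problems.append(f"name '{name}' is not snake_case")

    levels = set()
    for col in spec.columns:
        if col.pii in PII_LEVELS:
            levels.add(col.pii)
        else:
            problems.append(f"column '{col.name}' has unknown PII level '{col.pii}'")

    # The most sensitive column present sets the retention cap.
    cap = min((MAX_RETENTION_DAYS[level] for level in levels),
              default=MAX_RETENTION_DAYS["none"])
    if spec.retention_days > cap:
        problems.append(f"retention {spec.retention_days}d exceeds {cap}d limit")
    return problems
```

Because the check returns a list of violations rather than a single verdict, the platform can block a deployment and show the domain team exactly which local choices conflict with which global rule.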

Anti-Patterns

Anti-Pattern	Problem	Fix
Mesh without platform	Every team builds infrastructure from scratch	Build self-serve platform first
Domain teams without data skills	Teams can’t own what they can’t build	Embedded data engineers or training
Too many data products	Governance overhead exceeds value	Start with the 10-20 most critical products
No interoperability standards	Domain products can’t be joined	Federated governance: common identifiers
“Mesh” as renaming	Teams are renamed but the centralized process stays	Actual ownership transfer with accountability
Ignoring the journey	Jumping to a full mesh on day one	Incremental: 1-2 domains first, scale after learning
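
The "common identifiers" fix can be made checkable. A minimal sketch, assuming a central registry that fixes the column name and type of each shared identifier; the registry contents, the BigQuery-style type names, and `join_keys` are hypothetical. Two products are treated as joinable on an identifier only when both expose it under the registered name and type.

```python
# Hypothetical registry of global identifiers agreed under federated governance.
# Types use BigQuery-style names purely for illustration.
GLOBAL_IDS: dict[str, str] = {
    "order_id": "STRING",
    "product_id": "STRING",
    "customer_id": "STRING",
}


def join_keys(schema_a: dict[str, str], schema_b: dict[str, str]) -> list[str]:
    """Global identifiers that both products expose with the registered type."""
    return sorted(
        col for col, typ in GLOBAL_IDS.items()
        if schema_a.get(col) == typ and schema_b.get(col) == typ
    )


order_events = {"order_id": "STRING", "product_id": "STRING", "amount": "NUMERIC"}
product_engagement = {"product_id": "STRING", "views": "INT64"}
keys = join_keys(order_events, product_engagement)  # ["product_id"]
```

If the Product domain published `product_id` as `INT64` instead, `join_keys` would return an empty list: the "can't be joined" symptom from the table, caught before any consumer writes a query.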

Checklist

Domains identified with clear data ownership boundaries
Data products defined with SLAs, schema, quality guarantees
Self-serve platform: pipeline, storage, catalog, quality
Federated governance: global standards + local autonomy
Data catalog: all products discoverable and documented
Quality monitoring: automated tests per data product
Pilot: 1-2 domains fully implemented as proof of concept
Organizational alignment: leadership supports decentralization
Data literacy: domain teams skilled in data engineering

Source: This guide is derived from operational intelligence at Garnet Grid Consulting. For data mesh consulting, visit garnetgrid.com.