Data Mesh: Decentralized Data Architecture
Implement data mesh principles. Covers domain ownership, data as a product, self-serve data platform, federated governance, and the organizational shift from centralized to decentralized data architecture.
Data mesh flips the centralized data team model on its head. Instead of one data engineering team building pipelines for the entire company, domain teams own their data end-to-end — from production databases through to analytical data products. The central team provides self-serve infrastructure, not data.
This guide covers how to implement data mesh in practice, including where teams commonly fail and how to avoid those failures.
Four Principles
┌──────────────────────────────────────────────┐
│                  Data Mesh                   │
│                                              │
│    ┌───────────┐     ┌──────────────────┐    │
│    │ Domain    │     │ Data as a        │    │
│    │ Ownership │     │ Product          │    │
│    │           │     │                  │    │
│    │ Teams own │     │ Discoverable,    │    │
│    │ their     │     │ addressable,     │    │
│    │ data      │     │ trustworthy      │    │
│    └───────────┘     └──────────────────┘    │
│                                              │
│    ┌───────────┐     ┌──────────────────┐    │
│    │ Self-Serve│     │ Federated        │    │
│    │ Data      │     │ Computational    │    │
│    │ Platform  │     │ Governance       │    │
│    │           │     │                  │    │
│    │ Infra as  │     │ Global standards,│    │
│    │ platform  │     │ local autonomy   │    │
│    └───────────┘     └──────────────────┘    │
└──────────────────────────────────────────────┘
Domain Ownership
Before Data Mesh: Centralized
Commerce Team → raw data → Central Data Team → dashboards
Product Team  → raw data → Central Data Team → ML features
Finance Team  → raw data → Central Data Team → reports
                                  ↑
                              Bottleneck
After Data Mesh: Decentralized
Commerce Domain
├── Source: orders database
├── Data Product: "order_events" (streaming)
├── Data Product: "daily_order_metrics" (batch)
└── Owner: Commerce Data Engineer
Product Domain
├── Source: product catalog, clickstream
├── Data Product: "product_engagement" (batch)
└── Owner: Product Analytics Engineer
Finance Domain
├── Consumes: order_events, product_engagement
├── Data Product: "revenue_reports" (batch)
└── Owner: Finance Data Analyst
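The domain layout above can be written down as data, which makes ownership queryable. The following is a minimal sketch, not a prescribed model: the `DataProduct` class and the `upstream_owners` helper are hypothetical names introduced here, and only the product names, domains, and owners come from the tree above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DataProduct:
    name: str
    domain: str
    owner: str
    mode: str  # "streaming" or "batch"
    consumes: tuple[str, ...] = ()  # names of upstream data products


# The products listed in the domain tree above.
PRODUCTS = [
    DataProduct("order_events", "commerce", "Commerce Data Engineer", "streaming"),
    DataProduct("daily_order_metrics", "commerce", "Commerce Data Engineer", "batch"),
    DataProduct("product_engagement", "product", "Product Analytics Engineer", "batch"),
    DataProduct(
        "revenue_reports", "finance", "Finance Data Analyst", "batch",
        consumes=("order_events", "product_engagement"),
    ),
]


def upstream_owners(product_name: str, products: list[DataProduct]) -> dict[str, str]:
    """Map each upstream product of `product_name` to the owner accountable for it."""
    by_name = {p.name: p for p in products}
    return {dep: by_name[dep].owner for dep in by_name[product_name].consumes}
```

Calling `upstream_owners("revenue_reports", PRODUCTS)` tells the Finance domain whom to contact when an input breaks, which is the practical meaning of ownership in a mesh: every dependency resolves to a named owner rather than to a central queue.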
Data as a Product
A data product has the same quality expectations as a software product:
data_product:
  name: "daily_order_metrics"
  domain: "commerce"
  owner: "commerce-analytics"

  metadata:
    description: "Daily aggregated order metrics by category and region"
    documentation: "https://data-catalog.internal/commerce/daily_order_metrics"
    schema_version: "2.3.0"
    update_frequency: "daily at 06:00 UTC"

  quality:
    sla_availability: "99.5%"
    sla_freshness: "< 8 hours from midnight UTC"  # i.e. within 2 hours of the 06:00 UTC update
    tests:
      - row_count_minimum: 100
      - no_null_primary_keys: true
      - revenue_matches_source_within: "2%"

  access:
    discovery: "data-catalog"        # How to find it
    access_method: "BigQuery view"   # How to consume it
    output_ports:
      - type: "sql"
        location: "analytics.commerce.daily_order_metrics"
      - type: "api"
        endpoint: "https://data-api.internal/v1/order-metrics"

  lineage:
    sources:
      - "commerce.orders"
      - "commerce.order_items"
      - "product.products"
    consumers:
      - "finance.revenue_reports"
      - "executive.kpi_dashboard"
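The three entries under `tests` only carry weight if something evaluates them on every run. Below is a rough sketch of how they might be checked in plain Python. The function name `run_quality_checks` and the column names `metric_id` and `revenue` are assumptions for illustration; the spec above does not define the table's columns, and a real setup would more likely express these rules in the platform's validation framework.

```python
def run_quality_checks(rows, source_revenue, *, key_column="metric_id",
                       min_rows=100, revenue_tolerance=0.02):
    """Evaluate the three tests from the spec; returns {test_name: passed}.

    rows: list of dicts, one per row of the data product (hypothetical columns).
    source_revenue: revenue total computed independently from the source tables.
    """
    product_revenue = sum(row["revenue"] for row in rows)

    # Relative difference against the source total. A zero source total
    # can only be matched by a zero product total.
    if source_revenue == 0:
        revenue_ok = product_revenue == 0
    else:
        relative_diff = abs(product_revenue - source_revenue) / abs(source_revenue)
        revenue_ok = relative_diff <= revenue_tolerance

    return {
        "row_count_minimum": len(rows) >= min_rows,
        "no_null_primary_keys": all(row.get(key_column) is not None for row in rows),
        "revenue_matches_source_within": revenue_ok,
    }
```

Returning one boolean per test, rather than raising on the first failure, lets the owning team publish a full quality report alongside the product.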
Self-Serve Data Platform
What the platform team provides:
| Capability | Platform Provides | Domain Teams Do |
|---|---|---|
| Storage | Data lake/warehouse infrastructure | Define schemas, manage data |
| Pipelines | Orchestration framework (Airflow, Dagster) | Write transformation logic |
| Quality | Validation framework (Great Expectations, dbt tests) | Define quality rules |
| Catalog | Data catalog (DataHub, Atlan) | Register data products |
| Access | Access management infrastructure | Define access policies |
| Monitoring | Observability platform | Set alerts and SLAs |
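The split in the table can be illustrated with a toy example: the platform owns registration, cataloging, and execution, while the domain team supplies only the transformation logic. Everything here is a hypothetical stand-in (the `Platform` class, its `data_product` decorator, and the `day`, `category`, and `amount` fields); in practice an orchestrator such as Airflow or Dagster would play the platform role.

```python
from typing import Callable, Iterable

Row = dict
Transform = Callable[[Iterable[Row]], list]


class Platform:
    """Hypothetical self-serve platform: registers, catalogs, and runs pipelines."""

    def __init__(self) -> None:
        self._pipelines: dict[str, Transform] = {}
        self.catalog: dict[str, dict] = {}

    def data_product(self, name: str, domain: str):
        """Decorator a domain team uses to register its transformation."""
        def decorator(fn: Transform) -> Transform:
            self._pipelines[name] = fn
            # The catalog entry is filled in by the platform, not by hand.
            self.catalog[name] = {
                "domain": domain,
                "description": (fn.__doc__ or "").strip(),
            }
            return fn
        return decorator

    def run(self, name: str, rows: Iterable[Row]) -> list:
        return self._pipelines[name](rows)


platform = Platform()


# Domain-team code: only the business logic lives here.
@platform.data_product(name="daily_order_metrics", domain="commerce")
def daily_order_metrics(orders: Iterable[Row]) -> list:
    """Daily revenue by category."""
    totals: dict = {}
    for order in orders:
        key = (order["day"], order["category"])
        totals[key] = totals.get(key, 0.0) + order["amount"]
    return [
        {"day": day, "category": category, "revenue": revenue}
        for (day, category), revenue in sorted(totals.items())
    ]
```

The point of the design is that the domain function contains no infrastructure code: scheduling, storage, and catalog registration are capabilities it inherits by registering with the platform.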
Federated Governance
| Global Standards (Central) | Local Autonomy (Domains) |
|---|---|
| Naming conventions | Table and column names (within standards) |
| PII classification rules | Which columns in their data are PII |
| Quality testing framework | Specific quality thresholds |
| Schema registry | Schema design and evolution |
| Access control framework | Who gets access to their data |
| Data retention policies | How long to retain their specific data |
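"Computational" governance means the global standards are enforced by code rather than by review meetings. The sketch below assumes two global rules that the source does not spell out: names must be snake_case, and every column must carry a PII class from a centrally defined set. The rule details, the `ALLOWED_PII_CLASSES` values, and the function name are illustrative assumptions. The domain team keeps the local decisions: what to call its columns and which of them are PII.

```python
import re

# Global standards, owned centrally (assumed rules for illustration).
SNAKE_CASE = re.compile(r"[a-z][a-z0-9]*(_[a-z0-9]+)*")
ALLOWED_PII_CLASSES = {"none", "email", "name", "address"}


def governance_violations(table_name: str, columns: dict) -> list:
    """Check a domain's declaration against the global standards.

    columns maps each column name to the PII class the domain team assigned.
    Returns a list of human-readable violations; an empty list means compliant.
    """
    violations = []
    for name in [table_name, *columns]:
        if not SNAKE_CASE.fullmatch(name):
            violations.append(f"name not snake_case: {name}")
    for column, pii_class in columns.items():
        if pii_class not in ALLOWED_PII_CLASSES:
            violations.append(f"unknown PII class for {column}: {pii_class}")
    return violations
```

A check like this would typically run when a data product is registered or its schema changes, so non-compliant products never reach the catalog.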
Anti-Patterns
| Anti-Pattern | Problem | Fix |
|---|---|---|
| Mesh without platform | Every team builds infrastructure from scratch | Build self-serve platform first |
| Domain teams without data skills | Teams can’t own what they can’t build | Embedded data engineers or training |
| Too many data products | Governance overhead exceeds value | Start with the 10-20 most critical products |
| No interoperability standards | Domain products can’t be joined | Federated governance: common identifiers |
| “Mesh” as renaming | Teams are renamed but the centralized process stays | Actual ownership transfer with accountability |
| Ignoring the journey | Teams jump to a full mesh on day one | Incremental: 1-2 domains first, scale after learning |
Checklist
- Domains identified with clear data ownership boundaries
- Data products defined with SLAs, schema, quality guarantees
- Self-serve platform: pipeline, storage, catalog, quality
- Federated governance: global standards + local autonomy
- Data catalog: all products discoverable and documented
- Quality monitoring: automated tests per data product
- Pilot: 1-2 domains fully implemented as proof of concept
- Organizational alignment: leadership supports decentralization
- Data literacy: domain teams skilled in data engineering
:::note[Source] This guide is derived from operational intelligence at Garnet Grid Consulting. For data mesh consulting, visit garnetgrid.com. :::