# Data Mesh Architecture
Decentralize data ownership using data mesh principles. Covers domain-oriented data ownership, data as a product, self-serve data infrastructure, federated governance, and the patterns that scale data systems with organizational growth.
Centralized data teams are bottlenecks. When every analytical query, every data pipeline, and every dashboard request flows through a single data engineering team, that team becomes the constraint on the entire organization’s ability to make data-driven decisions. Data mesh decentralizes data ownership to the domains that produce and understand the data.
## Core Principles
### 1. Domain-Oriented Ownership
Traditional: Central data team owns ALL data pipelines
Data Mesh: Each domain team owns their own data products
Example:
- Orders team owns order data products
- Payments team owns payment data products
- Marketing team owns campaign data products
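Ownership can be made explicit in tooling rather than tribal knowledge. A minimal sketch, with hypothetical product and team names, of a registry that routes questions, incidents, and change requests to the owning domain team:

```python
# Hypothetical ownership registry: each data product has exactly one owning team.
OWNERSHIP = {
    "order_events": {"domain": "commerce", "team": "orders-team"},
    "payment_events": {"domain": "commerce", "team": "payments-team"},
    "campaign_metrics": {"domain": "marketing", "team": "marketing-team"},
}

def owner_of(product: str) -> str:
    """Look up the accountable team for a data product."""
    entry = OWNERSHIP.get(product)
    if entry is None:
        # No registered owner means it is not a data product.
        raise KeyError(f"{product} has no registered owner")
    return entry["team"]
```

The central team disappears from this lookup entirely: every product resolves to a domain team.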
### 2. Data as a Product
Treat analytical data with the same rigor as production APIs:
☐ SLAs on freshness and quality
☐ Documentation and discoverability
☐ Versioned schemas
☐ Monitoring and alerting
☐ Product owner accountable
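The "SLAs on freshness" item above can be enforced mechanically rather than by convention. A minimal sketch (the function name is illustrative, not from any specific tool) that flags a data product whose last update exceeds its freshness budget:

```python
from datetime import datetime, timedelta, timezone

def check_freshness(last_event_time: datetime, sla: timedelta) -> bool:
    """Return True if the product's most recent data is within its freshness SLA."""
    lag = datetime.now(timezone.utc) - last_event_time
    return lag <= sla

# Example: a "< 5 minutes" freshness SLA, with data last updated 2 minutes ago.
fresh = check_freshness(
    last_event_time=datetime.now(timezone.utc) - timedelta(minutes=2),
    sla=timedelta(minutes=5),
)
```

Wired into monitoring, a `False` result pages the product owner, the same way an API latency SLO breach would.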
### 3. Self-Serve Data Platform
Central platform team provides:
☐ Data pipeline infrastructure (managed, self-service)
☐ Storage and compute (data lake, warehouse)
☐ Quality tooling (Great Expectations, dbt tests)
☐ Data catalog (discovery, lineage)
Domain teams use the platform to build data products.
They don't ask the platform team to build pipelines.
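In practice, self-service often means domain teams describe pipelines declaratively and the platform runs them. A hypothetical pipeline spec and the platform-side validation that might accompany it (the field names are assumptions for illustration, not any real tool's schema):

```python
# Hypothetical declarative spec a domain team submits to the self-serve platform.
PIPELINE_SPEC = {
    "product": "order_events",
    "source": "orders-db.public.orders",
    "schedule": "*/5 * * * *",          # every 5 minutes, matching the freshness SLA
    "sink": "s3://data-lake/orders/events/",
    "tests": ["no_null_order_id", "valid_status_values"],
}

REQUIRED_FIELDS = {"product", "source", "schedule", "sink", "tests"}

def validate_spec(spec: dict) -> list[str]:
    """Reject incomplete specs before anything runs; returns missing field names."""
    return sorted(REQUIRED_FIELDS - spec.keys())
```

The point of the design is who does what: the domain team writes the spec; the platform team maintains the machinery that validates and executes it.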
### 4. Federated Computational Governance
☐ Centrally defined policies (privacy, quality standards)
☐ Automated enforcement (policy-as-code)
☐ Domain teams comply without manual approval
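Policy-as-code can be as simple as automated checks run against every data product's metadata in CI. A sketch, with rules invented for illustration, showing how centrally defined policy is enforced without a manual approval step:

```python
def enforce_policies(metadata: dict) -> list[str]:
    """Centrally defined rules, automatically applied; returns policy violations."""
    violations = []
    if metadata.get("classification") not in {"public", "internal", "restricted"}:
        violations.append("classification missing or invalid")
    if not metadata.get("owner"):
        violations.append("every data product needs an accountable owner")
    if "sla" not in metadata:
        violations.append("freshness/availability SLA is mandatory")
    return violations
```

A domain team whose metadata returns an empty list ships without asking anyone; a non-empty list blocks the merge.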
## Data Product Interface
```python
class DataProduct:
    """A well-defined, discoverable data product."""

    metadata = {
        "name": "order_events",
        "domain": "commerce",
        "owner": "orders-team",
        "description": "All order lifecycle events (created, paid, shipped, delivered)",
        "sla": {
            "freshness": "< 5 minutes",
            "availability": "99.9%",
            "quality_score": "> 95%",
        },
        "schema_version": "3.2.0",
        "output_ports": [
            {"type": "streaming", "format": "Kafka topic", "name": "orders.events.v3"},
            {"type": "batch", "format": "Parquet", "path": "s3://data-lake/orders/events/"},
            {"type": "api", "format": "REST", "url": "/api/data/orders/events"},
        ],
        "classification": "internal",
        "lineage": {
            "sources": ["orders-db.public.orders", "payments-api"],
            "transforms": ["order_enrichment_pipeline"],
        },
    }

    def quality_checks(self):
        return [
            {"check": "no null order_id", "threshold": "100%"},
            {"check": "valid status values", "threshold": "100%"},
            {"check": "amount > 0", "threshold": "99.9%"},
            {"check": "freshness < 5 min", "threshold": "99.9%"},
        ]
```
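Thresholds like "> 95%" only matter if something evaluates them. A self-contained sketch that parses threshold strings of the form used above and compares them to measured pass rates (the measured values here are made up for illustration):

```python
def threshold_met(threshold: str, measured_pct: float) -> bool:
    """Interpret thresholds like '100%' or '> 95%' against a measured pass rate."""
    t = threshold.strip()
    if t.startswith(">"):
        return measured_pct > float(t.lstrip("> ").rstrip("%"))
    return measured_pct >= float(t.rstrip("%"))

# (check name, declared threshold, measured pass rate) - measurements are hypothetical.
results = [
    ("no null order_id", "100%", 100.0),
    ("amount > 0", "99.9%", 99.95),
]
failures = [name for name, thr, pct in results if not threshold_met(thr, pct)]
```

Run on a schedule, a non-empty `failures` list feeds the product's monitoring and, over time, its published quality score.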
## Anti-Patterns
| Anti-Pattern | Consequence | Fix |
|---|---|---|
| Data mesh without platform | Every team reinvents infrastructure | Self-serve platform is prerequisite |
| No data product standards | Inconsistent quality across domains | Federated governance with enforced standards |
| Domain teams refuse ownership | Data mesh in name only | Executive sponsorship, embedded data engineers |
| Every dataset is a “data product” | Catalog bloated, nothing discoverable | Curate: only well-documented, quality-checked data |
| Central team still builds pipelines | Bottleneck unchanged | True self-service, not “submit a request” |
Data mesh is an organizational pattern, not a technology. It works when domains genuinely own their data products, a self-serve platform reduces the engineering burden, and federated governance ensures interoperability. Without all three, it is just a rebranded data lake.