Data Mesh Architecture
Decentralize data ownership for organizational scale. Covers domain-oriented data products, self-serve data platform, federated governance, and the patterns that distribute data responsibility to the teams that know the data best.
The centralized data team bottleneck is real. Picture one team of 5 data engineers serving 200 engineers across 20 product teams: requests queue for weeks, and data quality degrades because the central team does not understand domain nuances. Data mesh proposes a radical alternative: treat data as a product, owned by the domain teams that produce it, supported by a self-serve platform and federated governance.
Data Mesh Principles
Principle 1: Domain-Oriented Ownership
Old: Central data team owns all data
New: Each domain team owns its data products
Commerce team → owns Orders data product
Marketing team → owns Campaign data product
Finance team → owns Revenue data product
The team that produces the data is responsible for:
☐ Data quality
☐ SLAs (freshness, availability)
☐ Schema documentation
☐ Consumer support
Principle 2: Data as a Product
Treat data consumers as customers:
☐ Discoverability: Can consumers find the data?
☐ Understandability: Is it documented?
☐ Trustworthiness: Is quality measured and guaranteed?
☐ Accessibility: Is it self-serve with standard interfaces?
☐ Interoperability: Does it follow organizational standards?
☐ Security: Is access controlled appropriately?
Principle 3: Self-Serve Data Platform
Platform team provides the infrastructure:
☐ Storage (data lake, warehouse)
☐ Compute (transformation engines)
☐ Catalog (metadata, discovery)
☐ Quality (automated testing frameworks)
☐ Access control (policy engine)
Domain teams should NOT build infrastructure.
They should use the platform to publish their products.
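As a rough illustration of the catalog capability listed above, the sketch below models publishing and discovery with an in-memory registry. The `DataProduct` and `Catalog` classes and their method names are hypothetical, not the API of any particular platform; a real catalog would persist metadata and enforce access control.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class DataProduct:
    """Metadata a domain team publishes (field names are illustrative)."""
    name: str
    domain: str
    owner: str
    description: str
    location: str


@dataclass
class Catalog:
    """In-memory stand-in for the platform's metadata catalog."""
    _products: dict = field(default_factory=dict)

    def publish(self, product: DataProduct) -> None:
        """Register a product; each (domain, name) pair may be published once."""
        key = (product.domain, product.name)
        if key in self._products:
            raise ValueError(f"{product.domain}/{product.name} is already published")
        self._products[key] = product

    def search(self, keyword: str) -> list:
        """Return products whose name or description mentions the keyword."""
        needle = keyword.lower()
        return [
            p for p in self._products.values()
            if needle in p.name.lower() or needle in p.description.lower()
        ]


# The domain team publishes through the platform instead of building infrastructure.
catalog = Catalog()
catalog.publish(DataProduct(
    name="orders",
    domain="commerce",
    owner="team-commerce",
    description="All customer orders with line items and fulfillment status",
    location="s3://data-products/commerce/orders/",
))

# A consumer discovers the product by keyword.
for product in catalog.search("fulfillment"):
    print(f"{product.domain}/{product.name} owned by {product.owner}")
```

The point of the design is the division of labor: the platform owns the registry and its guarantees, while the domain team supplies only metadata.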
Principle 4: Federated Computational Governance
Global standards, local implementation:
☐ Naming conventions (enforced by platform)
☐ Quality standards (automated quality gates)
☐ Access policies (centralized identity, decentralized decisions)
☐ Interoperability (shared schema registry)
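"Enforced by platform" can be read as an automated gate that every data product must pass before it is published. The following is a minimal sketch under assumed standards: the snake_case naming rule, the tier names other than `gold`, and the required SLA keys are illustrative choices, not standards prescribed by data mesh.

```python
import re

# Assumed global standards; each organization would define its own.
NAME_PATTERN = re.compile(r"^[a-z][a-z0-9_]*$")  # lowercase snake_case
ALLOWED_TIERS = {"bronze", "silver", "gold"}      # hypothetical tier names
REQUIRED_SLA_KEYS = {"freshness", "availability"}


def governance_violations(product: dict) -> list:
    """Return human-readable violations; an empty list means the gate passes."""
    violations = []
    meta = product.get("metadata", {})
    spec = product.get("spec", {})

    name = meta.get("name", "")
    if not NAME_PATTERN.match(name):
        violations.append(f"name {name!r} is not lowercase snake_case")
    if not meta.get("owner"):
        violations.append("owner is missing")
    if meta.get("tier") not in ALLOWED_TIERS:
        violations.append(f"tier {meta.get('tier')!r} is not an allowed tier")
    missing_sla = REQUIRED_SLA_KEYS - set(spec.get("sla", {}))
    if missing_sla:
        violations.append(f"sla is missing {sorted(missing_sla)}")
    return violations


compliant = {
    "metadata": {"name": "orders", "owner": "team-commerce", "tier": "gold"},
    "spec": {"sla": {"freshness": "1_hour", "availability": "99.9%"}},
}
non_compliant = {
    "metadata": {"name": "Orders-V2", "tier": "platinum"},
    "spec": {"sla": {"freshness": "1_hour"}},
}

print(governance_violations(compliant))
for violation in governance_violations(non_compliant):
    print(violation)
```

The standards live in one place (global), while each domain team fixes its own violations (local), which is the split the principle describes.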
Data Product Definition
# data-product.yaml: published by the domain team
apiVersion: datamesh/v1
kind: DataProduct
metadata:
  name: orders
  domain: commerce
  owner: team-commerce
  tier: gold  # gold = production quality
spec:
  description: "All customer orders with line items and fulfillment status"
  schema:
    format: parquet
    location: s3://data-products/commerce/orders/
  sla:
    freshness: 1_hour
    availability: 99.9%
  quality:
    tests:
      - type: uniqueness
        column: order_id
      - type: not_null
        columns: [order_id, customer_id, created_at, total]
      - type: range
        column: total
        min: 0
        max: 1000000
      - type: freshness
        column: updated_at
        max_age: 1_hour
  access:
    classification: internal
    approved_consumers: ["analytics", "marketing", "finance"]
    pii_columns: [customer_email, shipping_address]
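To show how a platform might act on the quality tests declared in the descriptor, here is a minimal sketch of a test runner over in-memory rows. The `run_quality_tests` function and the sample rows are invented for illustration. The freshness test is left out because it needs a clock, and a real platform would more likely run such checks inside its transformation engine.

```python
from collections import Counter

# The uniqueness, not_null, and range tests from the descriptor, as Python data.
tests = [
    {"type": "uniqueness", "column": "order_id"},
    {"type": "not_null", "columns": ["order_id", "customer_id", "created_at", "total"]},
    {"type": "range", "column": "total", "min": 0, "max": 1000000},
]


def run_quality_tests(rows: list, tests: list) -> list:
    """Return one message per failed check; an empty list means all passed."""
    failures = []
    for test in tests:
        kind = test["type"]
        if kind == "uniqueness":
            col = test["column"]
            # Nulls are the not_null test's concern, so skip them here.
            counts = Counter(r.get(col) for r in rows if r.get(col) is not None)
            dupes = [value for value, n in counts.items() if n > 1]
            if dupes:
                failures.append(f"uniqueness({col}): duplicates {dupes}")
        elif kind == "not_null":
            for col in test["columns"]:
                nulls = sum(1 for r in rows if r.get(col) is None)
                if nulls:
                    failures.append(f"not_null({col}): {nulls} null value(s)")
        elif kind == "range":
            col = test["column"]
            out_of_range = [
                r[col] for r in rows
                if r.get(col) is not None and not test["min"] <= r[col] <= test["max"]
            ]
            if out_of_range:
                failures.append(f"range({col}): out-of-range values {out_of_range}")
        else:
            failures.append(f"unsupported test type: {kind}")
    return failures


# Invented sample rows with one duplicate id, one null, and one negative total.
rows = [
    {"order_id": "o-1", "customer_id": "c-1", "created_at": "2024-01-01T10:00:00Z", "total": 25.0},
    {"order_id": "o-2", "customer_id": None, "created_at": "2024-01-01T10:05:00Z", "total": 40.0},
    {"order_id": "o-2", "customer_id": "c-3", "created_at": "2024-01-01T10:07:00Z", "total": -5.0},
]

for failure in run_quality_tests(rows, tests):
    print(failure)
```

Because the tests are plain data in the descriptor, the domain team declares what "good" means and the platform decides how to execute it.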
Anti-Patterns
| Anti-Pattern | Consequence | Fix |
|---|---|---|
| Decentralize without platform | Each domain builds its own infrastructure | Self-serve platform must exist BEFORE decentralizing |
| No governance standards | Incompatible schemas, naming chaos | Federated governance: global standards, local implementation |
| Domain teams lack data skills | Poor quality data products | Embed data engineers in domain teams, provide training |
| Data mesh for small organizations | Overhead outweighs benefit below about 50 engineers | Data mesh solves scale problems; adopt it only once the central team becomes the bottleneck |
| No data product SLAs | Consumers cannot rely on data freshness | Every data product publishes freshness and quality SLAs |
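A published freshness SLA is only useful if something checks it. The sketch below assumes SLA values follow a `<number>_<unit>` pattern such as `1_hour`; that format and the helper names `parse_max_age` and `is_fresh` are assumptions made for illustration.

```python
from datetime import datetime, timedelta, timezone

# Assumed units for SLA strings such as "1_hour" or "2_days".
UNIT_SECONDS = {"minute": 60, "hour": 3600, "day": 86400}


def parse_max_age(value: str) -> timedelta:
    """Convert an SLA string like '1_hour' into a timedelta."""
    amount, unit = value.split("_", 1)
    return timedelta(seconds=int(amount) * UNIT_SECONDS[unit.rstrip("s")])


def is_fresh(last_updated: datetime, max_age: str, now: datetime = None) -> bool:
    """True if the data was updated within the allowed age."""
    now = now or datetime.now(timezone.utc)
    return now - last_updated <= parse_max_age(max_age)


# Fixed reference time so the example is deterministic.
now = datetime(2024, 1, 1, 12, 0, tzinfo=timezone.utc)
print(is_fresh(now - timedelta(minutes=30), "1_hour", now))  # within the SLA
print(is_fresh(now - timedelta(hours=2), "1_hour", now))     # SLA breached
```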
Data mesh is an organizational pattern, not a technology. It works when the bottleneck is organizational — when one central team cannot serve all data needs. For smaller organizations, a well-run central data team is simpler and sufficient.