Data Mesh Architecture | The Garnet Wiki

The centralized data team bottleneck is real: picture one team of 5 data engineers serving 200 engineers across 20 product teams, which works out to 40 engineers and 4 product teams per data engineer. Requests queue for weeks, and data quality degrades because the central team does not understand each domain's nuances. Data mesh proposes a radical alternative: treat data as a product, owned by the domain teams that produce it, and supported by a self-serve platform and federated governance.

Data Mesh Principles

Principle 1: Domain-Oriented Ownership
  Old: Central data team owns all data
  New: Each domain team owns its data products
  
  Commerce team → owns Orders data product
  Marketing team → owns Campaign data product
  Finance team → owns Revenue data product
  
  The team that produces the data is responsible for:
  ☐ Data quality
  ☐ SLAs (freshness, availability)
  ☐ Schema documentation
  ☐ Consumer support

Principle 2: Data as a Product
  Treat data consumers as customers:
  ☐ Discoverability: Can consumers find the data?
  ☐ Understandability: Is it documented?
  ☐ Trustworthiness: Is quality measured and guaranteed?
  ☐ Accessibility: Is it self-serve with standard interfaces?
  ☐ Interoperability: Does it follow organizational standards?
  ☐ Security: Is access controlled appropriately?

Principle 3: Self-Serve Data Platform
  Platform team provides the infrastructure:
  ☐ Storage (data lake, warehouse)
  ☐ Compute (transformation engines)
  ☐ Catalog (metadata, discovery)
  ☐ Quality (automated testing frameworks)
  ☐ Access control (policy engine)
  
  Domain teams should NOT build infrastructure.
  They should use the platform to publish their products.
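
As a rough illustration of the catalog piece of such a platform, the sketch below models an in-memory registry where domain teams publish products and consumers search by keyword. The `Catalog` and `ProductEntry` names, their methods, and the marketing entry are hypothetical and do not come from any particular platform.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class ProductEntry:
    """Metadata a domain team registers when publishing (hypothetical shape)."""
    name: str
    domain: str
    owner: str
    description: str


@dataclass
class Catalog:
    """Toy in-memory catalog; a real platform would use a metadata store."""
    entries: dict = field(default_factory=dict)

    def publish(self, entry: ProductEntry) -> None:
        # One product per (domain, name) pair; re-publishing is rejected.
        key = (entry.domain, entry.name)
        if key in self.entries:
            raise ValueError(f"{entry.domain}/{entry.name} is already published")
        self.entries[key] = entry

    def search(self, keyword: str) -> list:
        # Case-insensitive match on name or description, in a stable order.
        kw = keyword.lower()
        hits = (e for e in self.entries.values()
                if kw in e.name.lower() or kw in e.description.lower())
        return sorted(hits, key=lambda e: (e.domain, e.name))


catalog = Catalog()
catalog.publish(ProductEntry(
    "orders", "commerce", "team-commerce",
    "All customer orders with line items and fulfillment status"))
catalog.publish(ProductEntry(
    "campaigns", "marketing", "team-marketing",
    "Campaign spend and attribution"))  # illustrative entry, not from the source

print([e.name for e in catalog.search("orders")])
```

The point of the sketch is the division of labor: the platform owns `publish` and `search`, while each domain team only supplies its own entries.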

Principle 4: Federated Computational Governance
  Global standards, local implementation:
  ☐ Naming conventions (enforced by platform)
  ☐ Quality standards (automated quality gates)
  ☐ Access policies (centralized identity, decentralized decisions)
  ☐ Interoperability (shared schema registry)
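
One way to picture "enforced by platform" is an automated gate that every descriptor must pass before publication. The sketch below is a minimal version under stated assumptions: the descriptor is a nested dict with `metadata` and `spec` keys (mirroring the data-product.yaml layout under Data Product Definition), and the snake_case naming rule, the tier vocabulary, and the required SLA keys are hypothetical standards chosen for illustration.

```python
import re

# Hypothetical global standards; real values would come from the governance group.
NAME_PATTERN = re.compile(r"^[a-z][a-z0-9_]*$")  # assumed convention: snake_case
ALLOWED_TIERS = {"bronze", "silver", "gold"}      # assumed tier vocabulary
REQUIRED_SLA_KEYS = {"freshness", "availability"}


def check_descriptor(descriptor: dict) -> list:
    """Return a list of violations; an empty list means the gate passes."""
    violations = []
    meta = descriptor.get("metadata", {})
    spec = descriptor.get("spec", {})

    name = meta.get("name", "")
    if not NAME_PATTERN.match(name):
        violations.append(f"name {name!r} does not match naming convention")
    if not meta.get("owner"):
        violations.append("owner is required")
    if meta.get("tier") not in ALLOWED_TIERS:
        violations.append(f"tier {meta.get('tier')!r} is not an allowed tier")

    # Every product must publish the SLA keys and at least one quality test.
    for key in sorted(REQUIRED_SLA_KEYS - set(spec.get("sla", {}))):
        violations.append(f"sla.{key} is required")
    if not spec.get("quality", {}).get("tests"):
        violations.append("at least one quality test is required")
    return violations


good = {
    "metadata": {"name": "orders", "owner": "team-commerce", "tier": "gold"},
    "spec": {
        "sla": {"freshness": "1_hour", "availability": "99.9%"},
        "quality": {"tests": [{"type": "uniqueness", "column": "order_id"}]},
    },
}
bad = {
    "metadata": {"name": "Orders-V2", "tier": "platinum"},
    "spec": {"sla": {"freshness": "1_hour"}},
}

print(check_descriptor(good))
for violation in check_descriptor(bad):
    print(violation)
```

The standards live in one place (global), while each domain team fixes its own descriptor when the gate reports violations (local).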

Data Product Definition

# data-product.yaml — published by domain team
apiVersion: datamesh/v1
kind: DataProduct
metadata:
  name: orders
  domain: commerce
  owner: team-commerce
  tier: gold  # gold = production quality
  
spec:
  description: "All customer orders with line items and fulfillment status"
  
  schema:
    format: parquet
    location: s3://data-products/commerce/orders/
    
  sla:
    freshness: 1_hour
    availability: 99.9%
    
  quality:
    tests:
      - type: uniqueness
        column: order_id
      - type: not_null
        columns: [order_id, customer_id, created_at, total]
      - type: range
        column: total
        min: 0
        max: 1000000
      - type: freshness
        column: updated_at
        max_age: 1_hour
    
  access:
    classification: internal
    approved_consumers: ["analytics", "marketing", "finance"]
    pii_columns: [customer_email, shipping_address]

Anti-Patterns

Anti-Pattern	Consequence	Fix
Decentralize without platform	Each domain builds its own infrastructure	Self-serve platform must exist BEFORE decentralizing
No governance standards	Incompatible schemas, naming chaos	Federated governance: global standards, local implementation
Domain teams lack data skills	Poor quality data products	Embed data engineers in domain teams, provide training
Data mesh for small organizations	Overhead outweighs benefit below roughly 50 engineers	Data mesh solves scale problems; adopt it only once the central team becomes the bottleneck
No data product SLAs	Consumers cannot rely on data freshness	Every data product publishes freshness and quality SLAs

Data mesh is an organizational pattern, not a technology. It works when the bottleneck is organizational, that is, when one central team cannot serve all data needs. For smaller organizations, a well-run central data team is simpler and sufficient.
