# Data Mesh Architecture
Decentralize data ownership using data mesh principles. Covers domain-oriented data ownership, data as a product, self-serve data infrastructure, federated governance, and the patterns that scale data systems with organizational growth.
Centralized data teams are bottlenecks. When every analytical query, every data pipeline, and every dashboard request flows through a single data engineering team, that team becomes the constraint on the entire organization’s ability to make data-driven decisions. Data mesh decentralizes data ownership to the domains that produce and understand the data.
## Core Principles
### 1. Domain-Oriented Ownership
Traditional: Central data team owns ALL data pipelines
Data Mesh: Each domain team owns their own data products
Example:
- Orders team owns order data products
- Payments team owns payment data products
- Marketing team owns campaign data products
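Ownership can be made explicit in tooling rather than tribal knowledge. A minimal sketch, with hypothetical product and team names, of a registry that routes questions, incidents, and change requests to the owning domain team:

```python
# Hypothetical ownership registry: each data product has exactly one owning team.
OWNERSHIP = {
    "order_events": {"domain": "commerce", "team": "orders-team"},
    "payment_events": {"domain": "commerce", "team": "payments-team"},
    "campaign_metrics": {"domain": "marketing", "team": "marketing-team"},
}

def owner_of(product: str) -> str:
    """Look up the accountable team for a data product."""
    entry = OWNERSHIP.get(product)
    if entry is None:
        # No registered owner means it is not a data product.
        raise KeyError(f"{product} has no registered owner")
    return entry["team"]
```

The central team disappears from this lookup entirely: every product resolves to a domain team.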
### 2. Data as a Product
Treat analytical data with the same rigor as production APIs:
☐ SLAs on freshness and quality
☐ Documentation and discoverability
☐ Versioned schemas
☐ Monitoring and alerting
☐ Product owner accountable
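The "SLAs on freshness" item above can be enforced mechanically rather than by convention. A minimal sketch (the function name is illustrative, not from any specific tool) that flags a data product whose last update exceeds its freshness budget:

```python
from datetime import datetime, timedelta, timezone

def check_freshness(last_event_time: datetime, sla: timedelta) -> bool:
    """Return True if the product's most recent data is within its freshness SLA."""
    lag = datetime.now(timezone.utc) - last_event_time
    return lag <= sla

# Example: a "< 5 minutes" freshness SLA, with data last updated 2 minutes ago.
fresh = check_freshness(
    last_event_time=datetime.now(timezone.utc) - timedelta(minutes=2),
    sla=timedelta(minutes=5),
)
```

Wired into monitoring, a `False` result pages the product owner, the same way an API latency SLO breach would.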
### 3. Self-Serve Data Platform
Central platform team provides:
☐ Data pipeline infrastructure (managed, self-service)
☐ Storage and compute (data lake, warehouse)
☐ Quality tooling (Great Expectations, dbt tests)
☐ Data catalog (discovery, lineage)
Domain teams use the platform to build data products.
They don't ask the platform team to build pipelines.
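In practice, self-service often means domain teams describe pipelines declaratively and the platform runs them. A hypothetical pipeline spec and the platform-side validation that might accompany it (the field names are assumptions for illustration, not any real tool's schema):

```python
# Hypothetical declarative spec a domain team submits to the self-serve platform.
PIPELINE_SPEC = {
    "product": "order_events",
    "source": "orders-db.public.orders",
    "schedule": "*/5 * * * *",          # every 5 minutes, matching the freshness SLA
    "sink": "s3://data-lake/orders/events/",
    "tests": ["no_null_order_id", "valid_status_values"],
}

REQUIRED_FIELDS = {"product", "source", "schedule", "sink", "tests"}

def validate_spec(spec: dict) -> list[str]:
    """Reject incomplete specs before anything runs; returns missing field names."""
    return sorted(REQUIRED_FIELDS - spec.keys())
```

The point of the design is who does what: the domain team writes the spec; the platform team maintains the machinery that validates and executes it.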
### 4. Federated Computational Governance
☐ Centrally defined policies (privacy, quality standards)
☐ Automated enforcement (policy-as-code)
☐ Domain teams comply without manual approval
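Policy-as-code can be as simple as automated checks run against every data product's metadata in CI. A sketch, with rules invented for illustration, showing how centrally defined policy is enforced without a manual approval step:

```python
def enforce_policies(metadata: dict) -> list[str]:
    """Centrally defined rules, automatically applied; returns policy violations."""
    violations = []
    if metadata.get("classification") not in {"public", "internal", "restricted"}:
        violations.append("classification missing or invalid")
    if not metadata.get("owner"):
        violations.append("every data product needs an accountable owner")
    if "sla" not in metadata:
        violations.append("freshness/availability SLA is mandatory")
    return violations
```

A domain team whose metadata returns an empty list ships without asking anyone; a non-empty list blocks the merge.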
## Data Product Interface
```python
class DataProduct:
    """A well-defined, discoverable data product."""

    metadata = {
        "name": "order_events",
        "domain": "commerce",
        "owner": "orders-team",
        "description": "All order lifecycle events (created, paid, shipped, delivered)",
        "sla": {
            "freshness": "< 5 minutes",
            "availability": "99.9%",
            "quality_score": "> 95%",
        },
        "schema_version": "3.2.0",
        "output_ports": [
            {"type": "streaming", "format": "Kafka topic", "name": "orders.events.v3"},
            {"type": "batch", "format": "Parquet", "path": "s3://data-lake/orders/events/"},
            {"type": "api", "format": "REST", "url": "/api/data/orders/events"},
        ],
        "classification": "internal",
        "lineage": {
            "sources": ["orders-db.public.orders", "payments-api"],
            "transforms": ["order_enrichment_pipeline"],
        },
    }

    def quality_checks(self):
        return [
            {"check": "no null order_id", "threshold": "100%"},
            {"check": "valid status values", "threshold": "100%"},
            {"check": "amount > 0", "threshold": "99.9%"},
            {"check": "freshness < 5 min", "threshold": "99.9%"},
        ]
```
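Thresholds like "> 95%" only matter if something evaluates them. A self-contained sketch that parses threshold strings of the form used above and compares them to measured pass rates (the measured values here are made up for illustration):

```python
def threshold_met(threshold: str, measured_pct: float) -> bool:
    """Interpret thresholds like '100%' or '> 95%' against a measured pass rate."""
    t = threshold.strip()
    if t.startswith(">"):
        return measured_pct > float(t.lstrip("> ").rstrip("%"))
    return measured_pct >= float(t.rstrip("%"))

# (check name, declared threshold, measured pass rate) - measurements are hypothetical.
results = [
    ("no null order_id", "100%", 100.0),
    ("amount > 0", "99.9%", 99.95),
]
failures = [name for name, thr, pct in results if not threshold_met(thr, pct)]
```

Run on a schedule, a non-empty `failures` list feeds the product's monitoring and, over time, its published quality score.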
## Anti-Patterns
| Anti-Pattern | Consequence | Fix |
|---|---|---|
| Data mesh without platform | Every team reinvents infrastructure | Self-serve platform is prerequisite |
| No data product standards | Inconsistent quality across domains | Federated governance with enforced standards |
| Domain teams refuse ownership | Data mesh in name only | Executive sponsorship, embedded data engineers |
| Every dataset is a “data product” | Catalog bloated, nothing discoverable | Curate: only well-documented, quality-checked data |
| Central team still builds pipelines | Bottleneck unchanged | True self-service, not “submit a request” |
Data mesh is an organizational pattern, not a technology. It works when domains genuinely own their data products, a self-serve platform reduces the engineering burden, and federated governance ensures interoperability. Without all three, it is just a rebranded data lake.