
Data Contracts for Pipeline Reliability

Implement data contracts between producers and consumers. Covers schema registries, contract testing, versioning strategies, breaking change management, and organizational adoption.

Data contracts are explicit agreements between data producers and data consumers about what data will look like, when it will arrive, and what quality guarantees it carries. Without contracts, an upstream change (a renamed column, a new enum value, a format change) can silently break downstream pipelines, dashboards, and ML models.

This guide covers how to implement data contracts technically and organizationally, turning implicit assumptions into enforceable agreements.


What a Data Contract Contains

# data-contract.yaml
contract:
  name: "orders"
  version: "2.1.0"
  owner: "commerce-team"
  
  schema:
    type: "object"
    properties:
      order_id:
        type: string
        format: uuid
        description: "Unique order identifier"
        pii: false
      customer_id:
        type: string
        description: "Customer who placed the order"
        pii: true
      amount:
        type: number
        minimum: 0.01
        maximum: 1000000
        description: "Order total in USD"
      status:
        type: string
        enum: ["pending", "processing", "shipped", "delivered", "cancelled"]
      created_at:
        type: string
        format: "date-time"
        description: "ISO 8601 timestamp"
  
  quality:
    freshness:
      max_delay: "1 hour"
    completeness:
      order_id: 100%
      customer_id: 100%
      amount: 100%
    volume:
      min_daily_records: 5000
      max_daily_records: 50000
  
  sla:
    availability: "99.9%"
    support_channel: "#commerce-data"
    
  consumers:
    - team: "analytics"
      use_case: "Revenue dashboards"
    - team: "ml-platform"
      use_case: "Churn prediction features"
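
To make the schema section of the contract concrete, here is a minimal sketch of checking a single record against it, using only the Python standard library. The `ORDER_PROPERTIES` dict and the `validate_record` function are hypothetical names introduced for illustration, and the sketch assumes every listed field is required; a real setup would more likely load the YAML file and use a full JSON Schema validator.

```python
import re
from datetime import datetime

# Hypothetical in-memory copy of the contract's schema.properties section.
ORDER_PROPERTIES = {
    "order_id": {"type": "string", "format": "uuid"},
    "customer_id": {"type": "string"},
    "amount": {"type": "number", "minimum": 0.01, "maximum": 1000000},
    "status": {"type": "string",
               "enum": ["pending", "processing", "shipped", "delivered", "cancelled"]},
    "created_at": {"type": "string", "format": "date-time"},
}

UUID_RE = re.compile(r"^[0-9a-fA-F]{8}(-[0-9a-fA-F]{4}){3}-[0-9a-fA-F]{12}$")


def validate_record(record, properties=ORDER_PROPERTIES):
    """Return a list of violations; an empty list means the record conforms.

    Assumption: every field in `properties` is required.
    """
    errors = []
    for name, rules in properties.items():
        value = record.get(name)
        if value is None:
            errors.append(f"{name}: missing")
            continue
        if rules["type"] == "string" and not isinstance(value, str):
            errors.append(f"{name}: expected string")
            continue
        if rules["type"] == "number" and (
            isinstance(value, bool) or not isinstance(value, (int, float))
        ):
            errors.append(f"{name}: expected number")
            continue
        if "enum" in rules and value not in rules["enum"]:
            errors.append(f"{name}: {value!r} not in enum")
        if "minimum" in rules and value < rules["minimum"]:
            errors.append(f"{name}: below minimum {rules['minimum']}")
        if "maximum" in rules and value > rules["maximum"]:
            errors.append(f"{name}: above maximum {rules['maximum']}")
        if rules.get("format") == "uuid" and not UUID_RE.match(value):
            errors.append(f"{name}: not a UUID")
        if rules.get("format") == "date-time":
            try:
                # fromisoformat on older Pythons does not accept a trailing "Z"
                datetime.fromisoformat(value.replace("Z", "+00:00"))
            except ValueError:
                errors.append(f"{name}: not an ISO 8601 timestamp")
    return errors
```

Returning a list of violations instead of raising on the first one lets a producer-side test report every problem in a sample record at once.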

Schema Registry

# Avro schema (as a Python dict); the registry enforces compatibility when it is registered
ORDERS_SCHEMA_V2 = {
    "type": "record",
    "name": "Order",
    "namespace": "com.company.commerce",
    "fields": [
        {"name": "order_id", "type": "string"},
        {"name": "customer_id", "type": "string"},
        {"name": "amount", "type": "double"},
        {"name": "status", "type": {"type": "enum", "name": "Status",
            "symbols": ["PENDING", "PROCESSING", "SHIPPED", "DELIVERED", "CANCELLED"]}},
        {"name": "created_at", "type": {"type": "long", "logicalType": "timestamp-millis"}},
        # New field in v2 — backward compatible (has default)
        {"name": "currency", "type": "string", "default": "USD"},
    ]
}
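
The comment on the `currency` field relies on one rule: a field added in the new schema needs a default so the new schema can still read old data. A simplified sketch of that check follows. The function name and the trimmed `V1`/`V2`/`V3` schemas are hypothetical, and the check covers only top-level fields. Real Avro schema resolution also allows type promotions (for example `int` to `long`), which this sketch would flag as a change.

```python
def backward_incompatibilities(old_schema, new_schema):
    """List reasons the new record schema could not read data written with the old one.

    Simplified: looks only at top-level fields, and treats any type change as
    incompatible (stricter than real Avro, which permits some promotions).
    Fields removed in the new schema are fine for backward compatibility,
    because the reader ignores writer fields it does not know.
    """
    old_fields = {f["name"]: f for f in old_schema["fields"]}
    problems = []
    for field in new_schema["fields"]:
        name = field["name"]
        if name not in old_fields:
            if "default" not in field:
                problems.append(f"added field '{name}' has no default")
        elif field["type"] != old_fields[name]["type"]:
            problems.append(f"field '{name}' changed type")
    return problems


# Hypothetical trimmed-down schemas for illustration.
V1 = {"type": "record", "name": "Order", "fields": [
    {"name": "order_id", "type": "string"},
    {"name": "amount", "type": "double"},
]}
V2 = {"type": "record", "name": "Order", "fields": V1["fields"] + [
    {"name": "currency", "type": "string", "default": "USD"},
]}
V3 = {"type": "record", "name": "Order", "fields": V2["fields"] + [
    {"name": "warehouse_id", "type": "string"},  # no default: breaks old data
]}

print(backward_incompatibilities(V1, V2))
print(backward_incompatibilities(V2, V3))
```

In practice the registry performs this check itself; a local version like this is mainly useful for fast feedback before a schema is ever submitted.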

Compatibility Modes

| Mode | New Schema Can | Use Case |
|------|----------------|----------|
| BACKWARD | Read old data | Consumers upgrade before producers |
| FORWARD | Be read by old consumers | Producers upgrade before consumers |
| FULL | Both backward and forward | Independent upgrades |
| NONE | Break anything | Development only, never production |
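
In a Confluent-style schema registry, the mode is set per subject with a `PUT` to `/config/{subject}`. The sketch below only builds that request with the standard library and does not send it. The function name is hypothetical, the registry URL and subject are the ones used elsewhere in this guide, and the allowed set is limited to the four modes in the table (Confluent also offers transitive variants).

```python
import json
import urllib.request

# The four modes from the table above.
ALLOWED_MODES = {"BACKWARD", "FORWARD", "FULL", "NONE"}


def build_compatibility_request(registry_url, subject, mode):
    """Build (but do not send) the request that sets a subject's compatibility mode."""
    if mode not in ALLOWED_MODES:
        raise ValueError(f"unsupported compatibility mode: {mode}")
    body = json.dumps({"compatibility": mode}).encode("utf-8")
    return urllib.request.Request(
        url=f"{registry_url}/config/{subject}",
        data=body,
        method="PUT",
        headers={"Content-Type": "application/vnd.schemaregistry.v1+json"},
    )


request = build_compatibility_request(
    "https://schema-registry:8081", "orders-value", "BACKWARD"
)
# To apply it: urllib.request.urlopen(request)
```

Rejecting unknown modes before the call keeps a typo from being discovered only as an HTTP error in a deployment script.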

Contract Testing in CI/CD

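# GitHub Actions: validate contracts on every PR
contract-validation:
  runs-on: ubuntu-latest
  steps:
    - uses: actions/checkout@v4

    - name: Schema Compatibility Check
      id: schema-check  # referenced by the "Notify Consumers" condition below
      run: |
        # Check new schema is compatible with the latest registered version.
        # new-schema.json must wrap the schema as {"schema": "<schema as a JSON string>"}.
        result=$(curl -sS -X POST "https://schema-registry:8081/compatibility/subjects/orders-value/versions/latest" \
          -H "Content-Type: application/vnd.schemaregistry.v1+json" \
          -d @new-schema.json)
        # The registry answers {"is_compatible": true|false}; expose the inverse as a step output
        if [ "$(echo "$result" | jq -r '.is_compatible')" = "true" ]; then
          echo "breaking=false" >> "$GITHUB_OUTPUT"
        else
          echo "breaking=true" >> "$GITHUB_OUTPUT"
        fi

    - name: Contract Test (Producer)
      run: |
        # Verify producer output matches contract
        # (--contract and --sample-data are custom options that must be registered in conftest.py)
        python -m pytest tests/contract/ \
          --contract data-contracts/orders.yaml \
          --sample-data tests/fixtures/orders_sample.json

    - name: Notify Consumers
      if: steps.schema-check.outputs.breaking == 'true'
      run: |
        # Alert consuming teams about breaking change
        slack-notify --channel "#data-contracts" \
          --message "⚠️ Breaking change proposed for 'orders' contract"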
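
The compatibility check posts `new-schema.json` to the registry. In the Confluent REST API, that body is a JSON object whose `schema` value is the schema itself encoded as a JSON string, and the response carries an `is_compatible` flag. A small sketch of both sides follows; the helper names are hypothetical.

```python
import json


def build_compatibility_payload(avro_schema):
    """Wrap an Avro schema dict in the request body the compatibility endpoint expects.

    The schema is JSON-encoded twice: once as the schema document, and again
    as a string value inside the outer {"schema": ...} object.
    """
    return json.dumps({"schema": json.dumps(avro_schema)})


def is_breaking(response_body):
    """Interpret the registry response; a missing flag is treated as breaking."""
    return not json.loads(response_body).get("is_compatible", False)


schema = {
    "type": "record",
    "name": "Order",
    "namespace": "com.company.commerce",
    "fields": [{"name": "order_id", "type": "string"}],
}

# Write the file that the CI step sends with `-d @new-schema.json`.
with open("new-schema.json", "w", encoding="utf-8") as handle:
    handle.write(build_compatibility_payload(schema))
```

Treating a missing flag as breaking is a deliberate fail-closed choice: an error response from the registry should block the change rather than let it through.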

Versioning Strategy

| Change Type | Version Bump | Example | Breaking? |
|-------------|--------------|---------|-----------|
| Add optional field | Minor (2.0 → 2.1) | Add currency with default | No |
| Add required field | Major (2.1 → 3.0) | New warehouse_id required | Yes |
| Remove field | Major | Drop legacy_status | Yes |
| Change field type | Major | amount: string → amount: number | Yes |
| Add enum value | Minor | Add “REFUNDED” to status | No (if consumer handles unknown) |
| Rename field | Major | created_at → order_time | Yes |
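
The rules in this table are mechanical enough to automate. The sketch below classifies the difference between two versions of a contract's fields and computes the next version number. The function names and the field representation (`type`, optional `required` and `enum`) are hypothetical simplifications. A rename appears as a removal plus an addition, so it is classified as major.

```python
def required_bump(old_fields, new_fields):
    """Return "major", "minor", or "none" for a change between two field maps."""
    if old_fields.keys() - new_fields.keys():
        return "major"  # removed (or renamed) field
    bump = "none"
    for name, spec in new_fields.items():
        if name not in old_fields:
            if spec.get("required", False):
                return "major"  # new required field
            bump = "minor"      # new optional field
            continue
        previous = old_fields[name]
        if spec["type"] != previous["type"]:
            return "major"      # changed field type
        old_enum = set(previous.get("enum", []))
        new_enum = set(spec.get("enum", []))
        if old_enum - new_enum:
            # Not in the table above: removing an enum value is assumed breaking.
            return "major"
        if new_enum - old_enum:
            bump = "minor"      # added enum value
    return bump


def bump_version(version, bump):
    """Apply a bump to a MAJOR.MINOR.PATCH version string."""
    major, minor, _patch = (int(part) for part in version.split("."))
    if bump == "major":
        return f"{major + 1}.0.0"
    if bump == "minor":
        return f"{major}.{minor + 1}.0"
    return version


old = {"amount": {"type": "number"},
       "status": {"type": "string", "enum": ["pending", "shipped"]}}
new = dict(old, currency={"type": "string", "required": False})
print(bump_version("2.1.0", required_bump(old, new)))
```

A check like this can run in CI next to the compatibility check, failing the build when the version declared in the contract file is lower than the computed one.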

Anti-Patterns

| Anti-Pattern | Problem | Fix |
|--------------|---------|-----|
| No contracts (implicit assumptions) | Upstream changes break downstream silently | Explicit contracts with schema + quality SLAs |
| Contracts without enforcement | Contracts exist but nobody checks them | CI/CD validation, runtime schema validation |
| Producer-only contracts | Consumers not aware of changes | Consumer registration, change notification |
| No versioning | Can’t evolve schemas without breaking | Semantic versioning with compatibility checks |
| Contracts as documentation | Written but never tested | Contract tests run in CI, production validation |

Checklist

  • Data contracts defined for all critical datasets
  • Schema registry deployed (Confluent, AWS Glue, or custom)
  • Compatibility mode set: BACKWARD or FULL
  • Contract testing in CI/CD: schema validation on every PR
  • Consumer registry: know who depends on each dataset
  • Breaking change process: notification + migration period
  • Runtime validation: schemas enforced at write time
  • Quality SLAs: freshness, completeness, volume in contract
  • Ownership: every contract has a team owner and support channel
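
The quality SLAs in the checklist correspond to the `quality` section of the example contract (freshness, completeness, volume). One way to evaluate them against a day's batch is sketched below. The function name is hypothetical. It assumes records are dicts with a timezone-aware `created_at` datetime, that the batch holds one day of records, and that freshness means the newest record is no older than `max_delay`.

```python
from datetime import datetime, timedelta, timezone


def check_quality(records, now,
                  max_delay=timedelta(hours=1),
                  required_fields=("order_id", "customer_id", "amount"),
                  min_records=5000, max_records=50000):
    """Return a list of SLA violations for one day's batch (empty means OK)."""
    violations = []

    # Volume: daily record count must fall inside the contracted range.
    if not (min_records <= len(records) <= max_records):
        violations.append(
            f"volume: {len(records)} records outside [{min_records}, {max_records}]"
        )

    # Completeness: fields contracted at 100% must never be null or absent.
    for field in required_fields:
        missing = sum(1 for record in records if record.get(field) is None)
        if missing:
            violations.append(f"completeness: {field} missing in {missing} records")

    # Freshness: the newest record must be within max_delay of "now".
    if records:
        newest = max(record["created_at"] for record in records)
        if now - newest > max_delay:
            violations.append(f"freshness: newest record is {now - newest} old")

    return violations


now = datetime(2024, 1, 15, 12, 0, tzinfo=timezone.utc)
batch = [
    {"order_id": str(i), "customer_id": "c-1", "amount": 10.0,
     "created_at": now - timedelta(minutes=5)}
    for i in range(5000)
]
print(check_quality(batch, now))
```

Passing `now` explicitly, rather than reading the clock inside the function, keeps the check deterministic and easy to test.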

:::note[Source] This guide is derived from operational intelligence at Garnet Grid Consulting. For data contract consulting, visit garnetgrid.com. :::

Jakub Dimitri Rezayev
Founder & Chief Architect • Garnet Grid Consulting

Jakub holds an M.S. in Customer Intelligence & Analytics and a B.S. in Finance & Computer Science from Pace University. With deep expertise spanning D365 F&O, Azure, Power BI, and AI/ML systems, he architects enterprise solutions that bridge legacy systems and modern technology — and has led multi-million dollar ERP implementations for Fortune 500 supply chains.
