Verified by Garnet Grid

Data Governance Frameworks

Implement data governance that enables data usage while maintaining compliance. Covers data cataloging, data classification, access policies, data quality rules, lineage tracking, and the organizational structures that make governance effective.

Data governance answers three questions: What data do we have? Who is allowed to access it? Is it accurate? Without governance, organizations drown in data lakes full of undocumented, unclassified data that no one trusts and everyone is afraid to use.


Governance Pillars

1. Data Cataloging:    What data exists and where
2. Data Classification: Sensitivity level of each dataset
3. Access Control:      Who can access what, and how
4. Data Quality:        Is the data accurate and complete?
5. Data Lineage:        Where did this data come from, and what transformed it?
6. Retention:           How long do we keep data?
7. Privacy:             PII handling, consent, right to deletion

Data Classification

classification_levels:
  public:
    description: "Data intended for public consumption"
    examples: ["Marketing content", "Public APIs", "Blog posts"]
    controls: "None required"
    
  internal:
    description: "Data for internal use, not sensitive"
    examples: ["Internal wikis", "Non-sensitive metrics", "Team directories"]
    controls: "Authentication required"
    
  confidential:
    description: "Business-sensitive data"
    examples: ["Revenue data", "Customer lists", "Strategic plans"]
    controls: "Role-based access, audit logging"
    
  restricted:
    description: "Highly sensitive, regulated data"
    examples: ["PII", "PHI", "Payment card data", "SSN"]
    controls: "Encryption, MFA, audit logging, DLP, retention limits"

auto_classification_rules:
  - pattern: "SSN|social_security"
    classification: restricted
  - pattern: "email|phone|address"
    classification: restricted
  - pattern: "revenue|profit|margin"
    classification: confidential
  - pattern: "password|secret|token"
    classification: restricted
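The rules above do not say what the patterns are matched against or which rule wins when several match. A minimal sketch, assuming the patterns are applied case-insensitively to column names, that unmatched columns default to `internal`, and that the most sensitive matching level wins (these defaults and the function names are illustrative, not part of any catalog tool):

```python
import re

# Sensitivity order from classification_levels, lowest to highest.
LEVELS = ["public", "internal", "confidential", "restricted"]

# The auto_classification_rules above; matching them case-insensitively
# against column names is an assumption of this sketch.
RULES = [
    (r"SSN|social_security", "restricted"),
    (r"email|phone|address", "restricted"),
    (r"revenue|profit|margin", "confidential"),
    (r"password|secret|token", "restricted"),
]


def classify_column(name: str, default: str = "internal") -> str:
    """Return the most sensitive level among the default and all matching rules."""
    level = default
    for pattern, classification in RULES:
        if re.search(pattern, name, flags=re.IGNORECASE):
            if LEVELS.index(classification) > LEVELS.index(level):
                level = classification
    return level


def classify_dataset(columns: list[str]) -> tuple[str, dict[str, str]]:
    """Classify each column; the dataset inherits its most sensitive column."""
    per_column = {col: classify_column(col) for col in columns}
    dataset_level = max(per_column.values(), key=LEVELS.index, default="internal")
    return dataset_level, per_column


level, per_column = classify_dataset(["customer_id", "Email", "gross_margin", "api_token"])
print(level)  # restricted
print(per_column)
```

Taking the maximum is why `customer_360` as a whole is restricted: one PII column is enough. Unanchored substring patterns also produce false positives; `classname` contains `ssn` and is classified restricted, so matching against name tokens (for example, after splitting on underscores) or routing matches to a steward for review is worth considering before these rules block a pipeline.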

Data Catalog

# Catalog entry for a dataset
catalog_entry = {
    "name": "customer_360",
    "schema": "analytics",
    "description": "Unified customer profile combining CRM, product, and billing data",
    "owner": "data-platform-team",
    "steward": "jane.doe@company.com",
    "classification": "restricted",  # Contains PII
    
    "columns": [
        {"name": "customer_id", "type": "UUID", "classification": "internal", "pii": False},
        {"name": "email", "type": "VARCHAR", "classification": "restricted", "pii": True},
        {"name": "name", "type": "VARCHAR", "classification": "restricted", "pii": True},
        {"name": "lifetime_value", "type": "DECIMAL", "classification": "confidential", "pii": False},
        {"name": "churn_score", "type": "FLOAT", "classification": "confidential", "pii": False},
    ],
    
    "lineage": {
        "sources": ["crm.contacts", "billing.customers", "product.user_events"],
        "transformations": ["dbt model: customer_360"],
        "consumers": ["marketing-dashboard", "churn-prediction-model"]
    },
    
    "quality": {
        "freshness": "Updated daily at 06:00 UTC",
        "completeness": "email: 98%, name: 95%, ltv: 92%",
        "uniqueness": "customer_id: 100% unique",
        "tests": ["not_null: customer_id, email", "unique: customer_id"]
    },
    
    "retention": {
        "policy": "7 years after last activity",
        "pii_deletion": "90 days after deletion request"
    }
}
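The `lineage` block records the edges on both sides of `customer_360`, which is enough for impact analysis: before changing a source table, find everything downstream of it. A minimal sketch, assuming lineage from many catalog entries is merged into one graph keyed by `schema.name` (the key format, `CATALOG_LINEAGE`, and the function names are illustrative):

```python
from collections import defaultdict, deque

# Lineage from the customer_360 catalog entry, keyed as schema.name
# (the key format is an assumption of this sketch).
CATALOG_LINEAGE = {
    "analytics.customer_360": {
        "sources": ["crm.contacts", "billing.customers", "product.user_events"],
        "consumers": ["marketing-dashboard", "churn-prediction-model"],
    },
}


def build_graph(catalog: dict) -> dict[str, set[str]]:
    """Build upstream -> downstream edges from every entry's lineage."""
    graph: dict[str, set[str]] = defaultdict(set)
    for dataset, lineage in catalog.items():
        for source in lineage["sources"]:
            graph[source].add(dataset)
        for consumer in lineage["consumers"]:
            graph[dataset].add(consumer)
    return graph


def downstream(graph: dict[str, set[str]], node: str) -> set[str]:
    """Breadth-first search: every node affected if `node` changes."""
    seen: set[str] = set()
    queue = deque([node])
    while queue:
        for nxt in graph.get(queue.popleft(), ()):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return seen


graph = build_graph(CATALOG_LINEAGE)
print(sorted(downstream(graph, "billing.customers")))
# ['analytics.customer_360', 'churn-prediction-model', 'marketing-dashboard']
```

Following the same edges in reverse answers the other lineage question: where a dataset's data came from.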

Data Quality Rules

# dbt tests for data quality
# schema.yml
version: 2
models:
  - name: customer_360
    columns:
      - name: customer_id
        tests:
          - not_null
          - unique
      - name: email
        tests:
          - not_null
          - dbt_utils.expression_is_true:
              expression: "like '%@%.%'"  # Crude email check: an @ followed later by a dot
      - name: lifetime_value
        tests:
          - not_null
          - dbt_utils.accepted_range:
              min_value: 0
              max_value: 10000000
      - name: churn_score
        tests:
          - dbt_utils.accepted_range:
              min_value: 0.0
              max_value: 1.0
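Each dbt generic test compiles to a SQL query that selects failing rows, and the test passes when that query returns none. To make the semantics concrete, here is a minimal in-memory Python sketch of the same checks plus the completeness percentage reported in the catalog entry. It illustrates the logic only; it is not how dbt executes tests, and the sample rows are hypothetical. The email check encodes the intent of the `'%@%.%'` pattern: an `@` followed somewhere later by a `.`.

```python
import re
from collections import Counter

# SQL LIKE '%@%.%' means: an "@" followed somewhere later by a ".".
EMAIL_LIKE = re.compile(r".*@.*\..*", re.DOTALL)


def completeness(rows: list[dict], column: str) -> float:
    """Percentage of rows where `column` is not null."""
    if not rows:
        return 0.0
    return 100.0 * sum(r.get(column) is not None for r in rows) / len(rows)


def quality_failures(rows: list[dict]) -> dict[str, int]:
    """Failure count per check. Nulls only fail not_null, as in the SQL tests."""
    ids = [r.get("customer_id") for r in rows]
    emails = [r.get("email") for r in rows]
    ltv = [r.get("lifetime_value") for r in rows]
    churn = [r.get("churn_score") for r in rows]
    id_counts = Counter(i for i in ids if i is not None)
    return {
        "customer_id.not_null": ids.count(None),
        # Like dbt's unique test: one failure per duplicated value.
        "customer_id.unique": sum(1 for n in id_counts.values() if n > 1),
        "email.not_null": emails.count(None),
        "email.pattern": sum(e is not None and not EMAIL_LIKE.fullmatch(e) for e in emails),
        "lifetime_value.not_null": ltv.count(None),
        "lifetime_value.range": sum(v is not None and not 0 <= v <= 10_000_000 for v in ltv),
        "churn_score.range": sum(v is not None and not 0.0 <= v <= 1.0 for v in churn),
    }


# Hypothetical sample rows; every row except the first has a problem.
SAMPLE_ROWS = [
    {"customer_id": "c1", "email": "a@example.com", "lifetime_value": 120.0, "churn_score": 0.2},
    {"customer_id": "c2", "email": "not-an-email", "lifetime_value": -5.0, "churn_score": 1.3},
    {"customer_id": "c2", "email": None, "lifetime_value": None, "churn_score": None},
    {"customer_id": None, "email": "b@example.org", "lifetime_value": 40.0, "churn_score": 0.9},
]

print(quality_failures(SAMPLE_ROWS))  # every check reports exactly one failure
print(completeness(SAMPLE_ROWS, "email"))  # 75.0
```

As in SQL, a null fails only `not_null`: the range and pattern checks skip nulls, so a column without a `not_null` test (like `churn_score`) can be mostly empty and still pass.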

Anti-Patterns

Anti-Pattern | Consequence | Fix
No data catalog | “What data do we have?” → nobody knows | Automated catalog + manual enrichment
Classification = one-time project | New data unclassified | Auto-classification rules in pipeline
Governance without tooling | Manual, unsustainable | Data catalog tools (DataHub, OpenMetadata)
Governance blocks data access | Teams work around governance | Enable access with guardrails, not gates
No data quality monitoring | Decisions based on bad data | Automated quality checks in pipelines
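"Guardrails, not gates" can be as simple as masking the columns a role is not cleared for instead of denying the whole dataset. A minimal sketch, assuming a hypothetical mapping from roles to the highest classification level each may read; in practice the warehouse's own column-masking features, plus audit logging, would enforce this rather than application code:

```python
# Sensitivity order from the classification levels, lowest to highest.
LEVELS = ["public", "internal", "confidential", "restricted"]

# Hypothetical roles and the highest level each may read in clear text.
ROLE_CLEARANCE = {
    "marketing": "internal",
    "analyst": "confidential",
    "privacy-officer": "restricted",
}

# Column classifications copied from the customer_360 catalog entry.
CUSTOMER_360_COLUMNS = {
    "customer_id": "internal",
    "email": "restricted",
    "name": "restricted",
    "lifetime_value": "confidential",
    "churn_score": "confidential",
}

MASK = "***MASKED***"


def masked_view(role: str, row: dict) -> dict:
    """Return the row with every column above the role's clearance masked."""
    # Unknown roles fall back to public clearance (fail closed).
    clearance = LEVELS.index(ROLE_CLEARANCE.get(role, "public"))
    return {
        # Unclassified columns are treated as restricted (fail closed).
        col: value
        if LEVELS.index(CUSTOMER_360_COLUMNS.get(col, "restricted")) <= clearance
        else MASK
        for col, value in row.items()
    }


row = {"customer_id": "c1", "email": "a@example.com", "name": "Ada",
       "lifetime_value": 120.0, "churn_score": 0.2}
print(masked_view("analyst", row))
```

An analyst still gets every row and the business metrics; only `email` and `name` are masked. Failing closed on unknown roles and unclassified columns keeps the default safe while new columns wait for classification.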

Data governance is the immune system for your data platform. Without it, data quality degrades, privacy violations accumulate, and trust erodes until no one believes the dashboards.

Jakub Dimitri Rezayev
Founder & Chief Architect • Garnet Grid Consulting

Jakub holds an M.S. in Customer Intelligence & Analytics and a B.S. in Finance & Computer Science from Pace University. With deep expertise spanning D365 F&O, Azure, Power BI, and AI/ML systems, he architects enterprise solutions that bridge legacy systems and modern technology — and has led multi-million dollar ERP implementations for Fortune 500 supply chains.
