Data Governance & Data Catalog
Implement enterprise data governance. Covers data classification, data catalog tools, access policies, data stewardship, metadata management, and compliance for data assets.
Data governance without tooling is policy that nobody follows. Data governance without policy is tooling that nobody trusts. You need both: clear policies for how data should be classified, accessed, and used, combined with automated tooling that enforces those policies at scale.
Data Classification
| Level | Data Types | Access | Examples |
|---|---|---|---|
| Public | Marketing content, product info | Anyone | Blog posts, pricing page |
| Internal | Business metrics, internal docs | All employees | Revenue dashboards, wiki |
| Confidential | Customer data, financial data | Need-to-know | Customer PII, contracts |
| Restricted | Cryptographic keys, credentials | Named individuals | API keys, passwords |
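The table above can be turned into something a system can enforce. The following is a minimal sketch, not a prescribed implementation: the `Classification` enum and the `can_access` function, along with its three boolean flags, are hypothetical names introduced here to illustrate how each level maps to an access rule.

```python
from enum import IntEnum


class Classification(IntEnum):
    """The four levels from the table, ordered from least to most sensitive."""
    PUBLIC = 1
    INTERNAL = 2
    CONFIDENTIAL = 3
    RESTRICTED = 4


def can_access(level, *, is_employee=False, has_need_to_know=False, is_named=False):
    """Return True if the requester satisfies the access rule for this level.

    The flags are illustrative assumptions; a real system would derive them
    from an identity provider or an access-request workflow.
    """
    if level is Classification.PUBLIC:
        return True                 # Anyone
    if level is Classification.INTERNAL:
        return is_employee          # All employees
    if level is Classification.CONFIDENTIAL:
        return has_need_to_know     # Need-to-know
    return is_named                 # Restricted: named individuals only


# An employee without a documented need cannot read confidential data.
print(can_access(Classification.CONFIDENTIAL, is_employee=True))  # prints False
```

Using an ordered enum also makes it possible to compare levels, for example to reject a copy from a `RESTRICTED` source into an `INTERNAL` destination.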
Data Catalog
A data catalog is the “Google for your data.” It answers: What data do we have? Where is it? What does it mean? Who owns it? Who can access it?
| Tool | Type | Best For |
|---|---|---|
| DataHub (LinkedIn) | Open source | Engineering-driven organizations |
| OpenMetadata | Open source | Modern data stack (dbt, Airflow) |
| Amundsen (Lyft) | Open source | Discovery-focused, Python ecosystem |
| Atlan | Commercial | Enterprise governance + discovery |
| Collibra | Commercial | Large enterprise, regulatory compliance |
| dbt docs | Built-in | Already using dbt (lightweight catalog) |
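Whichever tool you choose, the five questions a catalog answers map onto the fields of a catalog record. The sketch below is a toy in-memory model, not the data model of any tool in the table: `CatalogEntry`, `search`, and the `revenue_daily` entry are hypothetical, while the `customers` entry reuses values from the metadata example in this guide.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CatalogEntry:
    name: str            # What data do we have?
    location: str        # Where is it?
    description: str     # What does it mean?
    owner: str           # Who owns it?
    classification: str  # Who can access it?


def search(entries, keyword):
    """Return entries whose name or description contains the keyword."""
    needle = keyword.lower()
    return [
        entry for entry in entries
        if needle in entry.name.lower() or needle in entry.description.lower()
    ]


entries = [
    CatalogEntry(
        name="customers",
        location="analytics.customers",
        description="All registered customer accounts. One row per customer.",
        owner="customer-data-team",
        classification="confidential",
    ),
    # Hypothetical second entry, included only to give the search something to filter out.
    CatalogEntry(
        name="revenue_daily",
        location="analytics.revenue_daily",
        description="Daily revenue totals by segment.",
        owner="finance-data-team",
        classification="internal",
    ),
]

print([entry.location for entry in search(entries, "customer")])
# prints ['analytics.customers']
```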
Metadata Management
# Table-level metadata
table: customers
schema: analytics
description: "All registered customer accounts. One row per customer."
owner: customer-data-team
pii: true
classification: confidential
refresh: daily (6:00 AM UTC)
columns:
  - name: customer_id
    type: integer
    description: "Primary identifier for customer"
    pii: false
  - name: email
    type: string
    description: "Customer email address"
    pii: true
    masking: "hash in non-production environments"
  - name: full_name
    type: string
    description: "Customer legal name"
    pii: true
    masking: "redact in analytics views"
  - name: segment
    type: string
    description: "Customer segment: 'enterprise', 'mid-market', 'smb'"
    pii: false
    allowed_values: ["enterprise", "mid-market", "smb"]
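Metadata like the entry above is most useful when it is checked automatically, for example in CI. Here is a minimal sketch of such a check, assuming the entry has already been loaded into a Python dict. The function `validate_table_metadata` is hypothetical, and the rule that a table with PII columns must be classified `confidential` or `restricted` is an assumption based on the classification table, not a stated policy.

```python
def validate_table_metadata(table):
    """Return a list of human-readable problems found in one table entry."""
    problems = []
    for field in ("table", "schema", "description", "owner", "classification"):
        if not table.get(field):
            problems.append(f"missing table field: {field}")

    columns = table.get("columns", [])
    for col in columns:
        name = col.get("name", "<unnamed>")
        if not col.get("description"):
            problems.append(f"{name}: missing description")
        if col.get("pii") and not col.get("masking"):
            problems.append(f"{name}: PII column has no masking rule")

    has_pii_column = any(col.get("pii") for col in columns)
    if has_pii_column and not table.get("pii"):
        problems.append("table-level pii flag is not set but a column is PII")
    # Assumed rule: PII implies at least the Confidential level.
    if has_pii_column and table.get("classification") not in ("confidential", "restricted"):
        problems.append("PII table must be classified confidential or restricted")
    return problems


# The customers entry, abbreviated to two of its four columns.
customers = {
    "table": "customers",
    "schema": "analytics",
    "description": "All registered customer accounts. One row per customer.",
    "owner": "customer-data-team",
    "pii": True,
    "classification": "confidential",
    "columns": [
        {"name": "customer_id", "type": "integer",
         "description": "Primary identifier for customer", "pii": False},
        {"name": "email", "type": "string",
         "description": "Customer email address", "pii": True,
         "masking": "hash in non-production environments"},
    ],
}

print(validate_table_metadata(customers))  # prints []
```

Returning a list of problems rather than raising on the first one lets a single run report every gap in a table's metadata.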
Data Stewardship Model
| Role | Responsibility | Example |
|---|---|---|
| Data Owner | Accountable for data quality and access | VP of Sales owns CRM data |
| Data Steward | Day-to-day governance, quality rules | Data analyst maintains quality rules |
| Data Engineer | Pipeline reliability, schema management | Builds and monitors pipelines |
| Data Consumer | Uses data responsibly, reports issues | Business analyst building reports |
| Privacy Officer | Compliance, retention policies | Reviews PII handling, GDPR/CCPA |
Anti-Patterns
| Anti-Pattern | Problem | Fix |
|---|---|---|
| No data catalog | “Where is the customer churn data?” → 3-day search | Catalog all data assets and make them searchable |
| No data owners | Nobody responsible, quality degrades | Named owner for every data domain |
| Governance = blocking | Data requests take weeks to approve | Self-service with guardrails, not gates |
| Classification in name only | Data labeled but no enforcement | Automated access controls based on classification |
| PII everywhere | Compliance risk, unbounded breach impact | PII detection, masking, access logging |
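The fix for the last anti-pattern, PII detection and masking, can be sketched in a few lines. This is an illustration under stated assumptions, not a production detector: the regular expression only recognizes email-shaped strings, and `looks_like_email`, `hash_value`, `mask_row`, and the `salt` parameter are hypothetical names. Note that a salted hash is pseudonymization rather than anonymization, so masked data may still fall under privacy regulations.

```python
import hashlib
import re

# Deliberately simple pattern; real PII detection covers many more data types.
EMAIL_PATTERN = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")


def looks_like_email(value):
    """Return True if the whole value is an email-shaped string."""
    return isinstance(value, str) and EMAIL_PATTERN.fullmatch(value) is not None


def hash_value(value, salt):
    """Replace a value with a truncated, salted SHA-256 hex digest."""
    digest = hashlib.sha256((salt + value).encode("utf-8")).hexdigest()
    return digest[:16]


def mask_row(row, salt):
    """Return a copy of the row with email-shaped values hashed."""
    return {
        key: hash_value(value, salt) if looks_like_email(value) else value
        for key, value in row.items()
    }


row = {"customer_id": 42, "email": "jane@example.com", "segment": "smb"}
masked = mask_row(row, salt="non-prod-salt")
print(masked["customer_id"], masked["segment"])  # prints 42 smb
```

Because the hash is deterministic for a given salt, the same email always maps to the same token, so joins across masked tables in a non-production environment still work.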
Checklist
- Data classification policy defined (4 levels minimum)
- Data catalog deployed and populated
- Every table/dataset has a documented owner
- Column-level metadata: descriptions, PII flags, masking rules
- Access controls enforced based on classification
- Data quality rules defined and automated
- PII detection and masking in non-production
- Compliance: retention policies, right-to-delete processes
- Data stewardship roles assigned per domain
:::note[Source]
This guide is derived from operational intelligence at Garnet Grid Consulting. For data governance consulting, visit garnetgrid.com.
:::