
AI Governance & Model Risk Management

Build responsible AI frameworks for enterprise deployment. Covers model risk assessment, bias detection, explainability requirements, compliance mapping, and governance committee structures.

Deploying AI without governance is like deploying code without testing — it works until it doesn’t, and the failure is public, expensive, and sometimes illegal. AI governance isn’t bureaucracy for bureaucracy’s sake. It’s risk management for systems that make decisions affecting real people.

The cost of getting this wrong is severe. Amazon scrapped an AI hiring tool after discovering it had learned to discriminate against women. The Apple Card's underwriting algorithm drew a regulatory investigation after spouses with shared finances reported receiving very different credit limits. These weren't malicious decisions — they were ungoverned systems producing ungoverned outcomes.


The AI Risk Taxonomy

| Risk Category | Examples | Impact | Likelihood |
|---------------|----------|--------|------------|
| Bias & Fairness | Discriminatory lending, biased hiring | Legal liability, reputational damage | High |
| Accuracy & Reliability | Incorrect medical diagnoses, false fraud alerts | Customer harm, financial loss | Medium |
| Security | Model extraction, adversarial attacks | Data breach, system compromise | Medium |
| Privacy | Training on PII, memorization of individuals | GDPR/CCPA violations, lawsuits | High |
| Transparency | Black-box decisions affecting rights | Regulatory non-compliance | High |
| Drift | Model degradation over time | Silent quality erosion | Very High |

Model Risk Assessment Framework

Risk Classification Matrix

                  Low Impact          High Impact
              ┌─────────────────┬─────────────────┐
  High        │   MEDIUM RISK   │  CRITICAL RISK  │
  Complexity  │   Enhanced      │  Full governance│
              │   monitoring    │  Board review   │
              ├─────────────────┼─────────────────┤
  Low         │    LOW RISK     │   MEDIUM RISK   │
  Complexity  │   Standard      │   Enhanced      │
              │   controls      │   monitoring    │
              └─────────────────┴─────────────────┘
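The matrix lends itself to automation during model intake. A minimal sketch, with the function name and string labels being illustrative rather than from any standard:

```python
def classify_risk_tier(impact: str, complexity: str) -> str:
    """Map a model's impact and complexity onto the 2x2 matrix above.

    High-impact, high-complexity models land in Critical; each axis
    that drops to "low" steps the tier down.
    """
    matrix = {
        ("low", "low"): "Low",
        ("low", "high"): "Medium",    # complex but low stakes
        ("high", "low"): "Medium",    # simple but high stakes
        ("high", "high"): "Critical",
    }
    return matrix[(impact.lower(), complexity.lower())]
```

Encoding the matrix as data rather than nested conditionals keeps the intake form, the documentation, and the code trivially in sync.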

Governance Requirements by Tier

| Requirement | Low | Medium | High | Critical |
|-------------|-----|--------|------|----------|
| Model card | Required | Required | Required | Required |
| Bias testing | Optional | Required | Required | Required |
| Explainability | Documentation | On-request SHAP | Real-time SHAP | Counterfactual |
| Review frequency | Annual | Semi-annual | Quarterly | Monthly |
| Human-in-the-loop | No | Recommended | Required | Required |
| Incident response plan | Basic | Documented | Tested | Rehearsed |
| Audit trail | Basic logging | Decision logging | Full reproducibility | Full + external audit |

Risk Assessment Template

```yaml
model_risk_assessment:
  model_name: "Customer Churn Predictor"
  version: "3.2"
  owner: "Data Science Team"

  classification:
    risk_tier: "Medium"  # Low / Medium / High / Critical
    impact_area: "Customer retention decisions"
    affected_population: "All active customers (~500K)"
    decision_type: "Recommendation (human-in-the-loop)"  # vs Automated

  data_assessment:
    training_data_source: "Customer CRM + transaction history"
    contains_pii: true
    protected_attributes_present: ["age", "zip_code"]  # Proxies for race/income
    data_freshness: "Weekly refresh"
    known_biases: "Underrepresentation of customers < 6 months"

  fairness_requirements:
    protected_groups: ["age_bracket", "geographic_region"]
    fairness_metric: "Equalized odds"
    acceptable_disparity: 0.05  # 5% max difference in FPR/TPR across groups

  monitoring:
    frequency: "Weekly"
    drift_threshold: 0.15
    accuracy_floor: 0.82
    alert_channels: ["slack:#ml-alerts", "pagerduty"]

  review_schedule:
    last_review: "2026-01-15"
    next_review: "2026-04-15"
    reviewer: "Model Risk Committee"
```
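The `drift_threshold` in the template is consistent with Population Stability Index (PSI) conventions, where roughly 0.1–0.25 signals moderate shift and anything higher signals significant shift. A minimal PSI sketch, assuming you have a 1-D baseline (training) sample and a current (serving) sample per feature:

```python
import numpy as np

def population_stability_index(expected, actual, bins=10):
    """PSI between a baseline and a current distribution.

    Bin edges come from the baseline; a small epsilon avoids log(0)
    in empty bins. Values outside the baseline's range are ignored
    by np.histogram, which is acceptable for a coarse drift alarm.
    """
    edges = np.histogram_bin_edges(expected, bins=bins)
    expected_pct = np.histogram(expected, bins=edges)[0] / len(expected)
    actual_pct = np.histogram(actual, bins=edges)[0] / len(actual)
    eps = 1e-6
    expected_pct, actual_pct = expected_pct + eps, actual_pct + eps
    return float(np.sum((actual_pct - expected_pct)
                        * np.log(actual_pct / expected_pct)))
```

Running this weekly per feature and alerting when the value exceeds the configured 0.15 matches the template's `monitoring` block.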

Bias Detection & Mitigation

Pre-Training Bias Detection

```python
import pandas as pd
from aif360.datasets import BinaryLabelDataset
from aif360.metrics import BinaryLabelDatasetMetric

# Check for representation bias in training data
dataset = BinaryLabelDataset(
    df=training_df,
    label_names=['churned'],
    protected_attribute_names=['age_group']
)

metric = BinaryLabelDatasetMetric(
    dataset,
    privileged_groups=[{'age_group': 1}],    # 25-55
    unprivileged_groups=[{'age_group': 0}]   # 18-24, 55+
)

print(f"Disparate Impact: {metric.disparate_impact():.3f}")
# Target: 0.8 - 1.25 (80% rule)
print(f"Statistical Parity Diff: {metric.statistical_parity_difference():.3f}")
# Target: -0.1 to 0.1
```
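If pulling in AIF360 is heavier than a team wants, the disparate-impact check itself is easy to compute directly. A sketch with plain pandas (function and column names are illustrative):

```python
import pandas as pd

def disparate_impact(df, group_col, outcome_col, privileged, favorable=1):
    """Four-fifths rule check: ratio of favorable-outcome rates,
    unprivileged groups vs. the privileged group. Ratios below 0.8
    (or above 1.25) warrant investigation."""
    rates = (df[outcome_col] == favorable).groupby(df[group_col]).mean()
    unprivileged_rate = rates.drop(privileged).mean()
    return float(unprivileged_rate / rates[privileged])
```

For example, if group "a" receives the favorable outcome 80% of the time and group "b" only 40%, the ratio is 0.5 and fails the 80% rule.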

Mitigation Strategies

| Stage | Technique | How It Works |
|-------|-----------|--------------|
| Pre-processing | Resampling | Over/under-sample to balance representation |
| Pre-processing | Reweighting | Assign higher weight to underrepresented groups |
| In-processing | Adversarial debiasing | Train model to be accurate but not predictive of protected attributes |
| In-processing | Fairness constraints | Add fairness penalty to loss function |
| Post-processing | Threshold adjustment | Different decision thresholds per group to equalize rates |
| Post-processing | Reject option | Defer borderline cases to human review |
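Reweighting is often the cheapest place to start, since most scikit-learn estimators accept per-sample weights at fit time. A minimal inverse-frequency sketch (the helper name is ours):

```python
import numpy as np

def group_balanced_weights(groups):
    """Inverse-frequency sample weights so every group contributes
    equally to the training loss. Pass the result to
    model.fit(X, y, sample_weight=w)."""
    groups = np.asarray(groups)
    values, counts = np.unique(groups, return_counts=True)
    weight_of = {v: len(groups) / (len(values) * c)
                 for v, c in zip(values, counts)}
    return np.array([weight_of[g] for g in groups])
```

With this scheme the weights within each group sum to the same total, so a group with one-fourth the members gets four times the per-sample weight.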

Post-Training Fairness Evaluation

```python
import numpy as np
from sklearn.metrics import confusion_matrix

def evaluate_group_fairness(y_true, y_pred, group_labels):
    """Evaluate model fairness across demographic groups."""
    y_true, y_pred = np.asarray(y_true), np.asarray(y_pred)
    group_labels = np.asarray(group_labels)
    results = {}

    for group in set(group_labels):
        mask = group_labels == group
        # labels=[0, 1] keeps the matrix 2x2 even when a group
        # contains only one class
        cm = confusion_matrix(y_true[mask], y_pred[mask], labels=[0, 1])
        tn, fp, fn, tp = cm.ravel()

        results[group] = {
            'accuracy': (tp + tn) / (tp + tn + fp + fn),
            'tpr': tp / (tp + fn) if (tp + fn) > 0 else 0,
            'fpr': fp / (fp + tn) if (fp + tn) > 0 else 0,
            'precision': tp / (tp + fp) if (tp + fp) > 0 else 0,
            'sample_size': int(mask.sum())
        }

    # Check for disparities
    tpr_values = [r['tpr'] for r in results.values()]
    fpr_values = [r['fpr'] for r in results.values()]

    results['_summary'] = {
        'tpr_disparity': max(tpr_values) - min(tpr_values),
        'fpr_disparity': max(fpr_values) - min(fpr_values),
        'equalized_odds_met': (
            max(tpr_values) - min(tpr_values) < 0.05 and
            max(fpr_values) - min(fpr_values) < 0.05
        )
    }

    return results
```
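To make the TPR-disparity logic concrete, here is the same check on toy arrays, computed directly with scikit-learn's `recall_score` so the example stands alone (the numbers and group labels are made up):

```python
import numpy as np
from sklearn.metrics import recall_score

y_true = np.array([1, 0, 1, 1, 0, 0, 1, 0])
y_pred = np.array([1, 0, 0, 1, 0, 1, 1, 0])
groups = np.array(["18-24"] * 4 + ["25-54"] * 4)

# Per-group TPR (recall on the positive class); the gap between
# groups is the equalized-odds TPR disparity.
tpr = {g: recall_score(y_true[groups == g], y_pred[groups == g])
       for g in np.unique(groups)}
disparity = max(tpr.values()) - min(tpr.values())
print(tpr, disparity)
```

Here one group's recall is 2/3 and the other's is 1.0, a 0.33 gap that would comfortably fail the 0.05 equalized-odds threshold above.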

Model Explainability

SHAP (SHapley Additive exPlanations)

```python
import shap

# Global feature importance
explainer = shap.TreeExplainer(model)
shap_values = explainer.shap_values(X_test)
# Note: for binary classifiers, older SHAP versions return one array
# per class; index [1] below selects the positive (churn) class.

# Summary plot: which features matter most?
shap.summary_plot(shap_values, X_test, feature_names=feature_names)

# Individual prediction explanation
# "Why was this customer flagged as high-churn-risk?"
idx = 42  # Specific prediction to explain
shap.force_plot(
    explainer.expected_value[1],
    shap_values[1][idx],
    X_test.iloc[idx],
    feature_names=feature_names
)
```

Explainability Requirements by Risk Tier

| Risk Tier | Requirement | Implementation |
|-----------|-------------|----------------|
| Low | Model card documenting architecture | Static documentation |
| Medium | Feature importance + individual explanations | SHAP/LIME on request |
| High | Real-time explanations for each prediction | SHAP values served with predictions |
| Critical | Full audit trail + counterfactual explanations | Explainability API + logging |
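For the High tier, the explanation travels with the prediction in the response payload. A minimal sketch using a linear model, where each feature's contribution is simply coefficient × value; a SHAP explainer plays the same role for non-linear models, and all names here are illustrative:

```python
import numpy as np

def predict_with_explanation(coefs, intercept, x, feature_names, top_k=3):
    """Return a score plus its top contributing features, ready to
    serialize into an API response alongside the prediction."""
    contributions = np.asarray(coefs, dtype=float) * np.asarray(x, dtype=float)
    order = np.argsort(-np.abs(contributions))[:top_k]
    return {
        "score": float(intercept + contributions.sum()),
        "explanation": [
            {"feature": feature_names[i], "contribution": float(contributions[i])}
            for i in order
        ],
    }
```

Logging this payload verbatim also gives you most of the "decision logging" audit-trail requirement for free.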

Regulatory Compliance Mapping

| Regulation | Scope | Key AI Requirements |
|------------|-------|---------------------|
| EU AI Act | EU market | Risk classification, transparency, human oversight |
| GDPR Art. 22 | EU data subjects | Right to explanation for automated decisions |
| CCPA | California consumers | Disclosure of automated decision-making |
| ECOA / Fair Lending | US financial services | Non-discrimination in credit decisions |
| EEOC Guidelines | US employment | AI hiring tools must not discriminate |
| NYC Local Law 144 | NYC employers | Bias audits for AI hiring tools |
| NIST AI RMF | US federal | Risk management framework for AI systems |

Governance Committee Structure

┌─────────────────────────────────┐
│     AI Ethics Board             │
│  (Quarterly strategic review)   │
├─────────────────────────────────┤
│  - CTO / Chief AI Officer       │
│  - Legal / Compliance           │
│  - Business Unit Leaders        │
│  - External Ethics Advisor      │
└────────────┬────────────────────┘

┌────────────▼────────────────────┐
│    Model Risk Committee         │
│  (Monthly operational review)   │
├─────────────────────────────────┤
│  - Head of Data Science         │
│  - ML Engineering Lead          │
│  - Risk / Compliance Officer    │
│  - Domain SME (rotating)        │
└────────────┬────────────────────┘

┌────────────▼────────────────────┐
│    Model Review Board           │
│  (Per-model deployment review)  │
├─────────────────────────────────┤
│  - Model owner (Data Scientist) │
│  - ML Engineer (serving)        │
│  - Peer reviewer                │
│  - QA / Testing representative  │
└─────────────────────────────────┘

Model Card Template

Every production model should have a model card:

```markdown
# Model Card: Customer Churn Predictor v3.2

## Overview
- **Task:** Binary classification (churn / no-churn)
- **Architecture:** Random Forest (200 trees, max_depth=15)
- **Training data:** 500K customers, Jan 2024 - Dec 2025
- **Features:** 12 behavioral features (no PII)

## Performance
| Metric | Overall | Age 18-24 | Age 25-54 | Age 55+ |
|--------|---------|-----------|-----------|---------|
| Accuracy | 87.2% | 84.1% | 88.5% | 85.3% |
| AUC-ROC | 0.923 | 0.891 | 0.934 | 0.908 |
| TPR | 0.812 | 0.789 | 0.825 | 0.801 |

## Limitations
- Underperforms on customers with < 3 months history
- Not validated for enterprise (B2B) customers
- Seasonal patterns may reduce accuracy in December

## Ethical Considerations
- No protected attributes used as direct features
- Zip code removed to prevent socioeconomic proxy discrimination
- Human review required before any retention offer > $500
```

Implementation Checklist

- Model risk taxonomy defined (Low/Medium/High/Critical)
- Governance requirements mapped per tier (bias testing, explainability, review cadence)
- Risk assessment template completed for all production models
- Bias detection pipeline implemented (pre-training + post-training)
- Mitigation strategy selected per model (resampling, reweighting, threshold adjustment)
- Fairness metrics defined per model (equalized odds, demographic parity)
- Explainability requirements mapped to risk tiers
- Model cards published for all production models
- Governance committee formed with clear escalation paths
- Regulatory compliance mapping completed (EU AI Act, GDPR, CCPA, ECOA)
- Monitoring includes fairness metrics (not just accuracy)
- Incident response plan for AI failures documented
- Regular audit schedule established (quarterly for High/Critical)

:::note[Source]
This guide is derived from operational intelligence at Garnet Grid Consulting. For AI governance consulting, visit garnetgrid.com.
:::

Jakub Dimitri Rezayev
Founder & Chief Architect • Garnet Grid Consulting

Jakub holds an M.S. in Customer Intelligence & Analytics and a B.S. in Finance & Computer Science from Pace University. With deep expertise spanning D365 F&O, Azure, Power BI, and AI/ML systems, he architects enterprise solutions that bridge legacy systems and modern technology — and has led multi-million dollar ERP implementations for Fortune 500 supply chains.
