
Experiment Tracking with MLflow

Production MLflow setup for experiment tracking, model versioning, and artifact management. Covers local and remote tracking servers, model registry, and CI/CD integration.

Experiment tracking is the version control of machine learning. Without it, data scientists lose track of which hyperparameters produced which results, which dataset version trained which model, and which model is actually running in production. MLflow has emerged as the standard open-source solution — it’s framework-agnostic, supports every major ML library, and scales from a single laptop to enterprise deployments.

The cost of not tracking experiments is invisible until it isn’t. You’ll discover the pain the first time someone asks “can you reproduce the model from three months ago?” and you can’t.


Core Concepts

Component        | Purpose                                      | Storage
-----------------|----------------------------------------------|---------------------------
Tracking         | Log parameters, metrics, artifacts per run   | SQLite / PostgreSQL
Projects         | Package code for reproducible runs           | Git repository
Models           | Standard format for model packaging          | Local / S3 / Azure Blob
Model Registry   | Lifecycle management (staging → production)  | Database + artifact store
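
Before standing up shared infrastructure, the same tracking APIs work against a local file-backed store. A minimal sketch (the experiment, parameter, and metric names here are illustrative, not from this guide's examples):

import mlflow

# Local SQLite backend store; artifacts default to the ./mlruns directory
mlflow.set_tracking_uri("sqlite:///mlflow.db")
mlflow.set_experiment("local-scratch")  # illustrative name

with mlflow.start_run():
    mlflow.log_param("alpha", 0.5)   # illustrative values
    mlflow.log_metric("rmse", 0.42)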

Production Tracking Server

For team use, deploy a persistent tracking server with remote storage:

mlflow server \
  --backend-store-uri postgresql://mlflow:pass@db:5432/mlflow \
  --default-artifact-root s3://mlflow-artifacts/experiments \
  --host 0.0.0.0 \
  --port 5000

Architecture:

  • Backend store (PostgreSQL): Stores experiment metadata, parameters, metrics
  • Artifact store (S3/Azure Blob): Stores model files, datasets, plots
  • Tracking UI: Web interface for comparing experiments
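
With this classic setup, clients talk to the server for metadata but upload artifacts directly to S3, so they need AWS credentials alongside the tracking URI. A minimal client-side sketch (the credential handling is illustrative; prefer IAM roles or a secrets manager in practice):

import os
import mlflow

# Metadata goes through the server; artifacts go straight to the S3 bucket
os.environ.setdefault("AWS_ACCESS_KEY_ID", "...")      # illustrative only
os.environ.setdefault("AWS_SECRET_ACCESS_KEY", "...")  # illustrative only

mlflow.set_tracking_uri("http://mlflow-server:5000")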

Logging Experiments

import mlflow
from sklearn.metrics import accuracy_score, f1_score, roc_auc_score

mlflow.set_tracking_uri("http://mlflow-server:5000")
mlflow.set_experiment("churn-prediction")

params = {
    "n_estimators": 500,
    "max_depth": 6,
    "learning_rate": 0.1,
    "subsample": 0.8,
}

with mlflow.start_run(run_name="xgboost-v3"):
    # Log hyperparameters
    mlflow.log_params(params)

    # Train model and score the held-out test set
    model = train_model(X_train, y_train, **params)
    preds = model.predict(X_test)
    probs = model.predict_proba(X_test)[:, 1]

    # Log metrics
    mlflow.log_metrics({
        "accuracy": accuracy_score(y_test, preds),
        "f1_score": f1_score(y_test, preds),
        "auc_roc": roc_auc_score(y_test, probs),
    })

    # Log the model
    mlflow.xgboost.log_model(model, "model")

    # Log artifacts (plots, data profiles)
    mlflow.log_artifact("confusion_matrix.png")
    mlflow.log_artifact("feature_importance.csv")

Model Registry Workflow

The Model Registry provides a centralized model store with lifecycle stages:

None (development) → Staging → Production → Archived

Registering a Model

# Register during training
mlflow.xgboost.log_model(
    model, "model",
    registered_model_name="churn-predictor"
)

# Or register an existing run's model
result = mlflow.register_model(
    model_uri="runs:/abc123/model",
    name="churn-predictor"
)
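
register_model returns a ModelVersion object, which is a natural point to attach a description and tags so the registry stays self-documenting. A sketch (the description and tag values are illustrative):

from mlflow import MlflowClient

client = MlflowClient()
client.update_model_version(
    name="churn-predictor",
    version=result.version,
    description="XGBoost v3 trained on the Q1 churn snapshot",  # illustrative
)
client.set_model_version_tag(
    "churn-predictor", result.version, "dataset_version", "v12"  # illustrative
)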

Promoting Models

from mlflow import MlflowClient

client = MlflowClient()

# Transition to staging
client.transition_model_version_stage(
    name="churn-predictor",
    version=3,
    stage="Staging"
)

# After validation, promote to production and archive any prior production version
client.transition_model_version_stage(
    name="churn-predictor",
    version=3,
    stage="Production",
    archive_existing_versions=True,
)
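
Once a version reaches Production, consumers can resolve it by stage instead of pinning a version number, so promotions roll out without a code change:

import mlflow

mlflow.set_tracking_uri("http://mlflow-server:5000")

# "models:/<name>/<stage>" resolves to the latest version in that stage
model = mlflow.pyfunc.load_model("models:/churn-predictor/Production")
predictions = model.predict(X_new)  # X_new: your inference features (placeholder)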

CI/CD Integration

Automate model validation before promotion:

# .github/workflows/model-promotion.yml
on:
  workflow_dispatch:
    inputs:
      model_name:
        required: true
      model_version:
        required: true

jobs:
  validate:
    runs-on: ubuntu-latest
    env:
      MLFLOW_TRACKING_URI: http://mlflow-server:5000
    steps:
      - uses: actions/checkout@v4

      - name: Install MLflow
        run: pip install mlflow

      - name: Load model from registry
        run: |
          python -c "
          import mlflow
          model = mlflow.pyfunc.load_model(
              'models:/${{ inputs.model_name }}/${{ inputs.model_version }}'
          )
          # Run validation suite
          "

      - name: Performance regression check
        run: python scripts/check_model_performance.py

      - name: Promote to production
        if: success()
        run: python scripts/promote_model.py
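
The check_model_performance.py script isn't shown in this guide. One plausible shape compares the candidate version's logged metrics against the current Production version (the metric name matches the training example above; the tolerance is an assumption):

# scripts/check_model_performance.py (hypothetical sketch)
import sys
from mlflow import MlflowClient

NAME = "churn-predictor"
candidate_version = sys.argv[1]
client = MlflowClient()

def auc_of(version):
    # Every registered version records the run that produced it
    run_id = client.get_model_version(NAME, version).run_id
    return client.get_run(run_id).data.metrics["auc_roc"]

production = client.get_latest_versions(NAME, stages=["Production"])[0]
if auc_of(candidate_version) < auc_of(production.version) - 0.005:  # tolerance: assumption
    sys.exit("Candidate underperforms the current production model")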

Best Practices

  1. Log everything: Parameters, metrics, artifacts, environment info, git hash
  2. Use consistent naming: {project}-{model_type}-{version} for run names
  3. Tag runs: Add tags for dataset version, feature set version, team
  4. Compare before promoting: Always compare new model against current production baseline
  5. Automate artifact cleanup: Set retention policies on old experiment artifacts
  6. Use model signatures: Define input/output schemas for each registered model (see the sketch below)
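
For point 6, MLflow can infer a signature from sample data at logging time. A sketch that would slot into the training run shown earlier (X_train and model come from that example):

from mlflow.models import infer_signature
import mlflow

# Infer input/output schemas from real data, then attach them to the logged model
signature = infer_signature(X_train, model.predict(X_train))
mlflow.xgboost.log_model(model, "model", signature=signature)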