# Progressive Delivery
Ship features safely with progressive delivery techniques. Covers canary analysis, feature flags, dark launches, percentage rollouts, automated rollback, and the patterns that let you deploy with confidence and recover from failures fast.
Progressive delivery is the practice of gradually exposing new code to users while monitoring for problems. Instead of deploying to 100% of users and hoping for the best, you deploy to 1%, watch the metrics, and expand only when confident. If something breaks, you roll back before most users notice.
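The "deploy to 1%" step needs a deterministic way to pick the cohort: a user should see the same variant on every request, and a user in the 1% cohort should stay included when the rollout expands to 10%. A minimal sketch of one common approach, hashing the user ID together with the feature name (function and names are illustrative, not from any particular flag library):

```python
import hashlib


def in_rollout(user_id: str, feature: str, percentage: float) -> bool:
    """Deterministically bucket a user into a percentage rollout.

    Hashing user_id together with the feature name keeps a user's
    assignment stable across requests, while decorrelating cohorts
    between different features.
    """
    digest = hashlib.sha256(f"{feature}:{user_id}".encode()).hexdigest()
    # Map the first 8 hex digits (32 bits) onto [0, 100)
    bucket = int(digest[:8], 16) / 2**32 * 100
    return bucket < percentage


# Stable across calls, and monotonic: raising the percentage only
# ever adds users to the cohort, never removes them.
in_rollout("user-42", "new_search", 1)
```

Because the bucket is a fixed number per (feature, user) pair, expanding from 1% to 10% keeps the original canary users in the treatment group, which keeps their experience consistent and makes the metrics comparable across stages.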
## Progressive Delivery Pipeline
A typical deployment pipeline:

1. **Deploy to canary (1% of traffic)**
   - Monitor: error rate, latency, CPU
   - Duration: 15 minutes
   - Gate: error rate < 0.1%, P99 latency < 500 ms
2. **Expand to 10%**
   - Monitor: business metrics (conversion, revenue)
   - Duration: 1 hour
   - Gate: no regression in conversion rate
3. **Expand to 50%**
   - Monitor: all metrics plus customer support tickets
   - Duration: 4 hours
   - Gate: no anomalies
4. **Full rollout (100%)**
   - Continue monitoring for 24 hours
   - Automated rollback if degradation is detected

At any stage: automated rollback on metric degradation.
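The stages and gates above lend themselves to a declarative description that a rollout controller can execute. A minimal sketch, with thresholds taken from the pipeline above (the class and field names are illustrative):

```python
from dataclasses import dataclass


@dataclass
class Stage:
    traffic_pct: int      # share of traffic routed to the new version
    duration_min: int     # minimum soak time before the gate is evaluated
    max_error_rate: float # gate: error rate must stay at or below this
    max_p99_ms: float     # gate: P99 latency must stay at or below this


# The four-stage pipeline described above, as data.
PIPELINE = [
    Stage(traffic_pct=1,   duration_min=15,   max_error_rate=0.001, max_p99_ms=500),
    Stage(traffic_pct=10,  duration_min=60,   max_error_rate=0.001, max_p99_ms=500),
    Stage(traffic_pct=50,  duration_min=240,  max_error_rate=0.001, max_p99_ms=500),
    Stage(traffic_pct=100, duration_min=1440, max_error_rate=0.001, max_p99_ms=500),
]


def gate_passes(stage: Stage, error_rate: float, p99_ms: float) -> bool:
    """A stage's gate passes only if every monitored metric is healthy."""
    return error_rate <= stage.max_error_rate and p99_ms <= stage.max_p99_ms
```

Keeping the stages as data rather than code makes it easy to review rollout policy changes and to vary the plan per service or per risk level.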
## Automated Canary Analysis
```python
from dataclasses import dataclass
from enum import Enum

from scipy.stats import mannwhitneyu


class CanaryDecision(Enum):
    PASS = "pass"
    HOLD = "hold"
    ROLLBACK = "rollback"


@dataclass
class MetricResult:
    passed: bool
    p_value: float
    baseline_mean: float
    canary_mean: float
    severity: str  # "none", "warning", or "critical"


class CanaryAnalyzer:
    """Compare canary metrics against baseline to decide pass/fail."""

    def analyze(self, baseline_metrics, canary_metrics, config):
        results = {}

        # Error rate comparison
        results["error_rate"] = self.compare_metric(
            baseline=baseline_metrics.error_rate,
            canary=canary_metrics.error_rate,
            threshold=config.max_error_rate_increase,
            direction="lower_is_better",
        )

        # Latency comparison
        results["p99_latency"] = self.compare_metric(
            baseline=baseline_metrics.p99_latency,
            canary=canary_metrics.p99_latency,
            threshold=config.max_latency_increase_pct,
            direction="lower_is_better",
        )

        # Success rate: the threshold is the maximum tolerated relative drop
        results["success_rate"] = self.compare_metric(
            baseline=baseline_metrics.success_rate,
            canary=canary_metrics.success_rate,
            threshold=config.max_success_rate_drop,
            direction="higher_is_better",
        )

        # Aggregate decision
        if all(r.passed for r in results.values()):
            return CanaryDecision.PASS
        if any(r.severity == "critical" for r in results.values()):
            return CanaryDecision.ROLLBACK
        return CanaryDecision.HOLD  # Need more data

    def compare_metric(self, baseline, canary, threshold, direction):
        """Statistical comparison with the Mann-Whitney U test.

        `baseline` and `canary` are arrays of per-interval samples
        (e.g. one reading per minute), not single aggregate values.
        """
        stat, p_value = mannwhitneyu(baseline, canary, alternative="two-sided")
        significant = p_value < 0.05

        if direction == "lower_is_better":
            regression = canary.mean() > baseline.mean() * (1 + threshold)
            critical = canary.mean() > baseline.mean() * (1 + 2 * threshold)
        else:
            regression = canary.mean() < baseline.mean() * (1 - threshold)
            critical = canary.mean() < baseline.mean() * (1 - 2 * threshold)

        # Fail only on a statistically significant regression; escalate to
        # "critical" when the regression exceeds twice the threshold.
        failed = significant and regression
        severity = "critical" if failed and critical else ("warning" if failed else "none")

        return MetricResult(
            passed=not failed,
            p_value=p_value,
            baseline_mean=baseline.mean(),
            canary_mean=canary.mean(),
            severity=severity,
        )
```
## Dark Launches
```python
# Dark launch: execute the new code path in production,
# but always serve the result from the old code path.
class DarkLaunchMiddleware:
    def process_request(self, request):
        # Old path: always serves the response
        old_result = self.old_handler(request)

        # New path: executed, but its result is discarded
        if self.feature_flags.is_enabled("new_search_engine"):
            try:
                new_result = self.new_handler(request)

                # Compare results for correctness
                self.compare_and_log(
                    old_result=old_result,
                    new_result=new_result,
                    request=request,
                )

                # Monitor new-path performance
                self.metrics.record(
                    "dark_launch.latency",
                    new_result.latency,
                    tags={"path": "new_search"},
                )
            except Exception as e:
                # The new path fails silently
                self.metrics.increment("dark_launch.errors")
                self.logger.warning(f"Dark launch error: {e}")

        # Always return the old result
        return old_result
```
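One way the `compare_and_log` step might work, as a sketch (field names are hypothetical): normalize both results and report only the fields where the two paths semantically disagree, excluding volatile fields that legitimately differ on every request.

```python
# Fields expected to differ between the two paths on every request,
# so they should not count as correctness mismatches.
VOLATILE_FIELDS = {"latency", "request_id"}


def diff_results(old: dict, new: dict) -> list[str]:
    """Return the keys where the old and new code paths disagree."""
    keys = (set(old) | set(new)) - VOLATILE_FIELDS
    return sorted(k for k in keys if old.get(k) != new.get(k))
```

Recording the mismatch rate over a few days of dark-launch traffic gives a direct, production-grade correctness signal before the new path ever serves a user.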
## Anti-Patterns
| Anti-Pattern | Consequence | Fix |
|---|---|---|
| Canary without automated analysis | Human judgement is slow and biased | Automated metric comparison with statistical tests |
| Too short canary window | Miss slow-building problems | Minimum 15 min per stage, longer for business metrics |
| No automated rollback | Depends on human reaction time | Automated rollback on metric degradation |
| Skip dark launch for risky changes | First real traffic reveals problems | Dark launch critical path changes |
| Same rollout speed for all changes | Low-risk changes are slowed down | Risk-based rollout profiles |
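The last fix, risk-based rollout profiles, can be as simple as a lookup table: low-risk changes take a short path, high-risk changes get more stages and longer soak times. A sketch with illustrative numbers (the profile names and durations are assumptions, not a standard):

```python
# Hypothetical risk profiles: each entry is (traffic %, soak minutes).
# Higher-risk changes pass through more stages and soak longer at each.
ROLLOUT_PROFILES = {
    "low":    [(50, 15), (100, 60)],
    "medium": [(10, 30), (50, 120), (100, 240)],
    "high":   [(1, 60), (10, 240), (50, 480), (100, 1440)],
}


def stages_for(risk: str) -> list[tuple[int, int]]:
    """Select a rollout plan by risk level; every plan ends at 100%."""
    return ROLLOUT_PROFILES[risk]
```

A dependency bump might use the `low` profile while a payment-flow change uses `high`, so safety overhead scales with the blast radius instead of slowing every deploy equally.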
Progressive delivery is not about deploying slowly — it is about deploying safely. A team with progressive delivery ships more frequently and with more confidence than a team doing all-at-once deployments.