# Progressive Delivery
Ship features safely with progressive delivery techniques. Covers canary analysis, feature flags, dark launches, percentage rollouts, automated rollback, and the patterns that let you deploy with confidence and recover from failures fast.
Progressive delivery is the practice of gradually exposing new code to users while monitoring for problems. Instead of deploying to 100% of users and hoping for the best, you deploy to 1%, watch the metrics, and expand only when confident. If something breaks, you roll back before most users notice.
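The "deploy to 1%" step needs a deterministic way to pick the cohort: a user should see the same variant on every request, and a user in the 1% cohort should stay included when the rollout expands to 10%. A minimal sketch of one common approach, hashing the user ID together with the feature name (function and names are illustrative, not from any particular flag library):

```python
import hashlib


def in_rollout(user_id: str, feature: str, percentage: float) -> bool:
    """Deterministically bucket a user into a percentage rollout.

    Hashing user_id together with the feature name keeps a user's
    assignment stable across requests, while decorrelating cohorts
    between different features.
    """
    digest = hashlib.sha256(f"{feature}:{user_id}".encode()).hexdigest()
    # Map the first 8 hex digits (32 bits) onto [0, 100)
    bucket = int(digest[:8], 16) / 2**32 * 100
    return bucket < percentage


# Stable across calls, and monotonic: raising the percentage only
# ever adds users to the cohort, never removes them.
in_rollout("user-42", "new_search", 1)
```

Because the bucket is a fixed number per (feature, user) pair, expanding from 1% to 10% keeps the original canary users in the treatment group, which keeps their experience consistent and makes the metrics comparable across stages.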
## Progressive Delivery Pipeline
A typical deployment pipeline:

1. **Deploy to canary (1% of traffic)**
   - Monitor: error rate, latency, CPU
   - Duration: 15 minutes
   - Gate: error rate < 0.1%, P99 latency < 500 ms
2. **Expand to 10%**
   - Monitor: business metrics (conversion, revenue)
   - Duration: 1 hour
   - Gate: no regression in conversion rate
3. **Expand to 50%**
   - Monitor: all metrics plus customer support tickets
   - Duration: 4 hours
   - Gate: no anomalies
4. **Full rollout (100%)**
   - Continue monitoring for 24 hours
   - Automated rollback if degradation is detected

At any stage: automated rollback on metric degradation.
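The stages and gates above lend themselves to a declarative description that a rollout controller can execute. A minimal sketch, with thresholds taken from the pipeline above (the class and field names are illustrative):

```python
from dataclasses import dataclass


@dataclass
class Stage:
    traffic_pct: int      # share of traffic routed to the new version
    duration_min: int     # minimum soak time before the gate is evaluated
    max_error_rate: float # gate: error rate must stay at or below this
    max_p99_ms: float     # gate: P99 latency must stay at or below this


# The four-stage pipeline described above, as data.
PIPELINE = [
    Stage(traffic_pct=1,   duration_min=15,   max_error_rate=0.001, max_p99_ms=500),
    Stage(traffic_pct=10,  duration_min=60,   max_error_rate=0.001, max_p99_ms=500),
    Stage(traffic_pct=50,  duration_min=240,  max_error_rate=0.001, max_p99_ms=500),
    Stage(traffic_pct=100, duration_min=1440, max_error_rate=0.001, max_p99_ms=500),
]


def gate_passes(stage: Stage, error_rate: float, p99_ms: float) -> bool:
    """A stage's gate passes only if every monitored metric is healthy."""
    return error_rate <= stage.max_error_rate and p99_ms <= stage.max_p99_ms
```

Keeping the stages as data rather than code makes it easy to review rollout policy changes and to vary the plan per service or per risk level.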
## Automated Canary Analysis
```python
from dataclasses import dataclass
from enum import Enum

from scipy.stats import mannwhitneyu


class CanaryDecision(Enum):
    PASS = "pass"
    HOLD = "hold"
    ROLLBACK = "rollback"


@dataclass
class MetricResult:
    passed: bool
    p_value: float
    baseline_mean: float
    canary_mean: float
    severity: str  # "none", "warning", or "critical"


class CanaryAnalyzer:
    """Compare canary metrics against baseline to decide pass/fail."""

    def analyze(self, baseline_metrics, canary_metrics, config):
        results = {}

        # Error rate comparison
        results["error_rate"] = self.compare_metric(
            baseline=baseline_metrics.error_rate,
            canary=canary_metrics.error_rate,
            threshold=config.max_error_rate_increase,
            direction="lower_is_better",
        )

        # Latency comparison
        results["p99_latency"] = self.compare_metric(
            baseline=baseline_metrics.p99_latency,
            canary=canary_metrics.p99_latency,
            threshold=config.max_latency_increase_pct,
            direction="lower_is_better",
        )

        # Success rate: the threshold is the maximum tolerated relative drop
        results["success_rate"] = self.compare_metric(
            baseline=baseline_metrics.success_rate,
            canary=canary_metrics.success_rate,
            threshold=config.max_success_rate_drop,
            direction="higher_is_better",
        )

        # Aggregate decision
        if all(r.passed for r in results.values()):
            return CanaryDecision.PASS
        if any(r.severity == "critical" for r in results.values()):
            return CanaryDecision.ROLLBACK
        return CanaryDecision.HOLD  # Need more data

    def compare_metric(self, baseline, canary, threshold, direction):
        """Statistical comparison with the Mann-Whitney U test.

        `baseline` and `canary` are arrays of per-interval samples
        (e.g. one reading per minute), not single aggregate values.
        """
        stat, p_value = mannwhitneyu(baseline, canary, alternative="two-sided")
        significant = p_value < 0.05

        if direction == "lower_is_better":
            regression = canary.mean() > baseline.mean() * (1 + threshold)
            critical = canary.mean() > baseline.mean() * (1 + 2 * threshold)
        else:
            regression = canary.mean() < baseline.mean() * (1 - threshold)
            critical = canary.mean() < baseline.mean() * (1 - 2 * threshold)

        # Fail only on a statistically significant regression; escalate to
        # "critical" when the regression exceeds twice the threshold.
        failed = significant and regression
        severity = "critical" if failed and critical else ("warning" if failed else "none")

        return MetricResult(
            passed=not failed,
            p_value=p_value,
            baseline_mean=baseline.mean(),
            canary_mean=canary.mean(),
            severity=severity,
        )
```
## Dark Launches
```python
# Dark launch: execute the new code path in production,
# but always serve the result from the old code path.
class DarkLaunchMiddleware:
    def process_request(self, request):
        # Old path: always serves the response
        old_result = self.old_handler(request)

        # New path: executed, but its result is discarded
        if self.feature_flags.is_enabled("new_search_engine"):
            try:
                new_result = self.new_handler(request)

                # Compare results for correctness
                self.compare_and_log(
                    old_result=old_result,
                    new_result=new_result,
                    request=request,
                )

                # Monitor new-path performance
                self.metrics.record(
                    "dark_launch.latency",
                    new_result.latency,
                    tags={"path": "new_search"},
                )
            except Exception as e:
                # The new path fails silently
                self.metrics.increment("dark_launch.errors")
                self.logger.warning(f"Dark launch error: {e}")

        # Always return the old result
        return old_result
```
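One way the `compare_and_log` step might work, as a sketch (field names are hypothetical): normalize both results and report only the fields where the two paths semantically disagree, excluding volatile fields that legitimately differ on every request.

```python
# Fields expected to differ between the two paths on every request,
# so they should not count as correctness mismatches.
VOLATILE_FIELDS = {"latency", "request_id"}


def diff_results(old: dict, new: dict) -> list[str]:
    """Return the keys where the old and new code paths disagree."""
    keys = (set(old) | set(new)) - VOLATILE_FIELDS
    return sorted(k for k in keys if old.get(k) != new.get(k))
```

Recording the mismatch rate over a few days of dark-launch traffic gives a direct, production-grade correctness signal before the new path ever serves a user.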
## Anti-Patterns
| Anti-Pattern | Consequence | Fix |
|---|---|---|
| Canary without automated analysis | Human judgement is slow and biased | Automated metric comparison with statistical tests |
| Too short canary window | Miss slow-building problems | Minimum 15 min per stage, longer for business metrics |
| No automated rollback | Depends on human reaction time | Automated rollback on metric degradation |
| Skip dark launch for risky changes | First real traffic reveals problems | Dark launch critical path changes |
| Same rollout speed for all changes | Low-risk changes are slowed down | Risk-based rollout profiles |
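The last fix, risk-based rollout profiles, can be as simple as a lookup table: low-risk changes take a short path, high-risk changes get more stages and longer soak times. A sketch with illustrative numbers (the profile names and durations are assumptions, not a standard):

```python
# Hypothetical risk profiles: each entry is (traffic %, soak minutes).
# Higher-risk changes pass through more stages and soak longer at each.
ROLLOUT_PROFILES = {
    "low":    [(50, 15), (100, 60)],
    "medium": [(10, 30), (50, 120), (100, 240)],
    "high":   [(1, 60), (10, 240), (50, 480), (100, 1440)],
}


def stages_for(risk: str) -> list[tuple[int, int]]:
    """Select a rollout plan by risk level; every plan ends at 100%."""
    return ROLLOUT_PROFILES[risk]
```

A dependency bump might use the `low` profile while a payment-flow change uses `high`, so safety overhead scales with the blast radius instead of slowing every deploy equally.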
Progressive delivery is not about deploying slowly — it is about deploying safely. A team with progressive delivery ships more frequently and with more confidence than a team doing all-at-once deployments.