Deployment Strategies: Blue-Green, Canary & Rolling
Choose the right deployment strategy. Covers blue-green deployments, canary releases, rolling updates, A/B testing, recreate deployments, and matching strategy to risk tolerance.
The deployment strategy determines how much risk you take every time you ship code. A recreate deployment takes everything down and then brings up the new version: fine for dev, costly in production because every release means downtime. A canary deployment routes a small share of traffic (1-5%, for example) to the new version while you monitor it: safer, but slower to complete. The right choice depends on your traffic volume, risk tolerance, and rollback requirements.
Strategy Comparison
| Strategy | Downtime | Risk | Rollback Speed | Cost | Complexity |
|---|---|---|---|---|---|
| Recreate | Yes (seconds to minutes) | High | Slow (redeploy the old version) | Lowest | Lowest |
| Rolling update | Zero | Medium | Medium (reverse the rollout) | Low | Low |
| Blue-Green | Zero | Low | Instant (switch) | 2x infrastructure | Medium |
| Canary | Zero | Lowest | Instant (route back) | Low overhead | Medium-High |
| A/B Testing | Zero | Lowest | Instant | Low overhead | High |
Blue-Green Deployment
```
BEFORE:
  Load Balancer ────────▶ Blue  (v1.0)  ← ALL traffic
                          Green (v1.1)  ← NO traffic (ready)

SWITCH:
  Load Balancer ────────▶ Green (v1.1)  ← ALL traffic
                          Blue  (v1.0)  ← NO traffic (standby)

ROLLBACK (if needed):
  Load Balancer ────────▶ Blue  (v1.0)  ← ALL traffic (instant!)
                          Green (v1.1)  ← NO traffic
```
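On Kubernetes, one way to implement the switch is a single Service whose selector picks the live color. The sketch below assumes two Deployments whose pods carry the labels `app: order-service` and either `version: blue` or `version: green`. The label names, the Service name, and the ports are illustrative assumptions, not something this guide prescribes.

```yaml
# Hypothetical Service in front of both environments.
apiVersion: v1
kind: Service
metadata:
  name: order-service
spec:
  selector:
    app: order-service
    version: green      # was "blue"; changing this value is the switch
  ports:
    - port: 80          # assumed Service port
      targetPort: 8080  # assumed container port
```

Setting `version` back to `blue` and reapplying the manifest performs the rollback. It is fast only while the blue pods are still running, so keep them up until the new version has proven itself.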
Canary Deployment
```
Step 1: [v1.0 ████████████████████] 100%
        [v1.1                     ]   0%

Step 2: [v1.0 ███████████████████ ]  95%
        [v1.1 █                   ]   5%  ← Monitor metrics

Step 3: [v1.0 ███████████████     ]  75%
        [v1.1 █████               ]  25%  ← Still healthy

Step 4: [v1.0 ██████████          ]  50%
        [v1.1 ██████████          ]  50%  ← Metrics stable

Step 5: [v1.0                     ]   0%
        [v1.1 ████████████████████] 100%  ← Full rollout
```
Kubernetes Canary with Argo Rollouts
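```yaml
apiVersion: argoproj.io/v1alpha1
kind: Rollout
# metadata, selector, and pod template are omitted here
spec:
  strategy:
    canary:
      steps:
        - setWeight: 5               # send 5% of traffic to the canary
        - pause: {duration: 10m}     # hold while metrics accumulate
        - setWeight: 25
        - pause: {duration: 10m}
        - setWeight: 50
        - pause: {duration: 10m}
        - setWeight: 100
      analysis:                      # background analysis during the rollout
        templates:
          - templateName: success-rate
        startingStep: 1              # zero-based: starts at the first pause
        args:
          - name: service-name
            value: order-service
```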
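The Rollout refers to an AnalysisTemplate named `success-rate` without showing it. A minimal sketch of such a template follows, assuming a Prometheus provider. The Prometheus address, the `http_requests_total` metric with its `service` and `status` labels, and the 99% threshold are all assumptions to adapt to your own monitoring setup.

```yaml
apiVersion: argoproj.io/v1alpha1
kind: AnalysisTemplate
metadata:
  name: success-rate              # matches templateName in the Rollout
spec:
  args:
    - name: service-name          # supplied by the Rollout's analysis args
  metrics:
    - name: success-rate
      interval: 1m                # take a measurement every minute
      failureLimit: 3             # tolerate up to 3 failed measurements
      successCondition: result[0] >= 0.99   # assumed threshold
      provider:
        prometheus:
          address: http://prometheus.monitoring.svc:9090   # assumed address
          # Share of non-5xx responses over the last 5 minutes
          # (metric and label names are assumptions).
          query: |
            sum(rate(http_requests_total{service="{{args.service-name}}",status!~"5.."}[5m]))
            /
            sum(rate(http_requests_total{service="{{args.service-name}}"}[5m]))
```

If the analysis fails, Argo Rollouts aborts the rollout and shifts traffic back to the stable version, which is the automated gate the canary strategy depends on.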
Decision Framework
```
How critical is zero-downtime?
├── Not critical (internal tools) → Recreate
└── Critical → How fast do you need rollback?
    ├── Instant → Blue-Green (if budget allows 2x infra)
    └── Fast (< 5 min) → How granular is your risk management?
        ├── Per-user targeting → A/B Testing
        ├── Percentage-based → Canary
        └── Instance-based → Rolling Update
```
Anti-Patterns
| Anti-Pattern | Problem | Fix |
|---|---|---|
| Recreate in production | Downtime on every deploy | Rolling, blue-green, or canary |
| Canary without metrics | Deploying to 5% but not checking if it’s healthy | Automated analysis gates |
| Blue-green without testing green | Switch to untested environment | Smoke tests on green before switching |
| No rollback plan | When a deploy fails, the team improvises a fix-forward under pressure | Pre-defined rollback trigger and process |
| Manual deployment scripts | Prone to human error and inconsistent between runs | CI/CD pipeline with automated strategy |
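For the "blue-green without testing green" row, Argo Rollouts can hold the traffic switch until the new version passes an analysis run. The fragment below is a sketch: the two Service names and the `smoke-tests` AnalysisTemplate are hypothetical and would have to exist in your cluster, and the Rollout's selector and pod template are omitted.

```yaml
apiVersion: argoproj.io/v1alpha1
kind: Rollout
metadata:
  name: order-service
spec:
  # selector and template omitted for brevity
  strategy:
    blueGreen:
      activeService: order-service-active     # hypothetical; carries production traffic
      previewService: order-service-preview   # hypothetical; reaches only the new version
      prePromotionAnalysis:                   # must succeed before traffic is switched
        templates:
          - templateName: smoke-tests         # hypothetical AnalysisTemplate
        args:
          - name: service-name
            value: order-service-preview
```

Pointing the smoke tests at the preview Service means they exercise the green environment before any production traffic reaches it.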
Checklist
- Deployment strategy selected based on risk tolerance
- Zero-downtime deployment for all production services
- Automated rollback: trigger conditions defined
- Health checks: readiness probe gates deployment
- Metrics monitoring during rollout (error rate, latency)
- Canary analysis: automated pass/fail gates
- Database migrations: backward compatible (no breaking changes)
- Deployment frequency: minimum weekly, target daily
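Several checklist items (zero-downtime deploys, readiness probes gating the rollout) map onto a plain Kubernetes Deployment. The manifest below is a sketch with assumed values: the image name, the port, the `/healthz` path, and the replica count are placeholders.

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: order-service
spec:
  replicas: 4
  selector:
    matchLabels:
      app: order-service
  strategy:
    type: RollingUpdate
    rollingUpdate:
      maxSurge: 1          # at most one extra pod during the rollout
      maxUnavailable: 0    # never drop below the desired replica count
  template:
    metadata:
      labels:
        app: order-service
    spec:
      containers:
        - name: order-service
          image: registry.example.com/order-service:1.1   # placeholder image
          ports:
            - containerPort: 8080
          readinessProbe:          # a new pod receives traffic only after this passes
            httpGet:
              path: /healthz       # assumed health endpoint
              port: 8080
            periodSeconds: 5
            failureThreshold: 3
```

With `maxUnavailable: 0`, an old pod is removed only after its replacement reports ready, so a version that never becomes ready stalls the rollout instead of reducing capacity. `kubectl rollout undo deployment/order-service` returns to the previous revision.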
:::note[Source]
This guide is derived from operational intelligence at Garnet Grid Consulting. For deployment strategy consulting, visit garnetgrid.com.
:::