Graceful Degradation Patterns
Build systems that degrade gracefully under failure instead of crashing completely. Covers fallback strategies, feature flagging for degradation, circuit breakers, load shedding, and the patterns that keep critical functionality working when non-critical services fail.
When a microservice fails, the entire request should not fail. Graceful degradation means that when a dependency is unavailable, the system falls back to a reduced but functional state instead of returning a 500 error. The user gets a degraded experience — but they get an experience.
Degradation Hierarchy
Level 0 — Full Functionality:
All services healthy, all features available
Recommendation engine: Personalized recommendations
Search: Full-text search with filters
Pricing: Dynamic pricing with discounts
Level 1 — Reduced Functionality:
Non-critical service down
Recommendation engine DOWN → Show popular items instead
Search filter service DOWN → Basic search only
Pricing discount service DOWN → Show base price
Level 2 — Core Only:
Multiple services degraded
Recommendations: Disabled (hide section)
Search: Keyword match only (no ML ranking)
Pricing: Cached prices (may be stale)
But: Browse catalog + add to cart + checkout = WORKS
Level 3 — Static Fallback:
Major outage
Serve cached/static version of key pages
Show "We're experiencing issues" banner
Keep checkout alive if at all possible
Accept orders for manual processing if needed
Implementation
class GracefulDegradation:
"""Fallback strategies for service failures."""
def get_recommendations(self, user_id: str):
"""Try personalized, fall back to popular, then static."""
# Level 0: Personalized recommendations
try:
return self.recommendation_service.get_for_user(user_id)
except ServiceUnavailable:
pass
# Level 1: Popular items from cache
try:
popular = self.cache.get("popular_items")
if popular:
return {"items": popular, "degraded": True, "source": "popular"}
except CacheError:
pass
# Level 2: Static fallback
return {
"items": self.static_fallback_items,
"degraded": True,
"source": "static_fallback",
}
def get_price(self, product_id: str):
"""Try live pricing, fall back to cached price."""
# Level 0: Live pricing with dynamic discounts
try:
return self.pricing_service.get_price(product_id)
except ServiceUnavailable:
pass
# Level 1: Cached price (may be stale)
cached = self.cache.get(f"price:{product_id}")
if cached:
return {
"price": cached,
"stale": True,
"cached_at": cached.timestamp,
}
# Level 2: Last known price from database
return self.db.get_base_price(product_id)
Anti-Patterns
| Anti-Pattern | Consequence | Fix |
|---|---|---|
| All-or-nothing responses | Minor failure breaks entire page | Partial responses with fallback data |
| Silent degradation | Users don’t know they’re missing features | Show “limited results” indicator |
| No fallback data | System has nothing to fall back to | Pre-cache popular/static content |
| Every service equally critical | Non-critical failure blocks critical path | Classify services by criticality |
| No degradation testing | Fallbacks untested, break when needed | Regular failure injection testing |
Graceful degradation is the difference between “the site is down” and “the site is slower today.” Design fallback strategies for every external dependency, classify services by criticality, and test your fallbacks regularly.