
Feature Flags at Scale: Decoupling Deployment from Release

Implement feature flags that let you ship code to production without exposing it to users until you are ready. Covers flag lifecycle, targeting strategies, percentage rollouts, kill switches, technical debt management, and the operational practices that prevent flag sprawl.

The most powerful deployment pattern is also the simplest concept: deploy your code to production, but hide it behind a flag. If the feature works, turn the flag on. If it breaks, turn the flag off. No rollback, no hotfix, no 2 AM deployment. Just flip a switch.

Feature flags decouple deployment (putting code on servers) from release (making features visible to users). This distinction changes how your team thinks about risk: every deployment becomes boring because the risk is managed by the flag, not the deployment.


Flag Types

| Type | Lifespan | Purpose | Example |
|---|---|---|---|
| Release flag | Days to weeks | Ship incomplete features safely | new_checkout_flow |
| Experiment flag | Weeks | A/B testing | experiment_pricing_page_v2 |
| Ops flag | Permanent | Kill switches, circuit breakers | enable_recommendation_engine |
| Permission flag | Permanent | Feature entitlements, tiers | premium_analytics_dashboard |
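
One way to make the four types concrete is to record the type on the flag definition itself and require an expiry date for the short-lived types. The following is a minimal sketch: `FlagDefinition`, its field names, the owners, and the dates are hypothetical and not taken from any particular flag system.

```python
from dataclasses import dataclass
from datetime import date
from enum import Enum
from typing import Optional


class FlagType(Enum):
    RELEASE = "release"
    EXPERIMENT = "experiment"
    OPS = "ops"
    PERMISSION = "permission"


# The two types from the table that are expected to be removed eventually.
TEMPORARY_TYPES = {FlagType.RELEASE, FlagType.EXPERIMENT}


@dataclass(frozen=True)
class FlagDefinition:
    """Hypothetical flag record; field names are assumptions for illustration."""
    name: str
    flag_type: FlagType
    owner: str
    expires_on: Optional[date] = None

    def __post_init__(self) -> None:
        # Release and experiment flags must state when they are expected to go away.
        if self.flag_type in TEMPORARY_TYPES and self.expires_on is None:
            raise ValueError(
                f"{self.name}: {self.flag_type.value} flags need an expiry date"
            )


# A release flag carries an expiry date; a permanent ops flag does not need one.
checkout = FlagDefinition(
    "new_checkout_flow", FlagType.RELEASE, "payments-team", date(2025, 3, 1)
)
kill_switch = FlagDefinition(
    "enable_recommendation_engine", FlagType.OPS, "ml-platform"
)
```

Rejecting a temporary flag without an expiry date at creation time is one possible way to keep removal from being forgotten later.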

Flag Lifecycle

Created   →   Developing   →   Testing   →   Rolling Out   →   Fully On   →   Removed

  Created / Developing:  code sits behind the flag, off by default
  Testing:               QA uses the flag to test both states
  Rolling Out:           1% → 10% → 50% → 100% of users
  Fully On:              every user sees the feature; the flag is now a removal candidate
  Removed:               flag removed from code and deleted from the flag system; the code lives on without it

The most important phase is “Removed.” Every flag that stays in the code after it is fully rolled out is technical debt. If you launch 50 features a year and never remove flags, you have 50 dead flags creating branching complexity throughout your codebase.
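
The lifecycle can also be enforced in code so that a flag cannot skip stages. This is a rough sketch under stated assumptions: the `Stage` enum mirrors the diagram, while the extra edge from rolling out back to testing is an assumption added for rollbacks and is not part of the original diagram.

```python
from enum import Enum


class Stage(Enum):
    CREATED = "created"
    DEVELOPING = "developing"
    TESTING = "testing"
    ROLLING_OUT = "rolling_out"
    FULLY_ON = "fully_on"
    REMOVED = "removed"


# Forward path from the diagram, plus one assumed extra edge: a rollout can be
# pulled back to testing when something breaks.
ALLOWED = {
    Stage.CREATED: {Stage.DEVELOPING},
    Stage.DEVELOPING: {Stage.TESTING},
    Stage.TESTING: {Stage.ROLLING_OUT},
    Stage.ROLLING_OUT: {Stage.FULLY_ON, Stage.TESTING},
    Stage.FULLY_ON: {Stage.REMOVED},
    Stage.REMOVED: set(),
}


def advance(current: Stage, target: Stage) -> Stage:
    """Return the new stage, or raise if the lifecycle does not allow the move."""
    if target not in ALLOWED[current]:
        raise ValueError(f"cannot move from {current.value} to {target.value}")
    return target
```

Because `FULLY_ON` has exactly one outgoing edge, the only thing a fully rolled-out flag can do next is get removed.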


Implementation Patterns

Basic Flag Check

# Simple boolean flag
def get_recommendations(user_id: str):
    if feature_flags.is_enabled("ml_recommendations", user_id=user_id):
        return recommendation_engine.get_ml_recommendations(user_id)
    else:
        return recommendation_engine.get_rule_based_recommendations(user_id)
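
The snippet above assumes a `feature_flags` object already exists. For local development or tests, a stand-in can be very small. The sketch below uses a hypothetical `InMemoryFlags` class (not any vendor's API) and replaces the recommendation engine calls with plain strings so it runs on its own.

```python
from typing import Optional


class InMemoryFlags:
    """Hypothetical stand-in for a flag client; not any vendor's API."""

    def __init__(self, defaults: Optional[dict] = None):
        self._flags = dict(defaults or {})
        self._user_overrides = {}  # (flag name, user_id) -> bool

    def set(self, name: str, enabled: bool) -> None:
        self._flags[name] = enabled

    def set_for_user(self, name: str, user_id: str, enabled: bool) -> None:
        self._user_overrides[(name, user_id)] = enabled

    def is_enabled(self, name: str, user_id: Optional[str] = None) -> bool:
        # A per-user override wins over the global value.
        if (name, user_id) in self._user_overrides:
            return self._user_overrides[(name, user_id)]
        # Unknown flags are off, so unreleased code stays hidden by default.
        return self._flags.get(name, False)


feature_flags = InMemoryFlags({"ml_recommendations": False})


def get_recommendations(user_id: str) -> str:
    # Strings stand in for the two recommendation code paths.
    if feature_flags.is_enabled("ml_recommendations", user_id=user_id):
        return "ml"
    return "rule_based"
```

Defaulting unknown flags to off is a deliberate choice here: a typo in a flag name then hides the new code path instead of exposing it.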

Percentage Rollout

# Gradual rollout: 10% of users → 50% → 100%
flag_config = {
    "name": "new_checkout_flow",
    "rollout_percentage": 10,     # 10% of users see new flow
    "targeting": {
        "include_users": ["internal-team@company.com"],  # Always on for team
        "exclude_users": ["vip-customer-123"],            # Never for VIP during testing
    },
    "kill_switch": True,          # Can be turned off instantly
}
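
A configuration like this still needs an evaluator that decides which users fall inside the 10%. A common approach is to hash the flag name together with the user ID into a stable bucket, so that each user gets the same answer on every request and raising the percentage only ever adds users. The sketch below assumes that approach; `bucket` and `is_enabled` are illustrative names, and the `kill_switch` field is left out because the original config treats it as a capability, not a state.

```python
import hashlib


def bucket(flag_name: str, user_id: str) -> int:
    """Map a (flag, user) pair to a stable bucket in the range 0-99."""
    digest = hashlib.sha256(f"{flag_name}:{user_id}".encode("utf-8")).hexdigest()
    return int(digest, 16) % 100


def is_enabled(config: dict, user_id: str) -> bool:
    targeting = config.get("targeting", {})
    # Explicit exclusions and inclusions take priority over the percentage.
    if user_id in targeting.get("exclude_users", []):
        return False
    if user_id in targeting.get("include_users", []):
        return True
    # Users whose bucket is below the percentage are in the rollout.
    return bucket(config["name"], user_id) < config.get("rollout_percentage", 0)


flag_config = {
    "name": "new_checkout_flow",
    "rollout_percentage": 10,
    "targeting": {
        "include_users": ["internal-team@company.com"],
        "exclude_users": ["vip-customer-123"],
    },
}

# Moving from 10% to 50% keeps every user who already had the feature.
wider_config = {**flag_config, "rollout_percentage": 50}
```

Including the flag name in the hash input means different flags select different slices of users, so the same small group is not the test audience for every rollout.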

Targeting Rules

# Complex targeting: different behavior for different segments
flag_rules = {
    "name": "premium_analytics",
    "rules": [
        {
            "description": "Internal team always gets the feature",
            "conditions": [{"attribute": "email", "operator": "ends_with", "value": "@company.com"}],
            "serve": True
        },
        {
            "description": "Enterprise tier customers",
            "conditions": [{"attribute": "plan", "operator": "equals", "value": "enterprise"}],
            "serve": True
        },
        {
            "description": "10% of pro users for testing",
            "conditions": [{"attribute": "plan", "operator": "equals", "value": "pro"}],
            "rollout_percentage": 10,
            "serve": True
        }
    ],
    "default": False
}
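
Rules in this shape can be evaluated top to bottom with the first match winning. The sketch below is one possible evaluator and makes several assumptions: the user is a dict with an `id` key, all conditions in a rule are combined with AND, and a user who matches a segment but falls outside its rollout slice continues to later rules. None of these choices come from a specific flag product.

```python
import hashlib

# Operator names match the "operator" values used in the rules.
OPERATORS = {
    "equals": lambda actual, expected: actual == expected,
    "ends_with": lambda actual, expected: isinstance(actual, str)
    and actual.endswith(expected),
}


def in_rollout(flag_name: str, user_key: str, percentage: int) -> bool:
    """Stable hash bucketing: the same user always lands in the same bucket."""
    digest = hashlib.sha256(f"{flag_name}:{user_key}".encode("utf-8")).hexdigest()
    return int(digest, 16) % 100 < percentage


def matches(rule: dict, user: dict) -> bool:
    # All conditions in a rule must hold (logical AND).
    return all(
        OPERATORS[c["operator"]](user.get(c["attribute"]), c["value"])
        for c in rule["conditions"]
    )


def evaluate(flag: dict, user: dict) -> bool:
    for rule in flag["rules"]:
        if not matches(rule, user):
            continue
        # Rules without a rollout_percentage apply to the whole segment.
        percentage = rule.get("rollout_percentage", 100)
        if in_rollout(flag["name"], user["id"], percentage):
            return rule["serve"]
        # Matched the segment but outside the rollout slice: try later rules.
    return flag["default"]


flag_rules = {
    "name": "premium_analytics",
    "rules": [
        {"conditions": [{"attribute": "email", "operator": "ends_with", "value": "@company.com"}], "serve": True},
        {"conditions": [{"attribute": "plan", "operator": "equals", "value": "enterprise"}], "serve": True},
        {"conditions": [{"attribute": "plan", "operator": "equals", "value": "pro"}], "rollout_percentage": 10, "serve": True},
    ],
    "default": False,
}

internal = evaluate(flag_rules, {"id": "u-1", "email": "dev@company.com", "plan": "free"})  # True
```

Rule order matters with first-match evaluation: an internal user on the free plan gets the feature because the email rule is checked before any plan rule.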

Operational Practices

Flag Hygiene

| Practice | Frequency | Purpose |
|---|---|---|
| Flag audit | Monthly | Identify stale flags (> 30 days old, 100% rolled out) |
| Flag owner | Always | Every flag has a named owner who is responsible for removal |
| Expiration dates | On creation | Set expected removal date when creating the flag |
| Flag count limit | Always | Alert when total active flags > 50 (team) or 200 (org) |

Flag status dashboard:

  Active flags: 45
  ├─ Release flags: 12 (3 past expected removal date ⚠️)
  ├─ Experiment flags: 8 (2 experiments concluded, flags remain ⚠️)
  ├─ Ops flags: 15 (permanent, reviewed quarterly)
  └─ Permission flags: 10 (permanent, tied to billing tiers)

  Flags at 100% rollout (should be removed): 5 ❌
  Flags older than 90 days: 8 (3 have valid reasons, 5 need cleanup)
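
The monthly audit can be partly automated. The sketch below assumes flag metadata can be exported as simple records; `FlagRecord` and its field names are hypothetical, and the 30-day staleness threshold follows the hygiene table. Permanent ops and permission flags are skipped because they are reviewed separately.

```python
from dataclasses import dataclass
from datetime import date, timedelta
from typing import Optional


@dataclass
class FlagRecord:
    """Hypothetical export row from a flag system; field names are assumptions."""
    name: str
    flag_type: str  # "release", "experiment", "ops", or "permission"
    owner: str
    created_on: date
    rollout_percentage: int
    expires_on: Optional[date] = None


def audit(flags: list, today: date, max_age_days: int = 30) -> dict:
    """Group temporary flags by the hygiene problems they show."""
    temporary = [f for f in flags if f.flag_type in ("release", "experiment")]
    return {
        # Past the removal date that was set at creation.
        "past_expiry": [
            f.name for f in temporary if f.expires_on and f.expires_on < today
        ],
        # Fully rolled out and older than the threshold: should be removed.
        "stale": [
            f.name
            for f in temporary
            if f.rollout_percentage == 100
            and today - f.created_on > timedelta(days=max_age_days)
        ],
        # Temporary flags that never received a removal date.
        "missing_expiry": [f.name for f in temporary if f.expires_on is None],
    }


flags = [
    FlagRecord("new_checkout_flow", "release", "payments", date(2025, 1, 2), 100, date(2025, 2, 1)),
    FlagRecord("experiment_pricing_page_v2", "experiment", "growth", date(2025, 2, 20), 50),
    FlagRecord("enable_recommendation_engine", "ops", "ml-platform", date(2023, 6, 1), 100),
]
report = audit(flags, today=date(2025, 3, 1))
```

Passing `today` in as an argument keeps the audit deterministic, which makes it easy to run in CI and to unit test.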

Kill Switch Pattern

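# Every feature that calls an external service should have a kill switch.
# Timeout and ServiceError stand in for the exceptions your ML client raises;
# get_fallback_recommendations and alert_team are left to the application.
class Timeout(Exception): pass
class ServiceError(Exception): pass

class RecommendationService:
    FLAG = "enable_ml_recommendations"

    def __init__(self, feature_flags, ml_client, threshold: int = 5):
        self.feature_flags = feature_flags
        self.ml_client = ml_client
        self.threshold = threshold  # consecutive failures before auto-disable (assumed default)
        self.failure_count = 0

    def get_recommendations(self, user_id: str) -> list:
        # Kill switch: if the ML service is causing issues, turn it off
        if not self.feature_flags.is_enabled(self.FLAG):
            return self.get_fallback_recommendations(user_id)
        try:
            result = self.ml_client.recommend(user_id, timeout=2.0)
        except (Timeout, ServiceError):
            # Circuit breaker: auto-disable after repeated failures
            self.failure_count += 1
            if self.failure_count >= self.threshold:
                self.feature_flags.disable(self.FLAG)
                self.alert_team("ML recommendations auto-disabled after failures")
            return self.get_fallback_recommendations(user_id)

        self.failure_count = 0  # a success resets the failure streak
        return result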

Tools

| Tool | Type | Best For |
|---|---|---|
| LaunchDarkly | SaaS | Enterprise, rich targeting, experiments |
| Unleash | Open source (self-hosted) | Full control, no vendor lock-in |
| Flagsmith | Open source + SaaS | Feature flags + remote config |
| Split | SaaS | Feature flags + experimentation |
| Custom (database + cache) | Custom | Simple needs, small teams |
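
For the custom option, the core is a flag table plus a short-lived cache so that a flag check does not hit the database on every request. The sketch below is one possible shape, using SQLite from the Python standard library as a stand-in for the real database; `CachedFlagStore`, the table layout, and the 30-second TTL are all assumptions.

```python
import sqlite3
import time


class CachedFlagStore:
    """Sketch of the 'database + cache' option: SQLite behind a short TTL cache."""

    def __init__(self, conn: sqlite3.Connection, ttl_seconds: float = 30.0,
                 clock=time.monotonic):
        self._conn = conn
        self._ttl = ttl_seconds
        self._clock = clock  # injectable so tests can control time
        self._cache = {}     # flag name -> (enabled, fetched_at)
        conn.execute(
            "CREATE TABLE IF NOT EXISTS flags "
            "(name TEXT PRIMARY KEY, enabled INTEGER NOT NULL)"
        )

    def set(self, name: str, enabled: bool) -> None:
        # Writes go straight to the database; readers pick them up once their
        # cached entry expires.
        self._conn.execute(
            "INSERT OR REPLACE INTO flags (name, enabled) VALUES (?, ?)",
            (name, int(enabled)),
        )
        self._conn.commit()

    def is_enabled(self, name: str) -> bool:
        now = self._clock()
        cached = self._cache.get(name)
        if cached is not None and now - cached[1] < self._ttl:
            return cached[0]
        row = self._conn.execute(
            "SELECT enabled FROM flags WHERE name = ?", (name,)
        ).fetchone()
        enabled = bool(row[0]) if row else False  # unknown flags default to off
        self._cache[name] = (enabled, now)
        return enabled


store = CachedFlagStore(sqlite3.connect(":memory:"))
store.set("new_checkout_flow", True)
```

The TTL is the trade-off to watch: it bounds how long a kill switch takes to reach every process, so a flag used as a kill switch may justify a shorter TTL than a slow-moving release flag.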

Anti-Patterns

| Anti-Pattern | Problem | Fix |
|---|---|---|
| Flag rot | Hundreds of dead flags in code | Monthly audits, expiration dates, flag count alerts |
| Nested flags | if flagA && flagB && !flagC | Limit flag nesting to 1 level |
| Testing only one state | Tests only run with flag on | Test both states in CI |
| No fallback | Flag off = feature crashes | Every flag has a working fallback state |
| Long-lived release flags | “We’ll remove it later” (never happens) | Block PR merges for flags past expiration |
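
The "testing only one state" and "no fallback" rows can be covered by the same test: run the flagged code path once with the flag on and once with it off, and require a usable result both times. The sketch below uses `unittest` from the standard library; `FakeFlags` and the toy `get_recommendations` function are hypothetical stand-ins for a real flag client and feature.

```python
import unittest


class FakeFlags:
    """Test double that pins every flag to one state (hypothetical helper)."""

    def __init__(self, enabled: bool):
        self._enabled = enabled

    def is_enabled(self, name: str, **context) -> bool:
        return self._enabled


def get_recommendations(flags, user_id: str) -> list:
    # Toy function under test: both branches must return a usable result.
    if flags.is_enabled("ml_recommendations", user_id=user_id):
        return ["ml-pick"]
    return ["rule-based-pick"]


class BothStatesTest(unittest.TestCase):
    def test_recommendations_work_with_flag_on_and_off(self):
        for enabled in (True, False):
            # subTest reports each flag state separately if one of them fails.
            with self.subTest(flag_enabled=enabled):
                result = get_recommendations(FakeFlags(enabled), "user-1")
                self.assertTrue(result)  # neither state may return nothing
```

Saved as a module, this runs under `python -m unittest`. Injecting the flag client as a parameter is what makes pinning the state in a test straightforward.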

Implementation Checklist

  • Choose a feature flag system (LaunchDarkly, Unleash, or custom)
  • Define flag types: release, experiment, ops, permission
  • Set expiration dates on all release and experiment flags at creation time
  • Assign an owner to every flag (who will remove it)
  • Implement kill switches for all external service integrations
  • Test both flag states (on and off) in CI
  • Run monthly flag audits: identify and remove stale flags
  • Alert when total flags exceed 50 per team
  • Track flags at 100% rollout — these should be removed within 2 weeks
  • Document flag removal process: remove code, remove flag, verify tests pass

Jakub Dimitri Rezayev
Founder & Chief Architect • Garnet Grid Consulting

Jakub holds an M.S. in Customer Intelligence & Analytics and a B.S. in Finance & Computer Science from Pace University. With deep expertise spanning D365 F&O, Azure, Power BI, and AI/ML systems, he architects enterprise solutions that bridge legacy systems and modern technology — and has led multi-million dollar ERP implementations for Fortune 500 supply chains.
