Implementing Feature Flags for Safer Deployments
Context
Our deployment process was all-or-nothing. New features went live to all users immediately upon deployment, making rollbacks disruptive and limiting our ability to test in production.
Decision
Implement a feature flag system using LaunchDarkly for gradual rollouts and instant kill switches.
Alternatives Considered
Build a custom feature flag system
Pros:
- Full control over implementation
- No external dependency
- No per-seat licensing costs
Cons:
- Significant development effort
- Need to build targeting, analytics, and UI
- Maintenance burden on the team
Use LaunchDarkly
Pros:
- Battle-tested at scale
- Rich targeting capabilities
- Built-in analytics and experimentation
- Good SDK support
Cons:
- Monthly cost (~$500/month for our scale)
- External dependency
- Data leaves our infrastructure
Use environment variables
Pros:
- Simple to implement
- No external dependencies
Cons:
- Requires redeployment to change
- No gradual rollout capability
- No user targeting
Reasoning
The cost of LaunchDarkly is justified by the development time saved and the risk reduction from gradual rollouts. Building a comparable system in-house would take months and require ongoing maintenance. The ability to instantly disable problematic features without redeployment is invaluable.
The Problem
Our deployment anxiety was high:
- Every deploy was a potential incident
- Rollbacks required full redeployment (5-10 minutes)
- No way to test features with a subset of users
- The product team couldn’t run A/B tests
Feature Flag Strategy
We established patterns for flag usage:
Release Flags: Temporary flags for new features
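// Route the request to the new flow only when the flag is on for this user.
if (flags.isEnabled('new-checkout-flow', user)) {
  return newCheckoutFlow();
}
// Flag off: keep the existing behavior.
return legacyCheckoutFlow();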
Ops Flags: Permanent flags for operational control
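// Serve from cache only while the ops flag is on; turning it off acts as a kill switch.
if (flags.isEnabled('enable-cache', { service: 'api' })) {
  return cachedResponse();
}
// Flag off: fall through to the uncached path.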
Experiment Flags: For A/B testing
// Look up the variant assigned to this user and render the matching page.
const variant = flags.getVariant('pricing-test', user);
return pricingPages[variant];
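The three patterns above all go through a `flags` object whose implementation is not shown here. As a minimal sketch, assuming an in-memory stand-in rather than the real provider-backed client, the following shows one way `isEnabled` and `getVariant` could behave. The class name `InMemoryFlags`, the definition shape (`enabled`, `allow`, `variants`), the `control` fallback, and the page names are all hypothetical and chosen only for illustration.

```javascript
// Hypothetical in-memory stand-in for the `flags` object used above.
// A production client would delegate to the flag provider's SDK instead.
class InMemoryFlags {
  constructor(definitions = {}) {
    this.definitions = new Map(Object.entries(definitions));
  }

  // Boolean flags: unknown or disabled flags evaluate to false.
  isEnabled(key, context = {}) {
    const def = this.definitions.get(key);
    if (!def || def.enabled !== true) return false;
    // Optional allow-list targeting on one context attribute (assumed shape).
    if (def.allow) {
      return def.allow.values.includes(context[def.allow.attribute]);
    }
    return true;
  }

  // Multivariate flags: return the assigned variant, or a named fallback.
  getVariant(key, context = {}, fallback = 'control') {
    const def = this.definitions.get(key);
    if (!def || !def.variants) return fallback;
    return Object.hasOwn(def.variants, context.id)
      ? def.variants[context.id]
      : fallback;
  }

  // Runtime update, standing in for a change made in the flag dashboard.
  set(key, def) {
    this.definitions.set(key, def);
  }
}

const flags = new InMemoryFlags({
  'new-checkout-flow': {
    enabled: true,
    allow: { attribute: 'group', values: ['internal'] },
  },
  'enable-cache': { enabled: true },
  'pricing-test': { variants: { 'user-1': 'annual-discount' } },
});

const pricingPages = {
  control: 'pricing-v1',
  'annual-discount': 'pricing-v2',
};

// Experiment flag usage with a guard for unexpected variant names.
function pricingPageFor(user) {
  const variant = flags.getVariant('pricing-test', user);
  return pricingPages[variant] ?? pricingPages.control;
}
```

Calling `flags.set('enable-cache', { enabled: false })` changes the result of the next `isEnabled('enable-cache', ...)` call without a restart, which is the property the kill switch relies on.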
Rollout Process
New features now follow this process:
- Deploy with flag disabled (0%)
- Enable for internal users (dogfooding)
- Enable for 1% of users and monitor
- Gradually increase: 5% → 25% → 50% → 100%
- Remove the flag once the feature is stable at 100%
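The percentage steps above depend on each user landing in the same bucket every time, so that someone included at 1% is still included at 5% and beyond. The sketch below illustrates that idea with a simple hash-based bucketing scheme. It is an illustration of the general technique, not LaunchDarkly's actual algorithm, and the function names (`fnv1a`, `bucketFor`, `isInRollout`, `stagesIncluding`) are hypothetical. Dogfooding with internal users is a targeting rule rather than a percentage, so it is not modeled here.

```javascript
// FNV-1a-style 32-bit hash over UTF-16 code units: small and deterministic.
function fnv1a(text) {
  let hash = 0x811c9dc5;
  for (let i = 0; i < text.length; i++) {
    hash ^= text.charCodeAt(i);
    hash = Math.imul(hash, 0x01000193) >>> 0;
  }
  return hash;
}

// Map a (flag, user) pair to a stable bucket in the range [0, 10000).
// Including the flag key means a user is not in the same bucket for every flag.
function bucketFor(flagKey, userId) {
  return fnv1a(`${flagKey}:${userId}`) % 10000;
}

// A user is included when their bucket falls below the rollout threshold.
// percentage is 0..100; 0 includes nobody and 100 includes everybody.
function isInRollout(flagKey, userId, percentage) {
  return bucketFor(flagKey, userId) < percentage * 100;
}

// The stages from the rollout process above, starting from "deployed but off".
const rolloutStages = [0, 1, 5, 25, 50, 100];

// List the stages at which a given user would see the feature.
function stagesIncluding(flagKey, userId) {
  return rolloutStages.filter((pct) => isInRollout(flagKey, userId, pct));
}
```

Because the bucket is fixed and only the threshold moves, raising the percentage only ever adds users. Lowering it removes the most recently added users first, which keeps a partial rollback predictable.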
Results
- Deployment frequency: 3x increase (less fear)
- Incident recovery time: 90% reduction (instant kill switch)
- A/B tests run: 12 in first quarter (previously 0)
- Developer confidence: Significantly improved
The $500/month cost has paid for itself many times over in reduced incident impact and faster iteration.
Lessons Learned
- Flag hygiene matters: We schedule flag cleanup to avoid technical debt
- Default to off: New flags should be disabled by default
- Document flag purpose: Every flag needs a stated purpose, an owner, and an expiration date
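One way to make these lessons enforceable is to keep flag metadata in a small registry and check it in CI or on a schedule. The following is a minimal sketch under stated assumptions: the registry shape, the team names, the dates, and the functions `validateFlag` and `findExpiredFlags` are all hypothetical, and exempting permanent ops flags from the expiration requirement is an assumption rather than a documented policy.

```javascript
// Hypothetical registry entries; owners, dates, and purposes are illustrative.
const flagRegistry = [
  {
    key: 'new-checkout-flow',
    type: 'release',
    owner: 'checkout-team',
    purpose: 'Gradual rollout of the new checkout flow',
    expires: '2024-03-31',
  },
  {
    key: 'enable-cache',
    type: 'ops',
    owner: 'platform-team',
    purpose: 'Kill switch for API response caching',
    expires: null, // assumed: permanent ops flags carry no expiration
  },
  {
    key: 'pricing-test',
    type: 'experiment',
    owner: 'growth-team',
    purpose: 'A/B test of pricing page variants',
    expires: '2024-06-30',
  },
];

// Return a list of hygiene problems for one flag (empty when it is compliant).
function validateFlag(flag) {
  const problems = [];
  if (!flag.owner) problems.push('missing owner');
  if (!flag.purpose) problems.push('missing purpose');
  if (flag.type !== 'ops' && !flag.expires) {
    problems.push('missing expiration date');
  }
  return problems;
}

// Return the keys of flags whose expiration date has passed and are due for cleanup.
function findExpiredFlags(registry, now = new Date()) {
  return registry
    .filter((flag) => flag.expires && new Date(flag.expires) < now)
    .map((flag) => flag.key);
}
```

A scheduled job could call `findExpiredFlags(flagRegistry)` and open a cleanup ticket for each returned key, assigned to the flag's owner.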