The first feature flag I implemented was a boolean in a config file. Toggle it to true, deploy, feature is on. Toggle to false, deploy, feature is off. Simple.
The fifteenth feature flag was a percentage rollout with user cohort targeting, metric-based automatic rollback, and a kill-switch that worked without a deployment. The gap between those two implementations is what this post is about.
Why Feature Flags Exist
Feature flags solve a specific problem: deploying code and releasing features are different events, and you want to decouple them.
Without flags, deploying code = releasing the feature. Code can’t be deployed until the feature it contains is ready to release. This creates pressure to batch releases, which increases deployment size, which increases risk.
With flags:
- Deploy code frequently (small, low-risk deployments)
- Release features independently (on your schedule, with rollback capability)
- Roll out gradually (percentage of users, beta cohort, internal only)
- A/B test (different variants to different user groups)
- Kill-switch without redeployment (disable a feature in seconds if it misbehaves)
These properties become more valuable as the system grows and the cost of production incidents increases.
The Flag Types
Not all feature flags are the same. Treating them as one thing leads to cluttered flag management and slow evaluation:
Type              Lifespan     Evaluation    Example
─────────────────────────────────────────────────────────────────────
Release flag      Weeks        Per-request   "new-order-confirmation-ui"
Experiment flag   Weeks        Per-user      "checkout-flow-variant"
Ops flag          Months       Per-request   "enable-rate-limiting"
Permission flag   Months–∞     Per-user      "beta-api-access"
Kill-switch       Indefinite   Per-request   "disable-payment-processor"
Release flags gate new features until they’re fully rolled out, then get deleted. They should live days to weeks, not months.
Experiment flags assign users to variants for A/B tests. They need stable assignment (same user always gets the same variant) and metrics integration.
Ops flags control operational behaviour — rate limits, circuit breakers, retry policies. They’re changed rarely but need to propagate quickly.

Permission flags grant specific user groups access to a capability — beta users, internal staff, paying tiers. They’re effectively long-lived entitlements rather than temporary toggles.
Kill-switches disable specific features immediately when problems are detected. They must evaluate to false (off) if the flag service is unavailable.
The Anatomy of a Flag Evaluation
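The call shape matters more than any particular SDK. A minimal sketch, assuming a hypothetical BoolFlag helper backed by a local store (the function and store names are illustrative, not a real SDK):

```go
package main

import (
	"errors"
	"fmt"
)

// flagStore stands in for the flag service's local cache.
// A real SDK would populate this from the flag service.
var flagStore = map[string]bool{
	"new-order-confirmation-ui": true,
}

var errFlagUnavailable = errors.New("flag unknown or service unavailable")

// BoolFlag evaluates a flag, returning the configured value when it
// can, and defaultValue when the flag is missing or the service is
// unreachable. This fallback is the whole point of the signature.
func BoolFlag(key string, defaultValue bool) bool {
	value, err := lookup(key)
	if err != nil {
		return defaultValue // safe fallback
	}
	return value
}

func lookup(key string) (bool, error) {
	v, ok := flagStore[key]
	if !ok {
		return false, errFlagUnavailable
	}
	return v, nil
}

func main() {
	// Known flag: returns its configured value.
	fmt.Println(BoolFlag("new-order-confirmation-ui", false)) // true
	// Unknown flag (or service down): returns the default.
	fmt.Println(BoolFlag("disable-payment-processor", false)) // false
}
```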
The defaultValue parameter is critical — it’s what the client returns if the flag service is unavailable or the flag doesn’t exist. Kill-switches should have defaultValue = false (disabled). Features under development should have defaultValue = false (not yet active). High-availability features should have defaultValue = true (assume enabled if flag service is down).
Consistent User Assignment
Percentage rollouts need to assign users consistently — the same user should always be in the same bucket, across calls, across service instances, across time (until the rollout configuration changes).
The standard approach: hash the flag key and user ID, map to a bucket:
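One way to implement this, sketched with Go’s stdlib FNV hash — any stable hash works, and the 10,000-bucket count here is an arbitrary choice that gives 0.01% rollout granularity:

```go
package main

import (
	"fmt"
	"hash/fnv"
)

// bucket maps (flagKey, userID) to a number in [0, 10000).
// Hashing the flag key together with the user ID gives each
// flag its own independent shuffle of users.
func bucket(flagKey, userID string) uint32 {
	h := fnv.New32a()
	h.Write([]byte(flagKey + ":" + userID))
	return h.Sum32() % 10000
}

// inRollout reports whether userID falls inside the first `percent`
// of buckets for flagKey. The assignment is stable: the same inputs
// always produce the same answer, on any service instance.
func inRollout(flagKey, userID string, percent float64) bool {
	return float64(bucket(flagKey, userID)) < percent*100
}

func main() {
	user := "user-42"
	fmt.Println(inRollout("checkout-flow-variant", user, 10))  // stable across calls
	fmt.Println(inRollout("checkout-flow-variant", user, 100)) // true: 100% includes everyone
}
```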
Including the flag key in the hash means a given user lands in a different bucket for each flag. Without it, the same 10% of users would get every feature first and the last 10% would never get early access to anything.
Flag Evaluation Performance
Flags are evaluated in hot paths — often on every request. The evaluation must be fast:
Implementation         Evaluation latency   Notes
────────────────────────────────────────────────────────────────
In-process (memory)    ~100 ns              Best; requires sync from source
HTTP to flag service   1–10 ms              Too slow for per-request use
SDK with local cache   ~1 µs                Good; cache invalidation needed
The standard pattern: the flag service SDK maintains an in-process cache of all flag configurations, updated by a background goroutine polling or receiving SSE/streaming updates from the flag service. Evaluations hit the local cache; the service is only contacted for cache updates.
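A hand-rolled sketch of that pattern, assuming a simple boolean flag model (real SDKs carry full targeting rules, not just booleans, but the structure is the same):

```go
package main

import (
	"fmt"
	"sync"
	"time"
)

// FlagCache holds the full flag configuration in process memory.
// Evaluation is a lock-guarded map lookup: no I/O on the hot path.
type FlagCache struct {
	mu    sync.RWMutex
	flags map[string]bool
}

func NewFlagCache() *FlagCache {
	return &FlagCache{flags: map[string]bool{}}
}

// Bool evaluates a flag against the cached configuration,
// falling back to defaultValue if the flag is unknown.
func (c *FlagCache) Bool(key string, defaultValue bool) bool {
	c.mu.RLock()
	defer c.mu.RUnlock()
	if v, ok := c.flags[key]; ok {
		return v
	}
	return defaultValue
}

// replace swaps in a freshly fetched configuration.
func (c *FlagCache) replace(flags map[string]bool) {
	c.mu.Lock()
	defer c.mu.Unlock()
	c.flags = flags
}

// Poll refreshes the cache on an interval; it would run as a
// background goroutine. fetch stands in for the call to the flag
// service (an HTTP poll here; real SDKs often use SSE/streaming).
func (c *FlagCache) Poll(interval time.Duration, fetch func() map[string]bool, stop <-chan struct{}) {
	ticker := time.NewTicker(interval)
	defer ticker.Stop()
	for {
		select {
		case <-ticker.C:
			c.replace(fetch())
		case <-stop:
			return
		}
	}
}

func main() {
	cache := NewFlagCache()
	fmt.Println(cache.Bool("enable-rate-limiting", false)) // false: not yet synced
	cache.replace(map[string]bool{"enable-rate-limiting": true})
	fmt.Println(cache.Bool("enable-rate-limiting", false)) // true: cache updated
}
```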
The Technical Debt Trap
Feature flags accumulate. Each flag added is code that must be maintained, evaluated, and eventually removed. A codebase with 200 active flags is:
- Harder to reason about (which flags are active? what’s the default behaviour?)
- Harder to test (combinatorial explosion of flag states)
- Slower to evaluate (more rules to process)
- Harder to onboard into (new engineers don’t know flag history)
The discipline required:
1. Every flag has an owner and an expiry date in the flag registry
2. Release flags must be deleted within N days of full rollout
3. Monthly flag review: what's still active? what's been retired?
4. Flags older than 90 days without a defined expiry are escalated
The removal is the hard part. When a flag has been enabled at 100% for 6 months, the surrounding if statement feels safe. Removing it requires testing, and the test cases feel artificial. But the alternative — accumulating permanent if featureEnabled("xyz") branches — produces a codebase where it’s impossible to understand the intended execution path.
Kill-Switch Implementation
Kill-switches have stricter requirements than other flags:
- Must evaluate to false (safe default) if the flag service is unavailable
- Must propagate within seconds of being toggled (not minutes)
- Must not add latency to the happy path when enabled
The streaming/SSE pattern handles requirements 1 and 2: the SDK maintains a persistent connection to the flag service and receives push updates. When the connection drops, the cached value (last known state) is used. Kill-switches are defined with defaultValue = false so that a cold-start service with no cached state behaves safely.
For requirement 3: flag evaluation must be on the critical path but not slow. The in-process cache approach achieves this — the check is a hash map lookup with zero I/O.
What OpenFeature Standardises
OpenFeature (CNCF project) is doing for feature flags what OpenTelemetry did for observability: a vendor-neutral API with pluggable providers.
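The real Go SDK lives at github.com/open-feature/go-sdk; the hand-rolled sketch below mirrors its shape — a Provider interface behind a stable client API — to show why a provider swap leaves call sites untouched:

```go
package main

import "fmt"

// Provider is the vendor-facing interface: each flag vendor
// (LaunchDarkly, Flagsmith, an in-house service) implements it once.
// This hand-rolled interface mirrors the shape of OpenFeature's
// provider abstraction; it is not the real SDK.
type Provider interface {
	BoolEvaluation(flag string, defaultValue bool, targetingKey string) bool
}

// Client is the application-facing API. Call sites depend on it,
// never on a concrete provider.
type Client struct {
	provider Provider
}

func (c *Client) BooleanValue(flag string, defaultValue bool, targetingKey string) bool {
	return c.provider.BoolEvaluation(flag, defaultValue, targetingKey)
}

// InMemoryProvider is a trivial provider backed by a map,
// standing in for a real vendor SDK.
type InMemoryProvider struct {
	flags map[string]bool
}

func (p InMemoryProvider) BoolEvaluation(flag string, defaultValue bool, _ string) bool {
	if v, ok := p.flags[flag]; ok {
		return v
	}
	return defaultValue
}

func main() {
	client := &Client{provider: InMemoryProvider{flags: map[string]bool{"beta-api-access": true}}}
	// Swapping vendors means constructing a different Provider here;
	// this call site does not change.
	fmt.Println(client.BooleanValue("beta-api-access", false, "user-42")) // true
}
```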
Swapping from LaunchDarkly to a self-hosted Flagsmith is a provider swap, not an instrumentation rewrite. This is the same value proposition as OTel’s exporter abstraction.
Feature flags done well are invisible infrastructure — you toggle them, things change, you move on. Feature flags done poorly are a maze of conditions that nobody understands and everyone is afraid to delete. The investment in naming conventions, expiry tracking, and regular cleanup is smaller than it looks and pays back continuously.