Managing Test Case Explosion with Feature Flags: Strategies and Solutions
Feature flags let development teams commit code while a feature is still in progress, reducing the risk of merge conflicts and enabling incremental releases. However, they come with a significant challenge: test case explosion. Because each feature flag effectively creates a duplicate version of the system, the number of potential paths requiring testing grows exponentially.
When a system implements feature flags, each flag doubles the number of possible states the system can be in. One feature flag results in two different paths, but with just 10 feature flags there are theoretically 2¹⁰ (1,024) combinations to test. Testing each configuration for correctness, performance, and security quickly becomes infeasible.
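To make the growth concrete, here is a minimal sketch (with hypothetical flag names) that enumerates every combination of a handful of boolean flags:

```python
from itertools import product

# Hypothetical flag names; real systems often have far more.
flags = ["new_checkout", "dark_mode", "beta_search", "fast_cache"]

# Each flag can be on or off, so the state space is 2 ** len(flags).
combinations = list(product([False, True], repeat=len(flags)))
print(f"{len(flags)} flags -> {len(combinations)} combinations")  # 4 flags -> 16 combinations

# With 10 flags the same enumeration yields 1,024 entries, which is why
# exhaustively testing every configuration quickly becomes infeasible.
```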
Strategies to Mitigate Test Case Explosion
While it may be impossible to test every combination of feature flags, several strategies can help manage complexity and improve software quality upfront.
Modular Architecture
The first line of defense against test case explosion is proper code organisation. By implementing a modular architecture, teams can isolate the impact of feature flags and reduce the testing surface area (see the sketch after this list). This approach includes:
- Limiting feature flag scope to specific components or modules
- Implementing clean interfaces between flagged features and the rest of the system
- Ensuring that each module has clear boundaries and responsibilities
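As a minimal sketch (the checkout classes and flag name are hypothetical), limiting a flag's scope to a single module boundary might look like this:

```python
from typing import Protocol

class CheckoutFlow(Protocol):
    """Clean interface between the flagged feature and the rest of the system."""
    def start(self, cart_id: str) -> str: ...

class LegacyCheckout:
    def start(self, cart_id: str) -> str:
        return f"legacy checkout for {cart_id}"

class NewCheckout:
    def start(self, cart_id: str) -> str:
        return f"new checkout for {cart_id}"

def checkout_flow(new_checkout_enabled: bool) -> CheckoutFlow:
    # The flag is consulted in exactly one place, at the module boundary.
    return NewCheckout() if new_checkout_enabled else LegacyCheckout()

# The rest of the system depends only on the CheckoutFlow interface, so each
# implementation can be tested in isolation rather than in combination.
print(checkout_flow(True).start("cart-123"))
```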
Test-Driven Development (TDD)
Test-Driven Development proves particularly valuable when working with feature flags. By writing tests before implementation, developers naturally create more testable code and better understand the boundaries of flagged features. TDD encourages:
- Writing smaller, focused units of code
- Clearly defining feature boundaries
- Creating tests that explicitly handle both flag states (see the sketch after this list)
- Maintaining high test coverage from the start
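For example, a minimal test that explicitly covers both flag states might look like this (pytest and the `render_checkout` function are assumptions for illustration):

```python
import pytest

# Hypothetical function under test: renders either the legacy or the
# new checkout flow depending on the flag value.
def render_checkout(new_checkout_enabled: bool) -> str:
    return "new-checkout" if new_checkout_enabled else "legacy-checkout"

# Parametrising over both flag states ensures neither path goes untested.
@pytest.mark.parametrize("flag_enabled, expected", [
    (True, "new-checkout"),
    (False, "legacy-checkout"),
])
def test_checkout_respects_flag(flag_enabled, expected):
    assert render_checkout(flag_enabled) == expected
```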
Feature Flag Design Patterns
Several design patterns can help manage the complexity of feature flags:
- Flag Hierarchies: Organising flags in a hierarchy where some flags depend on others, reducing the number of valid combinations
- Feature Toggles Service: Centralising flag management in a dedicated service that enforces rules about valid flag combinations (see the sketch after this list)
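As a rough illustration, a minimal toggle service might enforce a flag hierarchy like this (the flag names and dependency rule are hypothetical):

```python
class FeatureToggleService:
    """Central registry that enforces rules about valid flag combinations."""

    def __init__(self) -> None:
        self._flags: dict[str, bool] = {}
        # Hypothetical hierarchy: a child flag is only valid when its parent is on.
        self._parents = {"beta_search_suggestions": "beta_search"}

    def set_flag(self, name: str, enabled: bool) -> None:
        parent = self._parents.get(name)
        if enabled and parent and not self._flags.get(parent, False):
            raise ValueError(f"'{name}' requires parent flag '{parent}' to be enabled")
        self._flags[name] = enabled

    def is_enabled(self, name: str) -> bool:
        return self._flags.get(name, False)
```

Because invalid combinations are rejected at the service boundary, the set of states that can actually occur in production, and therefore needs testing, shrinks.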
Testing in Production
While comprehensive pre-production testing remains important, testing in production acknowledges that some edge cases will escape pre-release testing. It isn’t as reckless as it sounds, provided it is paired with observability tools that monitor key performance indicators (KPIs) and error rates, so that any issues introduced can be identified and addressed quickly.
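As a sketch of that pairing, flag-guarded code paths can be instrumented so that failures are attributable to a flag state (the logging here stands in for a real observability stack):

```python
import logging
from contextlib import contextmanager

logger = logging.getLogger("flag_observability")

@contextmanager
def observe_flag(flag_name: str, enabled: bool):
    """Tag any failure inside the block with the flag and its state."""
    try:
        yield
    except Exception:
        # In a real system this would emit a metric or alert; here we
        # just log with the flag context attached, then re-raise.
        logger.exception("error with flag %s=%s", flag_name, enabled)
        raise

# Usage: wrap the flagged code path so production errors are attributable.
# with observe_flag("new_checkout", enabled=True):
#     run_new_checkout()  # hypothetical flagged code path
```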
Progressive Rollouts
You can use the feature flag system to expose new functionality to only a small percentage of users, monitor their behaviour, and expand the rollout as confidence grows. If issues are detected, you can roll back automatically by toggling the feature off.
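A common way to implement this is to hash a stable user identifier into a bucket, so each user gets a consistent experience; this sketch assumes a simple percentage threshold:

```python
import hashlib

def in_rollout(user_id: str, flag_name: str, percentage: int) -> bool:
    """Deterministically bucket a user into [0, 100) for a given flag."""
    digest = hashlib.sha256(f"{flag_name}:{user_id}".encode()).hexdigest()
    return int(digest, 16) % 100 < percentage

# Expose the feature to 5% of users; the same user always gets the same answer,
# and raising the percentage expands the rollout without re-bucketing anyone.
print(in_rollout("user-42", "new_checkout", 5))
```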
Blue-Green Deployments
If a change is more complex, it may require infrastructure or data changes as well. In a blue-green deployment, one environment (blue) runs the current production version, while another (green) runs the updated version. Traffic can be shifted incrementally to the green environment, allowing issues to be identified and resolved before full rollout. Because the blue environment still exists, you can recover instantly by shifting traffic back.
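In practice the traffic shift usually happens at a load balancer; this toy router (with hypothetical backend URLs) shows the idea:

```python
import random

# Hypothetical backend URLs for the two environments.
BLUE = "https://blue.example.com"    # current production version
GREEN = "https://green.example.com"  # updated version

def pick_backend(green_weight: float) -> str:
    """Route a request to green with probability green_weight (0.0 to 1.0)."""
    return GREEN if random.random() < green_weight else BLUE

# Shift 10% of traffic to green; setting green_weight back to 0.0
# routes everything to blue again for an instant recovery.
print(pick_backend(green_weight=0.10))
```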
Testing in Production: Hudson’s Bay Company (HBC)
Hudson's Bay Company, a 348-year-old organisation known for iconic brands like Saks Fifth Avenue and Hudson’s Bay, provides an excellent case study. HBC had a legacy system built over many years, relied on third-party APIs that were only available in their staging environment, and faced significant challenges (and costs) keeping the test data in their environments in sync with realistic data. These issues meant that bugs were still making it through to production, even after extensive regression testing.
HBC decided to start testing in production. Because the platform already supported multiple fashion brands, it was easy to add another: a test brand. They implemented a sophisticated feature flag system combined with canary releases, allowing new features to be deployed to the test brand and comprehensively tested before full rollout.
The results of these changes were significant. System stability improved markedly through better issue detection in real-world conditions. Feature deployment became faster and more reliable, reducing time-to-market while maintaining quality. Even the user experience improved as performance issues could be addressed promptly through real-time monitoring.
Conclusion
While feature flags can exponentially increase the number of system states, teams can manage this complexity through a combination of architectural decisions, development practices, and production testing strategies. Teams need to accept that not every combination warrants equal testing attention; focusing on building robust systems that can detect and recover from issues quickly is the better long-term approach.
As demonstrated by Hudson's Bay Company's success, even organisations with complex legacy systems can effectively implement these modern testing practices. By implementing proper monitoring, gradual rollouts, and quick recovery mechanisms, organisations can confidently use feature flags while maintaining high quality standards.
Feature flags remain a powerful tool for modern software development, and by understanding and addressing their testing challenges, teams can fully leverage their benefits while minimising their overheads.