“I think the new design will be better.” That is not an argument. “The new design increased conversion by 12%, statistically significant at the 95% confidence level.” That is.
Feature Flags as Foundation¶
We use LaunchDarkly for feature flags: features can be turned on or off for specific user segments without a deployment. This is the foundation for A/B testing: group A sees the old design, group B the new one.
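The article doesn't show the LaunchDarkly setup itself, but the core mechanism behind any flag-based experiment is deterministic bucketing: the same user must always land in the same variant. A minimal sketch of that idea in plain Python (the function name and rollout parameter are illustrative, not LaunchDarkly's API):

```python
import hashlib

def assign_variant(user_id: str, flag_key: str, rollout_pct: int = 50) -> str:
    """Deterministically bucket a user into 'control' or 'treatment'.

    Hashing user_id together with the flag key gives a stable
    assignment: the same user always sees the same variant,
    independent of deployments or server restarts.
    """
    digest = hashlib.sha256(f"{flag_key}:{user_id}".encode()).hexdigest()
    bucket = int(digest, 16) % 100  # a stable number in 0..99
    return "treatment" if bucket < rollout_pct else "control"
```

Because the assignment depends only on the user and the flag, rolling the experiment out to more traffic just means raising `rollout_pct`; existing users keep their variant.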
Statistical Significance¶
The most common mistake is ending a test too early. “After 100 users we're at +20%!” is noise, not signal. We set up a minimum sample size calculator and a rule: no test ends with fewer than 1,000 users per variant.
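The article doesn't include the calculator itself; a common way to build one is the standard two-proportion z-test formula, which needs only the Python standard library (the function and parameter names here are illustrative):

```python
from math import ceil
from statistics import NormalDist

def min_sample_size(p_base: float, mde: float,
                    alpha: float = 0.05, power: float = 0.8) -> int:
    """Minimum users per variant to detect a relative lift `mde`
    over a baseline conversion rate `p_base` (two-sided z-test)."""
    p_new = p_base * (1 + mde)
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # ~1.96 for alpha=0.05
    z_beta = NormalDist().inv_cdf(power)           # ~0.84 for power=0.8
    variance = p_base * (1 - p_base) + p_new * (1 - p_new)
    n = (z_alpha + z_beta) ** 2 * variance / (p_base - p_new) ** 2
    return ceil(n)

# e.g. baseline 5% conversion, detecting a +20% relative lift:
n = min_sample_size(0.05, 0.20)
```

For a realistic baseline and effect size this comes out in the thousands per variant, which is exactly why a 100-user “result” is noise.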
Bayesian Approach¶
Instead of p-values we use Bayesian inference, which produces statements like “the probability that variant B is better than A is 96%.” That framing is more intuitive for business stakeholders. We implement it in PyMC3.
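The PyMC3 model isn't shown in the article. For the simple conversion-rate case, though, the Beta prior is conjugate to the binomial likelihood, so the posterior can be sampled directly without MCMC; this stdlib-only sketch illustrates the same computation (a full PyMC3 model pays off once the model grows more complex):

```python
import random

def prob_b_beats_a(conv_a: int, n_a: int, conv_b: int, n_b: int,
                   draws: int = 100_000, seed: int = 0) -> float:
    """Estimate P(rate_B > rate_A) under independent Beta(1, 1) priors.

    With a Beta prior and binomial likelihood, the posterior for each
    variant's conversion rate is again Beta, so we can draw from it
    directly and count how often B's sampled rate exceeds A's.
    """
    rng = random.Random(seed)
    wins = 0
    for _ in range(draws):
        rate_a = rng.betavariate(1 + conv_a, 1 + n_a - conv_a)
        rate_b = rng.betavariate(1 + conv_b, 1 + n_b - conv_b)
        wins += rate_b > rate_a
    return wins / draws

# Hypothetical data: 50/1000 conversions for A vs 62/1000 for B
p = prob_b_beats_a(50, 1000, 62, 1000)
```

The returned probability is exactly the number stakeholders want to hear, with no p-value translation step.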
Guardrail Metrics¶
Every test tracks not only the primary metric (conversion) but also guardrails: page load time, error rate, user engagement. Increasing conversion at the expense of page speed is a Pyrrhic victory.
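The ship/no-ship logic this implies can be stated as a small decision rule: the primary metric must improve, and no guardrail may regress past its budget. A sketch with hypothetical metric names and thresholds:

```python
def ship_decision(primary_lift: float, guardrails: dict[str, float],
                  limits: dict[str, float]) -> bool:
    """Ship only if the primary metric improved AND every guardrail
    stays within its allowed regression budget.

    Values are relative changes: e.g. load_time_increase = 0.30
    means page load time got 30% worse in the treatment group.
    """
    if primary_lift <= 0:
        return False
    return all(guardrails[name] <= limits[name] for name in limits)

# +12% conversion, but page load time up 30% against a 10% budget:
result = ship_decision(
    primary_lift=0.12,
    guardrails={"load_time_increase": 0.30, "error_rate_increase": 0.01},
    limits={"load_time_increase": 0.10, "error_rate_increase": 0.05},
)
# result is False: the conversion win does not justify the speed regression
```

Encoding the rule as code keeps the “Pyrrhic victory” case from ever being shipped on enthusiasm alone.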
Data, Not Feelings¶
A/B testing changes decision-making culture. Less “I think”, more “data shows”.