In other words: if your A/B test results show a winner at 95% statistical significance, you're accepting a 5% risk that the observed difference is due to nothing more than random chance, and that if you repeated the experiment, you'd get a different winner or an inconclusive result.
If you flip a coin 10 times in a row, there’s a 1 in 1,024 chance it will land on heads every time—due to nothing more than random chance. Those odds may seem low, but if 5,000 people read these words and try that experiment, there’s a high probability at least one of them will get 10 ‘heads’ in a row (the odds are 99.24%). Run any experiment enough times and unlikely events (statistical anomalies) are pretty much guaranteed.
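To make that math concrete, here's a quick sketch of the arithmetic behind the coin-flip example (the variable names are ours, used purely for illustration):

```python
# Probability arithmetic behind the coin-flip example above.
p_ten_heads = 1 / 2 ** 10                        # 1 in 1,024 chance of 10 heads in a row
p_at_least_one = 1 - (1 - p_ten_heads) ** 5000   # chance at least one of 5,000 people sees it

print(f"{p_ten_heads:.6f}")     # ~0.000977 (i.e. 1 in 1,024)
print(f"{p_at_least_one:.2%}")  # ~99.24%
```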
In much the same way, randomness in A/B testing can produce results that don’t reflect reality. Statistical significance helps you determine the level of risk you’re willing to accept, and you can balance the desire for accuracy with the resources you have available.
For instance, higher statistical significance requires a larger sample size (all things being equal), so if you’re willing to accept a greater risk that your results were caused by random chance, you can get away with running tests with a smaller sample size. This is often necessary when website traffic is low and it takes longer to build up a large sample size.
Effect size (also known as uplift in the case of an increase, or downlift in the case of a decrease) is the percent increase or decrease in conversions between Version A and Version B of an A/B test. These are the steps to calculate it:

1. Subtract Version A's conversion rate from Version B's conversion rate.
2. Divide the difference by Version A's conversion rate.
3. Multiply the result by 100 to express it as a percentage.
Example #1: Version A converts at 10% and Version B converts at 8%. The effect size is 20% (since Version B shows a 20% decrease in conversions).
As per the above steps: 8 − 10 = −2; −2 ÷ 10 = −0.2; −0.2 × 100 = −20%, i.e. a 20% decrease.
Example #2: Version A converts at 10% and Version B converts at 12%. The effect size is, once again, 20% (since Version B shows a 20% increase in conversions).
PS: find a handy calculator at https://www.skillsyouneed.com/num/percent-change.html
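If you'd rather script it than reach for a calculator, here's a minimal sketch of the same percent-change calculation (the function name and example numbers are illustrative):

```python
# Effect size (uplift/downlift): relative percent change from Version A to Version B.
def effect_size(rate_a, rate_b):
    return (rate_b - rate_a) / rate_a * 100

print(effect_size(0.10, 0.08))  # -20.0 -> a 20% downlift (Example #1)
print(effect_size(0.10, 0.12))  #  20.0 -> a 20% uplift   (Example #2)
```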
When all other variables remain constant, a higher effect size produces a higher confidence level. The reason for this is simple—a major difference in performance is less likely to be caused by chance, whereas a small difference could easily be the result of randomness.
Statisticians use a complex formula to calculate statistical significance, but you don't have to worry about any of that. A sample size calculator will allow you to calculate the sample size you need when you enter the following information:

- Baseline conversion rate (how your current version converts)
- Minimum detectable effect (the smallest effect size you want the test to pick up)
- Statistical significance level (how much risk of a chance result you're willing to accept)
Play around with the numbers in the calculator above and the relationship between sample size, effect size, and statistical significance will become clear.
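If you're curious what a sample size calculator is doing under the hood, here's a rough sketch of the standard two-proportion formula. It's an approximation, and the function name, defaults, and the 80% power assumption are ours, not necessarily what any particular calculator uses:

```python
import math
from scipy.stats import norm

def sample_size_per_variant(baseline_rate, effect_size, significance=0.95, power=0.80):
    """Approximate visitors needed per variant to detect a relative lift."""
    p1 = baseline_rate
    p2 = baseline_rate * (1 + effect_size)           # e.g. 10% baseline with a 20% lift -> 12%
    z_alpha = norm.ppf(1 - (1 - significance) / 2)   # z-score for a two-sided significance level
    z_beta = norm.ppf(power)                         # z-score for statistical power
    variance = p1 * (1 - p1) + p2 * (1 - p2)
    return math.ceil((z_alpha + z_beta) ** 2 * variance / (p1 - p2) ** 2)

# 10% baseline conversion rate, hunting for a 20% relative uplift at 95% significance
print(sample_size_per_variant(0.10, 0.20))  # roughly 3,800 visitors per variant
```

Notice how the required sample size balloons as the effect size shrinks or the significance level rises, which is exactly the trade-off described above.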
In order to test the most important things, you need to figure out what matters most to your target market. Here are six ways to make educated guesses about what matters.
Pro tip: it’s tempting to try to improve everyone’s experience, but you’ll get more for your money if you focus on making things great for your ideal customers first.