Last Updated Oct 30 2019

CRO glossary: statistical significance

What is statistical significance?

Statistical significance measures the probability that a difference in conversion rates between Version A and Version B of a split test or A/B test is not caused by random chance.

In other words: if your A/B test shows a winner at 95% statistical significance, there is only a 5% probability that a difference as large as the one you observed would have appeared by random chance alone if the two versions actually performed the same. Put another way: if there were no real difference between versions and you repeated the experiment many times, roughly 1 in 20 tests would still produce a false ‘winner’.

Why is statistical significance important?

If you flip a coin 10 times in a row, there’s a 1 in 1,024 chance it will land on heads every time—due to nothing more than random chance. Those odds may seem low, but if 5,000 people read these words and try that experiment, there’s a high probability at least one of them will get 10 ‘heads’ in a row (the odds are 99.24%). Run any experiment enough times and unlikely events (statistical anomalies) are pretty much guaranteed.
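You can verify both of these odds with a couple of lines of Python (a quick sketch; the 5,000-reader figure is the example value from above):

```python
# Probability of 10 heads in 10 fair coin flips
p_all_heads = 0.5 ** 10
print(p_all_heads)  # 0.0009765625, i.e. 1 in 1,024

# Probability that at least one of 5,000 people flipping
# 10 coins each sees all heads
p_at_least_one = 1 - (1 - p_all_heads) ** 5000
print(round(p_at_least_one, 4))  # 0.9924, i.e. the 99.24% above
```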

In much the same way, randomness in A/B testing can produce results that don’t reflect reality. Statistical significance helps you determine the level of risk you’re willing to accept, and you can balance the desire for accuracy with the resources you have available. 

For instance, higher statistical significance requires a larger sample size (all things being equal), so if you’re willing to accept a greater risk that your results were caused by random chance, you can get away with running tests with a smaller sample size. This is often necessary when website traffic is low and it takes longer to build up a large sample size.

What does 'effect size' mean?

Effect size (also known as uplift in the case of an increase, or downlift in the case of a decrease) is the percent increase or decrease in conversions between Version A and Version B of an A/B test. These are the steps to calculate it:

  • Calculate the increase/decrease: new number − original number
  • Divide the result by the original number
  • Multiply the answer by 100

Example #1: Version A converts at 10% and Version B converts at 8%. The effect size is 20% (since Version B shows a 20% decrease in conversions). 

As per the above steps:

  • 8 - 10 = a decrease of 2
  • 2/10 = 0.2
  • 0.2 * 100 = 20% decrease

Example #2: Version A converts at 10% and Version B converts at 12%. The effect size is, once again, 20% (since Version B shows a 20% increase in conversions).

Once again:

  • 12 - 10 = an increase of 2
  • 2/10 = 0.2
  • 0.2 * 100 = 20% increase
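The same arithmetic can be wrapped in a short Python helper (a minimal sketch; the function name is ours, and the rates below are the example values from above):

```python
def effect_size(original_rate, new_rate):
    """Percent increase (positive) or decrease (negative)
    relative to the original (control) rate."""
    return (new_rate - original_rate) / original_rate * 100

# Example #1: 10% -> 8% conversion rate
print(round(effect_size(0.10, 0.08), 1))  # -20.0, a 20% decrease

# Example #2: 10% -> 12% conversion rate
print(round(effect_size(0.10, 0.12), 1))  # 20.0, a 20% increase
```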

PS: find a handy calculator at https://www.skillsyouneed.com/num/percent-change.html

When all other variables remain constant, a higher effect size produces a higher confidence level. The reason for this is simple—a major difference in performance is less likely to be caused by chance, whereas a small difference could easily be the result of randomness.
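To make this concrete, here is a rough sketch of a two-proportion z-test, one standard way of computing significance for conversion rates (the visitor and conversion counts below are made-up examples):

```python
from statistics import NormalDist

def confidence_level(conversions_a, visitors_a, conversions_b, visitors_b):
    """Approximate confidence (%) that the A/B difference is not
    random chance, via a two-sided two-proportion z-test."""
    p_a = conversions_a / visitors_a
    p_b = conversions_b / visitors_b
    p_pool = (conversions_a + conversions_b) / (visitors_a + visitors_b)
    se = (p_pool * (1 - p_pool) * (1 / visitors_a + 1 / visitors_b)) ** 0.5
    z = abs(p_a - p_b) / se
    p_value = 2 * (1 - NormalDist().cdf(z))  # two-sided
    return (1 - p_value) * 100

# Same traffic in both cases; only the effect size changes:
print(confidence_level(100, 1000, 110, 1000))  # small lift: low confidence
print(confidence_level(100, 1000, 140, 1000))  # big lift: well above 95
```

With identical traffic, the larger lift clears the 95% bar comfortably while the smaller one falls far short of it.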

Measuring statistical significance

Statisticians use a complex formula to calculate statistical significance, but you don’t have to worry about any of that. A sample size calculator will allow you to calculate the sample size you need when you enter the following information: 

  • Baseline conversion rate (current conversion rate of your control—Version A)
  • Minimum effect size you want to detect
  • Desired statistical significance (in CRO and UX, the accepted standard is 95%)
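Under the hood, such calculators generally apply a standard two-proportion sample-size formula. Here’s a rough Python sketch (the function name and the default 80% statistical power are our assumptions, not part of the article):

```python
from math import ceil
from statistics import NormalDist

def sample_size_per_variant(baseline, min_effect_pct,
                            significance=0.95, power=0.80):
    """Approximate visitors needed per variant for a two-sided
    two-proportion test.

    baseline       -- control conversion rate, e.g. 0.10 for 10%
    min_effect_pct -- minimum relative effect to detect, e.g. 20 for 20%
    """
    p1 = baseline
    p2 = baseline * (1 + min_effect_pct / 100)
    p_bar = (p1 + p2) / 2
    z_alpha = NormalDist().inv_cdf(1 - (1 - significance) / 2)
    z_beta = NormalDist().inv_cdf(power)
    n = ((z_alpha * (2 * p_bar * (1 - p_bar)) ** 0.5
          + z_beta * (p1 * (1 - p1) + p2 * (1 - p2)) ** 0.5) ** 2
         / (p2 - p1) ** 2)
    return ceil(n)

# 10% baseline, detect a 20% lift at 95% significance:
print(sample_size_per_variant(0.10, 20))  # roughly 3,800 visitors per variant
```

Note how the required sample size grows with the square of the inverse effect size: halving the minimum detectable effect roughly quadruples the traffic you need, which is why low-traffic sites often test bigger changes.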

Play around with the numbers in a sample size calculator and the relationship between sample size, effect size, and statistical significance will become clear.

What if you don’t get tons of traffic?

The more elements you A/B test or multivariate (MVT) test, the more traffic you’ll need in order to draw statistically significant conclusions you can reasonably trust. If your website doesn’t generate the level of traffic you need to build that kind of sample size, you need to be more selective about what you test.

6 ways to test what matters most

In order to test the most important things, you need to figure out what matters most to your target market. Here are six ways to make educated guesses about what matters.

  1. Look at product reviews and Customer Support feedback: see what customers are saying about your brand and products. Speak to your Sales, Customer Support, and Product Design teams to figure out what your customers really want from your website/product.
  2. Find out where people leave your website: traditional analytics tools (such as Google Analytics) can show you where visitors leave the site, and you can combine this data with Hotjar’s Conversion Funnels Tool to get a strong sense of what’s going on and why people are leaving.
  3. Figure out which page elements people interact with: heatmaps show where (in aggregate) users click, scroll, and hover their mouse pointers (or tap their fingers on a mobile or tablet device). Spot trends in the way people interact with key pages to decide which elements to keep, because they work, and which ones are being ignored and need changing/testing.
  4. Collect feedback from customers: on-page surveys, polls, and feedback widgets can give you open-ended voice-of-the-customer feedback that helps you understand, in your customers’ voice, what needs fixing and what you need to do more of.
  5. Review session recordings: see how individual (anonymized) users work their way through your site, where they stumble, and where they keep going back and forth when they can’t find what they’re looking for—particularly right before they decide to leave your site.
  6. Look into usability testing: usability testing tools offer insight into how people use a website. Gather direct, spoken feedback about any issues they encounter, and find out what would improve their user experience.


Pro tip: it’s tempting to try to improve everyone’s experience, but you’ll get more for your money if you focus on making things great for your ideal customers first.

Find the perfect elements to A/B test

Use Hotjar to pinpoint the right elements to test—those that matter most to your target market.

Free forever. Get started!