A/B Test Calculator
About A/B Test Calculator
The A/B Test Statistical Significance Calculator determines whether the difference in conversion rates between your control (A) and variant (B) is statistically significant or could be due to random chance. Enter the number of visitors and conversions for each variant, and the calculator computes the p-value, z-score, relative lift, and 95% confidence intervals using a two-tailed two-proportion z-test with pooled proportion — the standard method for A/B test analysis in digital experimentation.
Product managers, UX designers, marketing analysts, and growth engineers use A/B testing to make data-driven decisions about website changes, email subject lines, ad copy, pricing pages, onboarding flows, and more. However, simply comparing conversion rates without statistical validation can lead to false positives — declaring a winner when the difference is actually due to random variation in small samples. This calculator helps you determine whether you have collected enough data to make a confident decision.
The calculator implements the normal approximation to the binomial distribution using the Abramowitz and Stegun polynomial approximation for the standard normal CDF (accurate to within 1.5×10⁻⁷). The pooled proportion combines both samples' data to estimate the true conversion rate under the null hypothesis of no difference. Statistical significance is declared at the conventional α = 0.05 threshold (p < 0.05), corresponding to a 95% confidence level. The 95% confidence intervals use the standard 1.96 standard error multiplier for each variant independently.
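The computation described above can be sketched in a few lines of Python. The function names (`norm_cdf`, `ab_test`) are illustrative rather than taken from the tool's source, and the polynomial shown is the common Abramowitz & Stegun 26.2.17 variant of the standard normal CDF approximation:

```python
import math

def norm_cdf(x: float) -> float:
    """Standard normal CDF via the Abramowitz & Stegun 26.2.17
    polynomial approximation (absolute error on the order of 1e-7)."""
    if x < 0:
        return 1.0 - norm_cdf(-x)
    t = 1.0 / (1.0 + 0.2316419 * x)
    poly = t * (0.319381530 + t * (-0.356563782 + t * (1.781477937
           + t * (-1.821255978 + t * 1.330274429))))
    return 1.0 - (math.exp(-x * x / 2.0) / math.sqrt(2.0 * math.pi)) * poly

def ab_test(visitors_a: int, conv_a: int, visitors_b: int, conv_b: int):
    """Two-tailed two-proportion z-test with pooled proportion."""
    p_a = conv_a / visitors_a
    p_b = conv_b / visitors_b
    # Pooled proportion: the best estimate of the shared conversion rate
    # under the null hypothesis that A and B perform identically.
    p_pool = (conv_a + conv_b) / (visitors_a + visitors_b)
    se = math.sqrt(p_pool * (1 - p_pool) * (1 / visitors_a + 1 / visitors_b))
    z = (p_b - p_a) / se
    p_value = 2.0 * (1.0 - norm_cdf(abs(z)))  # two-tailed
    return z, p_value
```

For example, conversion rates of 10% vs 13% on 1,000 visitors per variant give z ≈ 2.10 and p ≈ 0.036, just crossing the 0.05 threshold.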
Key Features
- Two-tailed two-proportion z-test with pooled proportion for accurate A/B test analysis
- p-value displayed to 4 decimal places with "<0.001" for very significant results
- Z-score output showing the number of standard deviations separating the two conversion rates
- Relative lift calculation: percentage improvement of variant B over control A
- 95% confidence intervals for both control and variant conversion rates
- Clear significance verdict: statistically significant (p < 0.05) or not significant
- Color-coded result display — green for significant, yellow for inconclusive results
- Instant recalculation with each input change — no page refresh required
Frequently Asked Questions
What does "statistically significant" mean in an A/B test?
Statistical significance means the observed difference in conversion rates between A and B is unlikely to have occurred by random chance alone. At p < 0.05, a difference this large would occur less than 5% of the time if there were truly no difference between the variants. This does not guarantee the variant will perform the same in production; it means your sample data provides sufficient evidence to reject the hypothesis that both variants perform identically.
What is the p-value and how should I interpret it?
The p-value is the probability of observing a difference as large as (or larger than) the one measured, assuming there is truly no difference between A and B. A p-value of 0.03 means that if the two variants actually performed identically, a difference this large would arise by chance only 3% of the time; it is not the probability that the null hypothesis is true. Lower p-values indicate stronger evidence against the null hypothesis. The conventional threshold for declaring significance is p < 0.05.
What is the z-score in an A/B test?
The z-score measures how many standard deviations the observed difference in conversion rates is from zero (no difference). A z-score above 1.96 or below -1.96 corresponds to p < 0.05 (two-tailed). Larger absolute z-scores indicate stronger evidence of a real difference between variants.
What sample size do I need for a reliable A/B test?
As a rule of thumb, you need at least 100 conversions per variant before the statistical test is reliable. For low-conversion-rate pages (under 2%), you may need thousands of visitors per variant. The minimum detectable effect (MDE) — the smallest lift you care about — also determines sample size: smaller desired lifts require larger samples. Use a sample size calculator before starting a test.
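To illustrate how sample size scales with the minimum detectable effect, here is the standard two-proportion planning formula at 95% confidence and 80% power. This is a generic pre-test calculation, not part of the calculator itself, and the function name is hypothetical:

```python
import math

def sample_size_per_variant(base_rate: float, mde_rel: float,
                            alpha_z: float = 1.96, power_z: float = 0.8416):
    """Approximate visitors needed per variant to detect a relative lift
    of `mde_rel` over `base_rate` (alpha_z: 95% two-sided confidence,
    power_z: 80% power). Standard two-proportion formula."""
    p1 = base_rate
    p2 = base_rate * (1 + mde_rel)
    variance = p1 * (1 - p1) + p2 * (1 - p2)
    n = (alpha_z + power_z) ** 2 * variance / (p2 - p1) ** 2
    return math.ceil(n)
```

For a 2% baseline and a +10% relative MDE this works out to roughly 80,000 visitors per variant, which is why low-conversion pages need so much traffic; a +50% MDE needs only a few thousand.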
What is conversion rate lift and how is it calculated?
Lift is the relative percentage improvement of variant B over control A. It is calculated as (rate_B - rate_A) / rate_A × 100. A lift of +15% means variant B converts 15% more visitors than control A. Note that lift can be positive even when the result is not statistically significant — always check the p-value before acting on lift numbers.
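The lift formula above is a one-liner in code; the helper name is illustrative:

```python
def relative_lift(rate_a: float, rate_b: float) -> float:
    """Relative lift of variant B over control A, as a percentage."""
    return (rate_b - rate_a) / rate_a * 100
```

For example, rates of 10% (A) and 11.5% (B) give a lift of +15%; a variant that converts worse than control yields a negative lift.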
What are 95% confidence intervals in this context?
The 95% confidence interval for each conversion rate is a range that would contain the true conversion rate in 95% of repeated experiments. If the confidence intervals for A and B do not overlap, this is a strong visual indicator of statistical significance; note, however, that overlapping intervals do not by themselves rule out significance, so the z-test remains the more precise check. They are calculated using the standard 1.96 × standard error formula for each variant independently.
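The interval computation is equally compact. The sketch below implements the standard Wald interval that the 1.96 × standard error description implies; the function name is hypothetical:

```python
import math

def wald_ci_95(conversions: int, visitors: int):
    """95% Wald confidence interval for a conversion rate:
    rate +/- 1.96 * sqrt(rate * (1 - rate) / visitors)."""
    rate = conversions / visitors
    se = math.sqrt(rate * (1 - rate) / visitors)
    return rate - 1.96 * se, rate + 1.96 * se
```

For 100 conversions out of 1,000 visitors this gives an interval of roughly (8.1%, 11.9%) around the observed 10% rate.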
When should I stop an A/B test?
Stop the test when: (1) you have reached your pre-determined minimum sample size, AND (2) the p-value is below 0.05 (or whatever significance threshold you set before the test). Peeking at results repeatedly and stopping early when you see p < 0.05 inflates the false positive rate — this is called "p-hacking." Decide the stopping rule before the test begins.
What if my result is "not statistically significant"?
A non-significant result does not prove that A and B perform the same; it means you have insufficient evidence to declare a winner. You may need to collect more data, or the true effect size may be too small to detect at your current traffic levels. Consider running the test longer, or revisit the hypothesis and test a change with a potentially larger impact.