Statistical Significance Calculator
Did your A/B test produce a real winner — or just random noise? Enter your results to get an instant verdict with confidence level and p-value.
Control (A)
Variation (B)
Enter visitors and conversions for both control and variation
Uses a two-proportion z-test with pooled standard error. Significance threshold: p < 0.05 (95% confidence).
How to Interpret Your Results
p-value < 0.05 and variation > control. Your variation outperforms the control with 95%+ confidence. You can ship this change — though monitor post-launch to confirm the effect holds.
p-value ≥ 0.05. There is not enough evidence to declare a winner or loser. Either continue the test until you reach your required sample size, or accept that the difference is too small to matter.
p-value < 0.05 and variation < control. Your variation is actively hurting conversions with statistical confidence. Stop the test, revert to control, and investigate why.
The Statistics Behind the Calculator
This calculator uses a two-proportion z-test with a pooled standard error. This is the most widely used statistical test for A/B testing conversion rates and is the method behind tools like Optimizely, VWO, and Google Optimize.
Pooled proportion:
p̂ = (x₁ + x₂) / (n₁ + n₂)
Standard error:
SE = √(p̂ × (1−p̂) × (1/n₁ + 1/n₂))
Z-score:
z = (p₂ − p₁) / SE
Two-tailed p-value:
p = 2 × min(Φ(z), 1 − Φ(z))
Where x₁/n₁ = control conversions/visitors, x₂/n₂ = variation conversions/visitors, Φ = standard normal CDF.
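The formulas above can be sketched in a few lines of Python using only the standard library (function and variable names here are illustrative, not the calculator's actual code):

```python
import math

def two_proportion_z_test(x1, n1, x2, n2):
    """Two-tailed pooled z-test: control x1/n1 vs variation x2/n2."""
    p_pool = (x1 + x2) / (n1 + n2)                    # p̂ = (x₁ + x₂) / (n₁ + n₂)
    se = math.sqrt(p_pool * (1 - p_pool) * (1 / n1 + 1 / n2))
    z = (x2 / n2 - x1 / n1) / se                      # z = (p₂ − p₁) / SE
    phi = 0.5 * (1 + math.erf(z / math.sqrt(2)))      # Φ(z), standard normal CDF
    return z, 2 * min(phi, 1 - phi)                   # two-tailed p-value

# Example: control 200/10,000 (2.0%) vs variation 240/10,000 (2.4%)
z, p = two_proportion_z_test(200, 10_000, 240, 10_000)
```

With these inputs z ≈ 1.93 and p ≈ 0.054, just above the 0.05 threshold: a 20% relative lift on a 2% baseline is not yet significant at this sample size.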
What "Statistically Significant" Doesn't Mean
Statistical significance is widely misinterpreted. Here's what a p < 0.05 result does not tell you:
- ✗ It doesn't mean there's a 95% chance the variation is better.
It means that, if there were no true effect, you'd see a difference this large less than 5% of the time by random chance. Frequentist statistics give the probability of the data under a hypothesis, not the probability that a hypothesis is true.
- ✗ It doesn't mean the observed lift will hold in production.
The effect size you measure in a test is often an overestimate (regression to the mean). Always monitor results after shipping and verify the lift persists.
- ✗ It doesn't mean the difference is practically meaningful.
With very high traffic, you can achieve statistical significance on a 0.01% lift that generates no meaningful revenue. Always ask: "Is this effect size worth acting on?" before shipping.
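To make the high-traffic point concrete, here is a quick check (all numbers hypothetical) showing that with 100 million visitors per arm, a lift of one hundredth of a percentage point clears p < 0.05:

```python
import math

def p_value(x1, n1, x2, n2):
    """Two-tailed p-value from the pooled two-proportion z-test."""
    p_pool = (x1 + x2) / (n1 + n2)
    se = math.sqrt(p_pool * (1 - p_pool) * (1 / n1 + 1 / n2))
    z = (x2 / n2 - x1 / n1) / se
    phi = 0.5 * (1 + math.erf(z / math.sqrt(2)))
    return 2 * min(phi, 1 - phi)

# Control converts at 10.00%, variation at 10.01%: a trivial lift.
n = 100_000_000
p = p_value(10_000_000, n, 10_010_000, n)
print(p < 0.05)  # True: statistically significant, practically negligible
```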
Frequently Asked Questions
What is statistical significance in A/B testing?
Statistical significance tells you whether the observed difference between your control and variation is likely real, or likely due to random chance. At 95% significance (p < 0.05), a difference at least this large would occur by chance less than 5% of the time if there were no true effect. It's the threshold CRO teams use to decide whether to ship a variation or keep testing.
What is a p-value?
The p-value is the probability of observing a difference at least as large as the one in your test, assuming the null hypothesis (no true difference) is true. A p-value of 0.03 means there's a 3% chance of seeing a difference at least as extreme as yours by random chance alone. Below 0.05 is conventionally significant; below 0.01 is highly significant. Never interpret the p-value as the probability that your variation wins.
My result is significant at 80% confidence. Should I ship?
Generally no. At 80% confidence you accept a 20% false positive rate: when a variation has no real effect, 1 in 5 tests will still cross that threshold by chance alone, so you risk shipping a change that does nothing or is actually harmful. Wait until you reach 95% confidence (p < 0.05) for shipping decisions, or use lower confidence only for low-stakes directional signals.
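A small simulation (all numbers hypothetical) shows the 1-in-5 problem directly: running many A/A tests, where both arms are identical, about 20% still look "significant" at the 80% confidence threshold:

```python
import math
import random

def p_value(x1, n1, x2, n2):
    """Two-tailed pooled two-proportion z-test p-value."""
    p_pool = (x1 + x2) / (n1 + n2)
    se = math.sqrt(p_pool * (1 - p_pool) * (1 / n1 + 1 / n2))
    z = (x2 / n2 - x1 / n1) / se
    phi = 0.5 * (1 + math.erf(z / math.sqrt(2)))
    return 2 * min(phi, 1 - phi)

random.seed(1)
trials, n, rate = 1_000, 2_000, 0.05   # identical 5% conversion in both arms
wins = 0
for _ in range(trials):
    x1 = sum(random.random() < rate for _ in range(n))
    x2 = sum(random.random() < rate for _ in range(n))
    wins += p_value(x1, n, x2, n) < 0.20   # "significant" at 80% confidence
print(wins / trials)   # roughly 0.2: one in five null tests crosses the bar
```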
What is observed lift and how is it calculated?
Observed lift is the relative improvement of the variation over the control: lift = (variation_rate − control_rate) / control_rate × 100%. For example, control 2.0% → variation 2.4%: lift = (2.4 − 2.0) / 2.0 × 100% = +20%. This is the relative lift; the absolute lift is 0.4 percentage points. Both metrics matter when communicating impact to stakeholders.
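The arithmetic from that example can be written out directly (variable names are illustrative):

```python
control_rate = 0.020      # 2.0% control conversion rate
variation_rate = 0.024    # 2.4% variation conversion rate

relative_lift = (variation_rate - control_rate) / control_rate * 100
absolute_lift = (variation_rate - control_rate) * 100  # in percentage points

print(f"{relative_lift:+.1f}% relative, {absolute_lift:.1f} pp absolute")
# +20.0% relative, 0.4 pp absolute
```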
When should I stop an A/B test?
Stop when you've reached your pre-calculated required sample size (use our Sample Size Calculator before starting the test) AND your significance threshold is met. Do not stop early if results look significant — this is called "peeking" and inflates your false positive rate. If you've hit your sample size but results are still inconclusive, it's safe to call the test a null result and move on.
Want to Know What to Test First?
Statistical tools tell you if a test won — a CRO audit tells you why visitors aren't converting in the first place. We analyze your analytics, heatmaps, and UX to build a prioritized test roadmap.
Book a CRO Audit
Starting at $2,500 · 5–7 day delivery