FREE TOOL

Statistical Significance Calculator

Did your A/B test produce a real winner — or just random noise? Enter your results to get an instant verdict with confidence level and p-value.

Uses a two-proportion z-test with pooled standard error. Significance threshold: p < 0.05 (95% confidence).

How to Interpret Your Results

✓ Significant Winner

p-value < 0.05 and variation > control. Your variation outperforms the control with 95%+ confidence. You can ship this change — though monitor post-launch to confirm the effect holds.

~ Inconclusive

p-value ≥ 0.05. There is not enough evidence to declare a winner or loser. Either continue the test until you reach your required sample size, or accept that the difference is too small to matter.

✗ Significant Loser

p-value < 0.05 and variation < control. Your variation is actively hurting conversions with statistical confidence. Stop the test, revert to control, and investigate why.
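In code, the decision rule above is just a couple of comparisons. Here is an illustrative Python sketch (the function name and the alpha argument are ours, not the calculator's actual source):

def verdict(p_value, control_rate, variation_rate, alpha=0.05):
    # Inconclusive: not enough evidence in either direction
    if p_value >= alpha:
        return "Inconclusive"
    # Significant: the direction of the difference decides winner vs. loser
    return "Significant Winner" if variation_rate > control_rate else "Significant Loser"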

The Statistics Behind the Calculator

This calculator uses a two-proportion z-test with a pooled standard error. It is the most widely used statistical test for comparing A/B conversion rates and the classic fixed-horizon method behind tools like Optimizely, VWO, and Google Optimize.

Pooled proportion:

p̂ = (x₁ + x₂) / (n₁ + n₂)

Standard error:

SE = √(p̂ × (1−p̂) × (1/n₁ + 1/n₂))

Z-score:

z = (p₂ − p₁) / SE

Two-tailed p-value:

p = 2 × min(Φ(z), 1 − Φ(z))

Where x₁ and n₁ are the control's conversions and visitors, x₂ and n₂ are the variation's, p₁ = x₁/n₁ and p₂ = x₂/n₂ are the observed conversion rates, and Φ is the standard normal CDF.
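If you want to reproduce the math yourself, here is a minimal Python sketch of the same calculation (the function name and the example numbers are illustrative, not the calculator's source code):

import math

def two_proportion_z_test(x1, n1, x2, n2):
    """Two-proportion z-test with pooled standard error (two-tailed)."""
    p1 = x1 / n1                                   # control conversion rate
    p2 = x2 / n2                                   # variation conversion rate
    p_pool = (x1 + x2) / (n1 + n2)                 # pooled proportion p̂
    se = math.sqrt(p_pool * (1 - p_pool) * (1 / n1 + 1 / n2))
    z = (p2 - p1) / se
    phi = 0.5 * (1 + math.erf(z / math.sqrt(2)))   # standard normal CDF Φ(z)
    p_value = 2 * min(phi, 1 - phi)                # two-tailed p-value
    return z, p_value

# Example: 2,000/100,000 conversions (2.0%) vs 2,200/100,000 (2.2%)
z, p = two_proportion_z_test(2000, 100000, 2200, 100000)
print(f"z = {z:.2f}, p = {p:.4f}")   # z ≈ 3.12, p ≈ 0.0018: significant winner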

What "Statistically Significant" Doesn't Mean

Statistical significance is widely misinterpreted. Here's what a p < 0.05 result does not tell you:

  • It doesn't mean there's a 95% chance the variation is better.

It means there's a 5% chance you'd see a difference this large by random chance if there were no true effect. Frequentist statistics don't give you the probability that a hypothesis is true.

  • It doesn't mean the observed lift will hold in production.

    The effect size you measure in a test is often an overestimate (regression to the mean). Always monitor results after shipping and verify the lift persists.

  • It doesn't mean the difference is practically meaningful.

With very high traffic, you can reach statistical significance on a lift so small it generates no meaningful revenue. Always ask: "Is this effect size worth acting on?" before shipping.

Frequently Asked Questions

What is statistical significance in A/B testing?

Statistical significance tells you whether the observed difference between your control and variation is likely real, or likely due to random chance. At 95% significance (p < 0.05), a difference as large as the one you observed would occur less than 5% of the time by chance alone if there were no true effect. It's the threshold CRO teams use to decide whether to ship a variation or keep testing.

What is a p-value?

The p-value is the probability of observing a difference at least as large as the one in your test, assuming the null hypothesis (no true difference) is true. A p-value of 0.03 means there's a 3% chance of seeing results this extreme by random chance alone if the variation truly changed nothing. Below 0.05 is significant; below 0.01 is highly significant. Never interpret the p-value as the probability that your variation wins.

My result is significant at 80% confidence. Should I ship?

Generally no. At 80% confidence your false positive rate is 20%: if a change truly has no effect, 1 in 5 tests will still show a "significant" result at this threshold, so you risk shipping changes that do nothing or actively hurt. Wait until you reach 95% confidence (p < 0.05) for shipping decisions, or use lower confidence only for low-stakes directional signals.

What is observed lift and how is it calculated?

Observed lift is the relative improvement of the variation over the control: lift = (variation_rate − control_rate) / control_rate × 100%. For example, control 2.0% → variation 2.4%: lift = (2.4 − 2.0) / 2.0 × 100% = +20%. This is the relative lift; the absolute lift is 0.4 percentage points. Both metrics matter when communicating impact to stakeholders.
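As a quick sketch of the same arithmetic in Python (the function name is ours, for illustration only):

def lift(control_rate, variation_rate):
    relative = (variation_rate - control_rate) / control_rate * 100   # relative lift in %
    absolute = (variation_rate - control_rate) * 100                  # absolute lift in percentage points
    return relative, absolute

print(lift(0.020, 0.024))   # ≈ (20.0, 0.4): +20% relative, +0.4 pp absolute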

When should I stop an A/B test?

Stop when you've reached your pre-calculated required sample size (use our Sample Size Calculator before starting the test) AND your significance threshold is met. Do not stop early if results look significant — this is called "peeking" and inflates your false positive rate. If you've hit your sample size but results are still inconclusive, it's safe to call the test a null result and move on.
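For reference, here is a sketch of the standard fixed-horizon approximation for required sample size per variant at 95% confidence and 80% power. This is the textbook formula for a two-proportion z-test; we make no claim that it is exactly what the linked Sample Size Calculator implements, and the function name and example inputs are illustrative:

import math

def sample_size_per_variant(baseline, relative_mde):
    """Approximate visitors needed per variant (two-tailed alpha = 0.05, power = 0.80)."""
    p1 = baseline
    p2 = baseline * (1 + relative_mde)    # smallest conversion rate worth detecting
    z_alpha = 1.96                        # two-sided 95% confidence
    z_beta = 0.84                         # 80% power
    variance = p1 * (1 - p1) + p2 * (1 - p2)
    n = (z_alpha + z_beta) ** 2 * variance / (p2 - p1) ** 2
    return math.ceil(n)

# Detecting a +10% relative lift on a 2.0% baseline:
print(sample_size_per_variant(0.02, 0.10))   # ≈ 80,600 visitors per variant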

Want to Know What to Test First?

Statistical tools tell you if a test won — a CRO audit tells you why visitors aren't converting in the first place. We analyze your analytics, heatmaps, and UX to build a prioritized test roadmap.

Book a CRO Audit

Starting at $2,500 · 5–7 day delivery