Sample Size Calculator
How many visitors do you need before your A/B test results are reliable? Calculate the exact sample size with your preferred confidence level and power.
Your control variation's current conversion rate
Smallest relative improvement you want to reliably detect
Fill in your baseline rate and MDE to calculate sample size
Uses the two-proportion z-test formula. Sample size is per variation — double for total traffic needed.
Understanding Sample Size for A/B Tests
Sample size is one of the most misunderstood concepts in A/B testing. Most teams either run tests that are wildly underpowered (stopping after a few hundred visitors), or they run tests for far longer than necessary because they never calculated upfront how much traffic they need.
The required sample size per variation depends on four variables: your baseline conversion rate, the minimum effect size you want to detect, your significance threshold, and your desired statistical power. Changing any one of these dramatically affects how much traffic you need.
The Formula
n = (Zα/2 + Zβ)² × (p₁(1−p₁) + p₂(1−p₂)) / (p₂ − p₁)²
This is the standard two-proportion z-test sample size formula. It calculates the minimum number of observations needed in each group (per variation) to detect a difference of (p₂ − p₁) with the specified significance level (α) and power (1 − β).
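The formula translates directly into code. A minimal sketch in Python (the function name and defaults are ours; `statistics.NormalDist` supplies the z-quantiles):

```python
from math import ceil
from statistics import NormalDist

def sample_size_per_variation(p1: float, relative_mde: float,
                              alpha: float = 0.05, power: float = 0.80) -> int:
    """Two-proportion z-test sample size, per variation.

    p1           -- baseline conversion rate (e.g. 0.02 for 2%)
    relative_mde -- minimum relative lift to detect (e.g. 0.10 for +10%)
    """
    p2 = p1 * (1 + relative_mde)                   # target rate implied by the MDE
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # two-sided critical value
    z_beta = NormalDist().inv_cdf(power)           # quantile for desired power
    variance = p1 * (1 - p1) + p2 * (1 - p2)       # sum of the two binomial variances
    return ceil((z_alpha + z_beta) ** 2 * variance / (p2 - p1) ** 2)
```

For a 2% baseline and a 10% relative MDE at the default 95% / 80% settings, this returns roughly 80,700 visitors per variation.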
Significance Level vs. Statistical Power
Significance Level (α)
Controls your false positive rate — the probability of declaring a winner when there's actually no real difference. At 95% significance (α = 0.05), you'll incorrectly declare a winner 5% of the time by pure chance. Lower α = fewer false positives, but requires more traffic.
Statistical Power (1 − β)
Controls your true positive rate — the probability of detecting a real effect when one exists. At 80% power, you'll correctly identify a true winner 80% of the time. The other 20% are false negatives: real improvements you miss. Higher power requires more traffic.
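The two z-values enter the formula only through the squared sum (Zα/2 + Zβ)², so sample size scales linearly with that factor and the traffic cost of stricter settings is easy to quantify. A quick sketch (helper name is ours):

```python
from statistics import NormalDist

norm = NormalDist()

def z_factor(alpha: float, power: float) -> float:
    """(z_{alpha/2} + z_beta)^2: the multiplier that significance and power
    contribute to required sample size. All else equal, n scales with this."""
    return (norm.inv_cdf(1 - alpha / 2) + norm.inv_cdf(power)) ** 2

baseline = z_factor(0.05, 0.80)     # 95% significance, 80% power
strict = z_factor(0.01, 0.90)       # 99% significance, 90% power
print(round(strict / baseline, 2))  # stricter settings need ~1.9x the traffic
```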
How MDE Affects Your Sample Size
Your Minimum Detectable Effect (MDE) has the biggest impact on sample size requirements. Here's how different MDEs compare on a 2% baseline with 95% significance and 80% power:
| MDE | Target Rate | Sample/Variation | Feasibility |
|---|---|---|---|
| 5% | 2.10% | ~315,000 | Very High Traffic |
| 10% | 2.20% | ~80,700 | High Traffic |
| 15% | 2.30% | ~36,700 | Moderate Traffic |
| 20% | 2.40% | ~21,100 | Most Sites |
| 30% | 2.60% | ~9,800 | Low Traffic OK |
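Because n is divided by (p₂ − p₁)², there is a near-inverse-square relationship between MDE and traffic: doubling the MDE cuts the required sample size by roughly 4×. A quick check in Python (helper names ours):

```python
from statistics import NormalDist

# (z_{alpha/2} + z_beta)^2 at 95% significance, 80% power
z2 = (NormalDist().inv_cdf(0.975) + NormalDist().inv_cdf(0.80)) ** 2

def n(p1: float, rel_mde: float) -> float:
    """Sample size per variation for baseline p1 and relative MDE."""
    p2 = p1 * (1 + rel_mde)
    return z2 * (p1 * (1 - p1) + p2 * (1 - p2)) / (p2 - p1) ** 2

# Doubling the MDE from 10% to 20% cuts traffic by roughly 4x
# (slightly less, because the target-rate variance also grows)
print(round(n(0.02, 0.10) / n(0.02, 0.20), 1))  # -> 3.8
```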
Frequently Asked Questions
Is the sample size per variation or total?
Per variation. For a standard A/B test with two groups (control + one variation), multiply the result by 2 to get total visitors needed. For a three-way test (A/B/C), multiply by 3. Each additional variation requires proportionally more traffic and makes significance harder to reach, since every arm must hit the full per-variation sample size.
What significance level should I use?
95% is the industry standard for most A/B tests. Use 90% for low-stakes decisions where you want shorter test durations and can accept a slightly higher false positive rate. Use 99% for high-impact changes like pricing, payment flow, or major checkout redesigns — where shipping the wrong winner would be very costly. Never go below 90%.
What power should I choose?
80% power is the standard for most CRO programs. It means you'll miss 20% of real effects, which is acceptable in most cases. Choose 90% if you have high traffic and can afford longer tests. 70% is acceptable only for early-stage exploratory tests where you're looking for directional signal rather than definitive decisions.
What if my required sample size is unreachable?
If the required sample size would take 3+ months at your current traffic levels, you have a few options: increase your MDE (only test bolder changes that produce larger effects), reduce the number of variations, focus testing on higher-traffic pages, or consolidate traffic to fewer test pages. Low-traffic sites often benefit more from qualitative CRO research (user sessions, heatmaps, user testing) than from statistically powered A/B testing.
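To turn a sample size into a calendar estimate, divide total required visitors by your weekly test traffic. A sketch with illustrative numbers (the function name and all figures below are hypothetical, not benchmarks):

```python
from math import ceil

def weeks_to_complete(n_per_variation: int, variations: int,
                      weekly_visitors: int) -> float:
    """Test duration in weeks, assuming every eligible visitor
    enters the test and traffic is split evenly."""
    return n_per_variation * variations / weekly_visitors

# Hypothetical site: 21,100 visitors needed per variation,
# a two-arm test, and 4,000 test-eligible visitors per week
weeks = weeks_to_complete(21_100, 2, 4_000)
print(ceil(weeks))  # -> 11 weeks, already near the 3-month threshold
```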
Not Sure What's Worth Testing?
Our CRO audits identify your highest-impact test opportunities from real analytics and user behavior data — so you're not testing random hypotheses, but validated friction points.
Book a CRO Audit
Starting at $2,500 · 5–7 day delivery