A/B Testing Sample Size: Complete Guide

Learn how to calculate the right sample size for your A/B tests. Avoid underpowered experiments that miss real effects or waste time running tests too long.

TL;DR — Quick Reference

  • Sample size depends on: baseline rate, minimum detectable effect, power, and significance
  • Standard settings: 80% power, 5% significance level (95% confidence)
  • Smaller effects need much larger sample sizes: halving the MDE roughly quadruples the sample size
  • Always calculate before starting — never peek at results early

Quick Sample Size Calculator

The quick calculator takes four inputs: your current conversion rate, the smallest improvement you want to detect, the probability of detecting a real effect (typically 80%), and the confidence level (typically 95%). It returns the required sample size per variation and in total (for example, 8,150 visitors per variation, 16,300 across both variations).

For a more comprehensive calculator, try our full sample size calculator.

What is Sample Size in A/B Testing?

Definition:

Sample size in A/B testing refers to the number of users or observations needed in each variation to detect a statistically significant difference, if one exists. It determines how long you need to run your test.

Calculating sample size before starting an experiment is crucial. Without that calculation:

  • You might miss real improvements (false negatives)
  • You might declare winners that aren't real (false positives)
  • You waste time running tests too long
  • You make decisions based on noise, not signal

Factors That Determine Sample Size

1. Baseline Conversion Rate

Your current conversion rate before any changes. Lower baseline rates require larger sample sizes because the same relative improvement corresponds to a smaller absolute difference, which is harder to separate from random noise.

Example: A 2% conversion rate needs more samples than a 10% rate to detect the same relative improvement.
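With the formula below at 80% power and 95% confidence, detecting a 20% relative lift takes roughly 21,000 visitors per variation at a 2% baseline, but only about 3,800 at a 10% baseline.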

2. Minimum Detectable Effect (MDE)

Definition:

Minimum Detectable Effect (MDE) is the smallest relative improvement you want to be able to detect with your test. It's expressed as a percentage of the baseline.

Sample size grows roughly with the inverse square of the MDE: halving the MDE roughly quadruples the required sample, so a 5% MDE needs about 4x more samples than a 10% MDE.

Practical tip: Start with a 10-20% MDE. If you need to detect smaller effects, consider whether the business impact justifies the longer test duration.
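To see the scaling concretely: at a 5% baseline (80% power, 95% confidence), a 10% relative MDE needs roughly 31,000 visitors per variation, while a 5% MDE needs roughly 122,000, about four times as many for half the effect.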

3. Statistical Power

Definition:

Statistical power is the probability of detecting a real effect when one exists. The industry standard is 80%, meaning that if the true effect is at least as large as your MDE, you'll correctly identify it about 80% of the time.

Higher power (90% or 95%) reduces false negatives but requires larger sample sizes.
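Holding everything else constant, moving from 80% to 90% power increases the required sample size by about a third, and 95% power needs roughly two-thirds more than 80%.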

4. Significance Level (α)

Definition:

Significance level is the probability of declaring a winner when there's no real difference (false positive rate). The standard is 5% (95% confidence level).

A stricter significance level (1%, i.e. 99% confidence) requires a larger sample size but further reduces false positives.
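For example, at 80% power, moving from 95% to 99% confidence increases the required sample size by roughly half.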

The Sample Size Formula

For a two-sample proportion test (standard A/B test):

n = 2 × (Zα/2 + Zβ)² × p̄(1-p̄) / (p₂ - p₁)²

Where:

  • n = sample size per variation
  • Zα/2 = Z-score for significance level (1.96 for 95%)
  • Zβ = Z-score for power (0.84 for 80%)
  • p̄ = pooled conversion rate ((p₁ + p₂) / 2)
  • p₁ = baseline conversion rate
  • p₂ = expected conversion rate with improvement

In practice, you don't need to calculate this by hand. Use our sample size calculator or the quick calculator above.
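If you do want to compute it in code, here is a minimal Python sketch of the formula above using only the standard library. The function name and defaults are ours for illustration, and different calculators use slightly different variance estimates, so expect small differences between tools.

```python
import math
from statistics import NormalDist

def sample_size_per_variation(baseline_rate, mde_relative,
                              power=0.80, significance=0.05):
    """Visitors needed per variation for a two-sample proportion test,
    using the pooled-variance formula above."""
    p1 = baseline_rate
    p2 = baseline_rate * (1 + mde_relative)       # expected rate after the lift
    p_bar = (p1 + p2) / 2                         # pooled conversion rate

    z_alpha = NormalDist().inv_cdf(1 - significance / 2)  # 1.96 for 95% confidence
    z_beta = NormalDist().inv_cdf(power)                   # 0.84 for 80% power

    n = 2 * (z_alpha + z_beta) ** 2 * p_bar * (1 - p_bar) / (p2 - p1) ** 2
    return math.ceil(n)

# 5% baseline, 20% relative MDE, default 80% power / 95% confidence
print(sample_size_per_variation(0.05, 0.20))   # ≈ 8,159 per variation
```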

Common Sample Size Mistakes

❌ Not calculating sample size upfront

Running a test without knowing how many visitors you need leads to either stopping too early (invalid results) or running too long (wasted time).

❌ Peeking at results

Checking results daily and stopping as soon as you see significance can inflate your false positive rate from 5% to 30% or more. Wait until you reach your calculated sample size.
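To see why, you can simulate A/A tests, where there is no real difference, and stop at the first peek that looks significant. The sketch below is illustrative only (the function name, peek schedule, and defaults are ours), and the exact inflation depends on how often you peek.

```python
import random
from statistics import NormalDist

def peeking_false_positive_rate(n_sims=1000, n_per_arm=5000, peeks=20,
                                rate=0.05, alpha=0.05, seed=1):
    """Simulate A/A tests (no true difference) and count how often
    stopping at the first 'significant' peek declares a winner."""
    random.seed(seed)
    z_crit = NormalDist().inv_cdf(1 - alpha / 2)   # 1.96 for alpha = 0.05
    step = n_per_arm // peeks
    false_positives = 0
    for _ in range(n_sims):
        conv_a = conv_b = n = 0
        for _ in range(peeks):
            # next batch of visitors for each arm, same true rate in both
            conv_a += sum(random.random() < rate for _ in range(step))
            conv_b += sum(random.random() < rate for _ in range(step))
            n += step
            p_pool = (conv_a + conv_b) / (2 * n)
            if p_pool in (0, 1):
                continue                           # no variance yet, skip this peek
            se = (2 * p_pool * (1 - p_pool) / n) ** 0.5
            z = abs(conv_a / n - conv_b / n) / se  # two-proportion z-test
            if z > z_crit:                         # "winner" declared at this peek
                false_positives += 1
                break
    return false_positives / n_sims

print(peeking_false_positive_rate())  # typically well above the nominal 5%
```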

❌ Using unrealistic MDEs

Expecting a 50% improvement is unrealistic for most tests. A 5-20% relative improvement is more typical. Use realistic MDEs to get accurate sample size estimates.

❌ Ignoring the number of variations

Each additional variation is another arm that needs its own full sample, so total traffic grows with the number of arms, and comparing several variants against the control increases the risk of a false positive unless you adjust the significance level for multiple comparisons.
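For example, if each arm needs 8,000 visitors, a simple A/B test needs 16,000 visitors in total, while a test with four variants plus a control needs 40,000, before any adjustment for multiple comparisons.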

Related Resources

Run Properly Powered Experiments

ExperimentHQ helps you calculate sample size automatically and alerts you when tests reach significance.