A/B Testing Best Practices
Run reliable experiments that drive real business results. This guide covers everything from hypothesis design to interpreting results correctly.
TL;DR — The 5 Golden Rules
1. Start with a clear, written hypothesis
2. Calculate sample size before you start
3. Never peek at results or stop early
4. Run for at least 1-2 full weeks
5. Document everything for future learning
Before the Test
Start with a clear hypothesis
Write a specific hypothesis: "If we [change], then [metric] will [improve] because [reason]." This focuses your test and helps you learn regardless of outcome.
Bad: "Let's test a new button." Good: "If we change the CTA from 'Sign Up' to 'Start Free Trial', signups will increase 15% because it reduces perceived commitment."
Calculate sample size upfront
Determine how many visitors you need before starting. This prevents stopping too early (invalid results) or running too long (wasted time).
Use our sample size calculator with your baseline conversion rate and minimum detectable effect.
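If you'd rather script the same calculation, here is a minimal sketch of the standard two-proportion power calculation using statsmodels; the baseline rate, lift, and significance settings are illustrative placeholders, not recommendations.

```python
# Per-variant sample size for a two-proportion test (standard power analysis).
# Baseline rate and minimum detectable effect below are illustrative.
from statsmodels.stats.power import NormalIndPower
from statsmodels.stats.proportion import proportion_effectsize

baseline = 0.04                     # current conversion rate (4%)
relative_mde = 0.15                 # smallest lift worth detecting (+15% relative)
target = baseline * (1 + relative_mde)

effect_size = proportion_effectsize(target, baseline)   # Cohen's h
n_per_variant = NormalIndPower().solve_power(
    effect_size=effect_size,
    alpha=0.05,                     # 5% false positive rate, two-sided
    power=0.8,                      # 80% chance of detecting a real lift
    alternative="two-sided",
)
print(f"Visitors needed per variant: {n_per_variant:,.0f}")
```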
Define your primary metric
Choose one primary metric to determine the winner. Secondary metrics provide context but shouldn't change your decision.
Primary metrics should directly impact business goals. Revenue > clicks.
Set a test duration
Plan to run for at least 1-2 full weeks to capture weekly patterns. Don't stop based on early results.
Account for weekday/weekend differences and any known seasonal patterns.
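A rough way to turn a required sample size into a planned duration, assuming roughly steady traffic (all numbers here are illustrative):

```python
# Estimate test duration from required sample and daily traffic, rounded up
# to whole weeks so the test always covers full weekday/weekend cycles.
# All inputs are illustrative placeholders.
import math

n_per_variant = 9000        # e.g. the output of your sample size calculation
arms = 2                    # control + one variation
daily_visitors = 3500       # eligible visitors reaching the tested page per day

days_needed = (n_per_variant * arms) / daily_visitors
weeks = max(2, math.ceil(days_needed / 7))   # never plan for less than 2 full weeks
print(f"Plan for about {weeks} weeks ({days_needed:.1f} days of raw traffic)")
```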
During the Test
Don't peek at results
Checking results daily and stopping when you see significance inflates false positive rates from 5% to 30%+. Wait until you reach your planned sample size.
If you must monitor, use sequential testing methods that account for multiple looks.
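To make the cost of peeking concrete, here is a small simulation sketch: it runs A/A tests (no real difference between arms) and compares checking a naive z-test every day against checking once at the planned end. Traffic levels, the 28-day horizon, and the number of simulations are arbitrary assumptions.

```python
# Simulated A/A tests: peeking daily at a fixed-horizon z-test inflates the
# false positive rate well above the nominal 5%. All parameters are illustrative.
import numpy as np
from scipy import stats

rng = np.random.default_rng(42)
true_rate = 0.04            # both arms convert at the same rate (A/A test)
daily_visitors = 1000       # per arm, per day
days = 28
simulations = 2000

peeked_fp = final_fp = 0
for _ in range(simulations):
    a = rng.binomial(daily_visitors, true_rate, size=days).cumsum()
    b = rng.binomial(daily_visitors, true_rate, size=days).cumsum()
    n = daily_visitors * np.arange(1, days + 1)

    # Two-proportion z-test at each daily "peek"
    pooled = (a + b) / (2 * n)
    se = np.sqrt(pooled * (1 - pooled) * 2 / n)
    z = (a / n - b / n) / se
    p_values = 2 * stats.norm.sf(np.abs(z))

    peeked_fp += (p_values < 0.05).any()    # would have stopped early on a false signal
    final_fp += p_values[-1] < 0.05         # looked only once, at the planned end

print(f"False positives when peeking daily:      {peeked_fp / simulations:.1%}")
print(f"False positives when testing once at end: {final_fp / simulations:.1%}")
```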
Never change the test mid-flight
Any modification to your variant invalidates all previous data. If you need to change something, start a new test.
This includes changing copy, design, targeting rules, or traffic allocation.
Monitor for technical issues only
It's okay to check that the test is running correctly (no errors, tracking working), but don't look at conversion data.
Set up alerts for technical issues rather than checking manually.
Avoid external interference
Don't run marketing campaigns or make other site changes that could affect your test during the experiment.
If something unavoidable happens (site outage, major news), document it for analysis.
After the Test
Wait for statistical significance
Don't declare a winner until you reach 95% confidence (or your pre-defined threshold) AND your planned sample size.
Reaching significance early doesn't mean you should stop. Wait for your full sample size.
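Once the planned sample is in, a standard two-proportion z-test is enough to check against your threshold. A minimal sketch with made-up counts, using statsmodels:

```python
# Final readout: two-proportion z-test plus confidence intervals per arm.
# Conversion counts and sample sizes below are made up for illustration.
from statsmodels.stats.proportion import proportions_ztest, proportion_confint

conversions = [360, 414]        # control, variant
visitors = [9000, 9000]

z_stat, p_value = proportions_ztest(conversions, visitors)
print(f"p-value: {p_value:.4f} -> significant at 95%: {p_value < 0.05}")

for label, conv, n in zip(["control", "variant"], conversions, visitors):
    low, high = proportion_confint(conv, n, alpha=0.05)
    print(f"{label}: {conv / n:.2%} (95% CI {low:.2%} to {high:.2%})")
```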
Consider practical significance
A statistically significant 0.1% improvement might not be worth implementing. Consider the business impact and implementation cost.
Calculate the expected revenue impact before deciding to implement.
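A back-of-envelope sketch of that revenue calculation; the traffic, rates, and order value are placeholders to swap for your own numbers:

```python
# Rough monthly revenue impact of shipping the winning variant.
# All inputs are illustrative placeholders.
monthly_visitors = 100_000
baseline_rate = 0.040            # conversion rate today
variant_rate = 0.046             # conversion rate measured in the test
revenue_per_conversion = 60.0    # average order value (or LTV proxy)

extra_conversions = monthly_visitors * (variant_rate - baseline_rate)
extra_revenue = extra_conversions * revenue_per_conversion
print(f"~{extra_conversions:.0f} extra conversions/month, "
      f"roughly ${extra_revenue:,.0f}/month before implementation cost")
```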
Analyze segments
Your overall result might hide important segment-level differences. Check mobile vs. desktop, new vs. returning users, etc.
Be careful of multiple comparisons — segment analysis is exploratory, not confirmatory.
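If you do slice results by segment, a simple Bonferroni adjustment keeps the extra comparisons from manufacturing winners. A sketch with placeholder segment p-values:

```python
# Bonferroni correction for exploratory segment analysis: divide the
# significance threshold by the number of segments examined.
# The p-values below are placeholders, not real results.
segment_p_values = {
    "mobile": 0.030,
    "desktop": 0.400,
    "new users": 0.012,
    "returning users": 0.060,
}

alpha = 0.05
adjusted_alpha = alpha / len(segment_p_values)   # 0.0125 with four segments

for segment, p in segment_p_values.items():
    verdict = "worth a confirmatory follow-up test" if p < adjusted_alpha else "likely noise"
    print(f"{segment}: p={p:.3f} -> {verdict}")
```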
Document everything
Record your hypothesis, results, learnings, and next steps. This builds institutional knowledge and prevents repeating tests.
Include screenshots of variants and any unexpected observations.
Common Mistakes to Avoid
Testing too many variations
Why it's a problem
Each variation splits your traffic, and every arm still needs the full per-variant sample size, so a test with five variations needs several times the total traffic of a simple A/B test (the short sketch below shows how this scales).
How to fix it
Limit to 2-3 variations. Test the biggest hypotheses first.
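A quick sketch of how total traffic requirements grow with the number of arms, assuming each arm needs the same per-variant sample size (the figure used is illustrative):

```python
# Total visitors required scales linearly with the number of test arms.
# The per-variant sample size is an illustrative placeholder.
n_per_variant = 9000

for arms in (2, 3, 4, 6):        # A/B, then adding more variations
    print(f"{arms} arms: {n_per_variant * arms:,} total visitors needed")
```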
Testing tiny changes
Why it's a problem
Button color changes rarely move the needle. You need huge sample sizes to detect small effects.
How to fix it
Test bold changes that could have meaningful impact.
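The sketch below shows how sharply the required sample grows as the detectable effect shrinks, using the same two-proportion power calculation as earlier (the baseline rate is an illustrative assumption):

```python
# Required per-variant sample size vs. the relative lift you want to detect.
# Baseline conversion rate is an illustrative placeholder.
from statsmodels.stats.power import NormalIndPower
from statsmodels.stats.proportion import proportion_effectsize

baseline = 0.04
for lift in (0.02, 0.05, 0.10, 0.20):
    effect_size = proportion_effectsize(baseline * (1 + lift), baseline)
    n = NormalIndPower().solve_power(effect_size=effect_size, alpha=0.05, power=0.8)
    print(f"{lift:.0%} relative lift: ~{n:,.0f} visitors per variant")
```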
Ignoring flicker
Why it's a problem
If users see the original before the variant loads, it biases results and hurts UX.
How to fix it
Use anti-flicker techniques or tools like ExperimentHQ that handle this automatically.
Running tests on low-traffic pages
Why it's a problem
Testing a page with 100 visitors/month means waiting years for results.
How to fix it
Focus on high-traffic pages. Use qualitative research for low-traffic pages.
Not accounting for novelty effect
Why it's a problem
New designs often win initially just because they're different.
How to fix it
Run tests for 2+ weeks. Monitor results over time to see if the effect persists.
Pre-Launch Checklist
Before launching any A/B test, verify:
- Your hypothesis is written down in the "If we [change], then [metric] will [improve] because [reason]" format
- Sample size and planned duration are calculated and documented
- One primary metric is defined; secondary metrics are noted for context only
- Tracking fires correctly and the variant renders without errors or flicker
- No marketing campaigns or other site changes are scheduled during the test window
Ready to Run Better Experiments?
ExperimentHQ makes it easy to follow best practices with built-in sample size calculation, no-flicker testing, and clear statistical reporting.