P-hacking is manipulating data or analysis to get "significant" results. It's common in CRO: peeking at results daily, testing multiple metrics, and mining segments post-hoc. Prevention: pre-register tests, commit to one primary metric, calculate sample size upfront, and don't peek. P-hacking can turn a nominal 5% false positive rate into 30-50%.
What is P-Hacking?
P-hacking (also called data dredging or fishing) is the practice of manipulating your data or analysis until it produces statistically significant results. It's not always intentional: most CRO practitioners p-hack accidentally by following "common sense" practices that are statistically invalid.
Real Examples of P-Hacking in CRO
Stopping when significant
Check results daily and stop the test as soon as p < 0.05 (see the simulation below)
Impact: False positive rate → 20-30%
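To see why peeking inflates the error rate, here's a minimal simulation sketch in Python (the function name and all parameters are illustrative assumptions, not part of any testing tool). It runs A/A tests in which control and variant share the same true conversion rate, applies a two-sided two-proportion z-test at every peek, and stops at the first p < 0.05:

```python
import numpy as np

rng = np.random.default_rng(42)

def peeking_false_positive_rate(n_sims=2000, n_visitors=10_000,
                                peek_every=500, p_base=0.05):
    """Fraction of A/A tests (no real difference) declared significant
    at any peek, using a two-sided two-proportion z-test each time."""
    false_positives = 0
    for _ in range(n_sims):
        # Cumulative conversion counts for control (a) and variant (b);
        # both arms share the same true rate, so any "win" is pure noise.
        ca = np.cumsum(rng.random(n_visitors) < p_base)
        cb = np.cumsum(rng.random(n_visitors) < p_base)
        for n in range(peek_every, n_visitors + 1, peek_every):
            pooled = (ca[n - 1] + cb[n - 1]) / (2 * n)
            se = np.sqrt(2 * pooled * (1 - pooled) / n)
            if se > 0 and abs(cb[n - 1] - ca[n - 1]) / n / se > 1.96:
                false_positives += 1  # crossed p < 0.05 at this peek
                break
    return false_positives / n_sims

print(f"A/A false positive rate with 20 peeks: "
      f"{peeking_false_positive_rate():.1%}")  # typically 20-30%, not 5%
```

Running the same code with a single look at the final sample size recovers the nominal ~5%, which is exactly the case for fixed-horizon testing.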
Testing multiple metrics
Test 10 metrics, report the one that "won" (see the arithmetic below)
Impact: False positive rate → 40%
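The 40% figure is basic probability: if each of k independent metrics has a 5% chance of a false positive, the chance that at least one "wins" by luck is 1 - (1 - 0.05)^k. A quick check:

```python
alpha, k = 0.05, 10
# Probability that at least one of k independent metrics crosses
# p < alpha on a test where nothing actually changed.
print(f"{1 - (1 - alpha) ** k:.0%}")  # 40%
```

Real metrics are correlated, so the true inflation is somewhat below the independent-metrics case, but still far above 5%. The same arithmetic drives segmentation mining: every segment you inspect is another comparison.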
Segmentation mining
The overall test fails, so you hunt for a segment where it "works"
Impact: False positive rate → 50%+
Excluding outliers post-hoc
Remove "bad data" after seeing results
Impact: Biased results
Running until significant
Keep the test running until p < 0.05 finally appears
Impact: With unlimited runtime and repeated checks, a false positive is all but guaranteed
Prevention Checklist
Pre-register your test
Document hypothesis, metrics, and sample size before starting
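A pre-registration doesn't need special tooling; a version-controlled record written before traffic starts is enough. Here's a hypothetical sketch (every field name and value is illustrative):

```python
# Hypothetical pre-registration record, committed before the test launches.
preregistration = {
    "test_name": "checkout_form_length",
    "hypothesis": "Removing optional fields increases purchase conversion",
    "primary_metric": "purchase_conversion_rate",  # decided now, not later
    "secondary_metrics": ["average_order_value"],  # reported, never promoted
    "alpha": 0.05,
    "power": 0.80,
    "minimum_detectable_effect": 0.10,  # 10% relative lift
    "sample_size_per_variant": None,    # filled in by the calculation below
    "analysis_plan": "Two-sided two-proportion z-test at full sample size",
}
```

Committing this before launch makes any later metric swap or sample-size change visible.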
One primary metric
Decide on THE metric before the test, not after
Fixed sample size
Calculate upfront, commit to it
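As one way to do that upfront calculation, here's a sketch using statsmodels' power utilities; the baseline rate and target lift are made-up example numbers:

```python
from statsmodels.stats.power import NormalIndPower
from statsmodels.stats.proportion import proportion_effectsize

# Example: 5.0% baseline conversion; we want 80% power to detect a
# 10% relative lift (5.0% -> 5.5%) at alpha = 0.05, two-sided.
effect = proportion_effectsize(0.055, 0.05)  # Cohen's h
n_per_variant = NormalIndPower().solve_power(
    effect_size=effect, alpha=0.05, power=0.80, alternative="two-sided"
)
print(f"Required visitors per variant: {n_per_variant:,.0f}")  # ~15,600
```

Once the number is fixed, the test runs until it's reached: no early stops, no extensions.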
No peeking
Don't check results until you've reached the planned sample size
Report all tests
Don't hide failed tests or cherry-pick segments
Why This Matters
P-hacking doesn't just produce false positives—it actively harms your business:
- Implement changes that hurt conversions (thinking they help)
- Waste engineering time on rollouts that don't work
- Lose trust in experimentation when "winners" don't replicate
- Make bad business decisions based on noise, not signal
Run Valid Tests
ExperimentHQ helps prevent p-hacking by warning you about peeking and encouraging proper sample size calculations.