P-hacking is manipulating data or analysis to get "significant" results. It's common in CRO: peeking at results daily, testing multiple metrics, and mining segments post-hoc. Prevention: pre-register tests, commit to one primary metric, calculate sample size upfront, and don't peek. P-hacking can turn a nominal 5% false positive rate into 30-50%.
What is P-Hacking?
P-hacking (also called data dredging or fishing) is the practice of manipulating your data or analysis until it produces statistically significant results. It's not always intentional: most CRO practitioners p-hack accidentally by following "common sense" practices that are statistically invalid.
Real Examples of P-Hacking in CRO
Stopping when significant
Check results daily and stop the test as soon as p < 0.05 (see the simulation below)
Impact: False positive rate → 20-30%
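To see why peeking inflates the error rate, here's a minimal simulation sketch in Python (the function name and all parameters are illustrative assumptions, not part of any testing tool). It runs A/A tests in which control and variant share the same true conversion rate, applies a two-sided two-proportion z-test at every peek, and stops at the first p < 0.05:

```python
import numpy as np

rng = np.random.default_rng(42)

def peeking_false_positive_rate(n_sims=2000, n_visitors=10_000,
                                peek_every=500, p_base=0.05):
    """Fraction of A/A tests (no real difference) declared significant
    at any peek, using a two-sided two-proportion z-test each time."""
    false_positives = 0
    for _ in range(n_sims):
        # Cumulative conversion counts for control (a) and variant (b);
        # both arms share the same true rate, so any "win" is pure noise.
        ca = np.cumsum(rng.random(n_visitors) < p_base)
        cb = np.cumsum(rng.random(n_visitors) < p_base)
        for n in range(peek_every, n_visitors + 1, peek_every):
            pooled = (ca[n - 1] + cb[n - 1]) / (2 * n)
            se = np.sqrt(2 * pooled * (1 - pooled) / n)
            if se > 0 and abs(cb[n - 1] - ca[n - 1]) / n / se > 1.96:
                false_positives += 1  # crossed p < 0.05 at this peek
                break
    return false_positives / n_sims

print(f"A/A false positive rate with 20 peeks: "
      f"{peeking_false_positive_rate():.1%}")  # typically 20-30%, not 5%
```

Running the same code with a single look at the final sample size recovers the nominal ~5%, which is exactly the case for fixed-horizon testing.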
Testing multiple metrics
Test 10 metrics, report the one that "won" (see the arithmetic below)
Impact: False positive rate → 40%
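The 40% figure is basic probability: if each of k independent metrics has a 5% chance of a false positive, the chance that at least one "wins" by luck is 1 - (1 - 0.05)^k. A quick check:

```python
alpha, k = 0.05, 10
# Probability that at least one of k independent metrics crosses
# p < alpha on a test where nothing actually changed.
print(f"{1 - (1 - alpha) ** k:.0%}")  # 40%
```

Real metrics are correlated, so the true inflation is somewhat below the independent-metrics case, but still far above 5%. The same arithmetic drives segmentation mining: every segment you inspect is another comparison.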
Segmentation mining
The overall test fails, so you hunt for a segment where it "works"
Impact: False positive rate → 50%+
Excluding outliers post-hoc
Remove "bad data" after seeing results
Impact: Biased results
Running until significant
Keep the test running until p < 0.05 finally appears
Impact: With unlimited runtime and repeated checks, a false positive is all but guaranteed
Prevention Checklist
Pre-register your test
Document hypothesis, metrics, and sample size before starting
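A pre-registration doesn't need special tooling; a version-controlled record written before traffic starts is enough. Here's a hypothetical sketch (every field name and value is illustrative):

```python
# Hypothetical pre-registration record, committed before the test launches.
preregistration = {
    "test_name": "checkout_form_length",
    "hypothesis": "Removing optional fields increases purchase conversion",
    "primary_metric": "purchase_conversion_rate",  # decided now, not later
    "secondary_metrics": ["average_order_value"],  # reported, never promoted
    "alpha": 0.05,
    "power": 0.80,
    "minimum_detectable_effect": 0.10,  # 10% relative lift
    "sample_size_per_variant": None,    # filled in by the calculation below
    "analysis_plan": "Two-sided two-proportion z-test at full sample size",
}
```

Committing this before launch makes any later metric swap or sample-size change visible.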
One primary metric
Decide on THE metric before the test, not after
Fixed sample size
Calculate upfront, commit to it
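As one way to do that upfront calculation, here's a sketch using statsmodels' power utilities; the baseline rate and target lift are made-up example numbers:

```python
from statsmodels.stats.power import NormalIndPower
from statsmodels.stats.proportion import proportion_effectsize

# Example: 5.0% baseline conversion; we want 80% power to detect a
# 10% relative lift (5.0% -> 5.5%) at alpha = 0.05, two-sided.
effect = proportion_effectsize(0.055, 0.05)  # Cohen's h
n_per_variant = NormalIndPower().solve_power(
    effect_size=effect, alpha=0.05, power=0.80, alternative="two-sided"
)
print(f"Required visitors per variant: {n_per_variant:,.0f}")  # ~15,600
```

Once the number is fixed, the test runs until it's reached: no early stops, no extensions.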
No peeking
Don't check results until you've reached the planned sample size
Report all tests
Don't hide failed tests or cherry-pick segments
Why This Matters
P-hacking doesn't just produce false positives—it actively harms your business:
- Implement changes that hurt conversions (thinking they help)
- Waste engineering time on rollouts that don't work
- Lose trust in experimentation when "winners" don't replicate
- Make bad business decisions based on noise, not signal
Run Valid Tests
ExperimentHQ helps prevent p-hacking by warning you about peeking and encouraging proper sample size calculations.