P-Values: The Most Misunderstood Number in Statistics
Hypothesis Testing
What P-Values Actually Tell You
The p-value is the probability of observing data at least as extreme as yours if the null hypothesis were true, not the probability that H₀ is true. Correct interpretation prevents the most common statistical mistakes.
- Scientific Publishing — Understanding why p < 0.05 is not a stamp of truth
- Business Decisions — Knowing when statistical significance matters versus practical significance
- Legal Settings — Interpreting statistical evidence in forensic and discrimination cases
The p-value answers a specific question — make sure you're asking the right one.
The p-value is simultaneously the most used and most misused concept in statistics. Its correct interpretation requires careful attention to conditional probability.
The Exact Definition
Definition: P-Value
The p-value is the probability, computed under the null hypothesis $H_0$, of obtaining a test statistic at least as extreme as the value actually observed. Formally:
$$p = P(T \geq t_{\text{obs}} \mid H_0)$$
where $T$ is the test statistic under $H_0$ and $t_{\text{obs}}$ is the observed value.
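As a concrete instance of the definition, the sketch below computes an exact one-sided p-value for a hypothetical coin-flip experiment: 60 heads in 100 flips, with $H_0$ stating that the coin is fair. The scenario and the function name `binomial_upper_tail` are illustrative assumptions, not part of the original text.

```python
from math import comb

def binomial_upper_tail(n: int, t_obs: int, prob: float = 0.5) -> float:
    """Return P(T >= t_obs | H0), where T ~ Binomial(n, prob) under H0."""
    return sum(
        comb(n, k) * prob**k * (1 - prob) ** (n - k)
        for k in range(t_obs, n + 1)
    )

# Hypothetical experiment: 60 heads observed in 100 flips of a supposedly fair coin.
p_value = binomial_upper_tail(n=100, t_obs=60)
print(f"p = {p_value:.4f}")
```

The result (roughly 0.028) is the share of fair-coin experiments that would produce 60 or more heads. It is not the probability that the coin is fair.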
Conditional Probability is the Key
The p-value is conditional on $H_0$ being true. It answers: "How surprising is this data, given that $H_0$ is true?" It does not answer: "How likely is $H_0$, given this data?" This distinction is the source of most misinterpretations.
Formal Framework
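Theorem: P-Value as a Random Variable
Across repeated samples, the p-value $p$ is a random variable with the following properties:
- If $H_0$ is true: $p \sim \text{Uniform}(0, 1)$ (exactly so for a continuous test statistic)
- $P(p \leq \alpha \mid H_0) = \alpha$ for any significance level $\alpha$: this is the Type I error rate
- If $H_0$ is false: $p$ tends to be small (concentrated near 0)
The rejection rule is: reject $H_0$ if $p \leq \alpha$.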
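These properties can be checked by simulation. The following is a minimal sketch, assuming a two-sided z-test with known standard deviation 1 and normally distributed data; the helper names, sample size, and the alternative mean of 0.5 are all illustrative choices.

```python
import random
from math import erfc, sqrt

def two_sided_z_p(sample, mu0=0.0, sigma=1.0):
    """Two-sided z-test p-value, assuming sigma is known."""
    n = len(sample)
    z = (sum(sample) / n - mu0) / (sigma / sqrt(n))
    return erfc(abs(z) / sqrt(2))  # equals 2 * (1 - Phi(|z|))

def simulate_p_values(true_mean, n=30, reps=5000, seed=0):
    """Draw `reps` samples of size n and return one p-value per sample."""
    rng = random.Random(seed)
    return [
        two_sided_z_p([rng.gauss(true_mean, 1.0) for _ in range(n)])
        for _ in range(reps)
    ]

alpha = 0.05
p_null = simulate_p_values(true_mean=0.0)  # H0 (mean = 0) is true
p_alt = simulate_p_values(true_mean=0.5)   # H0 is false

rate_null = sum(p <= alpha for p in p_null) / len(p_null)
rate_alt = sum(p <= alpha for p in p_alt) / len(p_alt)
print(f"rejection rate when H0 is true:  {rate_null:.3f}")
print(f"rejection rate when H0 is false: {rate_alt:.3f}")
```

When $H_0$ holds, the rejection rate should land close to $\alpha$; when it does not, small p-values become much more frequent.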
P-Value Formula for Two-Sided Test
$$p = P(|T| \geq |t_{\text{obs}}| \mid H_0)$$
Here,
- $p$ = Two-sided p-value
- $T$ = Test statistic under $H_0$
- $t_{\text{obs}}$ = Observed value of the test statistic
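For a test statistic that is standard normal under $H_0$, the two-sided formula reduces to a tail-area calculation. The sketch below assumes that distribution; the function names are hypothetical, and `math.erfc` is used because it stays accurate far out in the tail.

```python
from math import erfc, sqrt

def one_sided_p_value(t_obs: float) -> float:
    """P(T >= t_obs | H0) for T ~ N(0, 1) under H0."""
    return 0.5 * erfc(t_obs / sqrt(2))

def two_sided_p_value(t_obs: float) -> float:
    """P(|T| >= |t_obs| | H0) for T ~ N(0, 1) under H0."""
    return erfc(abs(t_obs) / sqrt(2))

t_obs = 1.96
print(f"one-sided p = {one_sided_p_value(t_obs):.4f}")
print(f"two-sided p = {two_sided_p_value(t_obs):.4f}")
```

Because the normal distribution is symmetric, the two-sided value is twice the one-sided value for a positive statistic, which is why $|t_{\text{obs}}| = 1.96$ corresponds to $p \approx 0.05$.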
What a P-Value IS and IS NOT
| Statement | Correct? | Why |
|---|---|---|
| "p = 0.03 means there's a 3% chance $H_0$ is true" | WRONG | The p-value is $P(\text{data} \mid H_0)$, not $P(H_0 \mid \text{data})$ |
| "p = 0.03 means: if $H_0$ were true, only 3% of samples would yield a result at least this extreme" | CORRECT | This is the definition |
| "p = 0.03 means the result is practically important" | WRONG | Statistical significance ≠ practical significance |
| "p = 0.03 means the study will replicate 97% of the time" | WRONG | Replication probability depends on true effect size |
| "p > 0.05 proves $H_0$ is true" | WRONG | Failure to reject ≠ acceptance of $H_0$ |
The P-Value Confounds Effect Size with Sample Size
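Theorem: P-Value Depends on n
For a fixed non-zero effect size, the test statistic grows in proportion to $\sqrt{n}$, so the p-value shrinks as the sample size increases. For a one-sample z-test, for example:
$$z = \frac{\bar{x} - \mu_0}{\sigma / \sqrt{n}} = d \sqrt{n}, \qquad d = \frac{\bar{x} - \mu_0}{\sigma}$$
As $n \to \infty$, $p \to 0$ for any non-zero effect, making the p-value arbitrarily small regardless of practical importance.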
Implication: A tiny, practically meaningless effect will be "statistically significant" with a large enough sample. Conversely, a large effect may not reach significance with a small sample.
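To see the scale of this effect, the sketch below holds a standardized effect fixed at a deliberately tiny value and varies only the sample size. It assumes a two-sided z-test in which the observed standardized effect equals `d` exactly; the value 0.01 and the function name `p_for_effect` are illustrative.

```python
from math import erfc, sqrt

def p_for_effect(d: float, n: int) -> float:
    """Two-sided z-test p-value when the observed standardized effect is d."""
    z = d * sqrt(n)
    return erfc(abs(z) / sqrt(2))

d = 0.01  # one hundredth of a standard deviation
sizes = [100, 10_000, 1_000_000]
p_values = [p_for_effect(d, n) for n in sizes]

for n, p in zip(sizes, p_values):
    print(f"n = {n:>9,}  p = {p:.3g}")
```

The same negligible effect moves from clearly non-significant at $n = 100$ to overwhelmingly "significant" at $n = 1{,}000{,}000$, even though its practical importance has not changed.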
The ASA Statement on P-Values (2016)
The American Statistical Association issued six principles:
- P-values can indicate how incompatible the data are with a specified statistical model, such as $H_0$.
- P-values do not measure the probability that $H_0$ is true.
- Scientific conclusions should not be based only on whether a p-value passes a specific threshold.
- Proper inference requires full reporting and transparency.
- A p-value does not measure the size or importance of an effect.
- By itself, a p-value does not provide a good measure of evidence.
Recommended Reporting Practice
Best Practice
When reporting results, include all three:
- The p-value — for the significance decision
- The confidence interval — for the range of plausible effect sizes
- The effect size: for practical importance (e.g., Cohen's $d$)
This gives readers the complete picture: statistical significance, precision, and magnitude.
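One way to produce all three quantities together is sketched below for a one-sample comparison against a reference value `mu0`. It assumes a large-sample normal approximation (a t-based version would be more appropriate for small samples), and both the `report` helper and the simulated data are hypothetical.

```python
import random
from math import erfc, sqrt
from statistics import NormalDist, mean, stdev

def report(sample, mu0=0.0, alpha=0.05):
    """Return the p-value, confidence interval, and Cohen's d for one sample."""
    n = len(sample)
    xbar, s = mean(sample), stdev(sample)
    se = s / sqrt(n)
    z = (xbar - mu0) / se
    p_value = erfc(abs(z) / sqrt(2))              # two-sided, normal approximation
    z_crit = NormalDist().inv_cdf(1 - alpha / 2)  # about 1.96 for alpha = 0.05
    ci = (xbar - z_crit * se, xbar + z_crit * se)
    cohens_d = (xbar - mu0) / s                   # standardized effect size
    return {"p_value": p_value, "ci": ci, "cohens_d": cohens_d}

# Simulated data for illustration only: true mean 0.2, standard deviation 1.
rng = random.Random(1)
sample = [rng.gauss(0.2, 1.0) for _ in range(200)]

result = report(sample)
print(f"p = {result['p_value']:.4f}")
print(f"95% CI = ({result['ci'][0]:.3f}, {result['ci'][1]:.3f})")
print(f"Cohen's d = {result['cohens_d']:.3f}")
```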
P-Value and Confidence Intervals
Theorem: Duality Between P-Values and Confidence Intervals
A two-sided test at significance level $\alpha$ rejects $H_0: \theta = \theta_0$ if and only if $\theta_0$ lies outside the $(1 - \alpha) \cdot 100\%$ confidence interval for $\theta$.
This equivalence means confidence intervals contain strictly more information than p-values: they convey both the direction and magnitude of the effect, not just whether it differs from zero.
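The duality can be verified numerically. The sketch below assumes a z-based test and interval built from the same estimate and standard error; the numbers (`xbar = 1.0`, `se = 0.5`) and the helper names are hypothetical.

```python
from math import erfc, sqrt
from statistics import NormalDist

def z_test_p(xbar: float, se: float, theta0: float) -> float:
    """Two-sided p-value for H0: theta = theta0."""
    return erfc(abs((xbar - theta0) / se) / sqrt(2))

def z_interval(xbar: float, se: float, alpha: float):
    """(1 - alpha) confidence interval for theta."""
    z_crit = NormalDist().inv_cdf(1 - alpha / 2)
    return xbar - z_crit * se, xbar + z_crit * se

xbar, se, alpha = 1.0, 0.5, 0.05  # hypothetical estimate and standard error
lower, upper = z_interval(xbar, se, alpha)

checks = []
for theta0 in [-0.5, 0.0, 0.1, 1.0, 1.9, 2.0, 3.0]:
    rejects = z_test_p(xbar, se, theta0) <= alpha
    outside = not (lower <= theta0 <= upper)
    checks.append(rejects == outside)
    print(f"theta0 = {theta0:>4}: reject = {rejects}, outside CI = {outside}")

print("test and interval agree for every theta0:", all(checks))
```

Scanning over candidate values of $\theta_0$ in this way shows that the interval is the set of null values the test would not reject.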
Key Takeaways
Summary: P-Values
- P-value = $P(\text{data this extreme} \mid H_0 \text{ true})$: it says nothing directly about $P(H_0 \mid \text{data})$
- $p \leq \alpha$ is the threshold for rejection, but $\alpha = 0.05$ is arbitrary, not magical
- Statistical significance ≠ practical significance: a tiny difference can be "significant" with enough data
- Always report effect sizes and confidence intervals alongside p-values
- $p > 0.05$ does not mean "no effect": it means "insufficient evidence against $H_0$"
- Pre-register your hypotheses to avoid p-hacking and false discoveries
- The duality with confidence intervals means CIs contain more information than p-values alone