P-Values: The Most Misunderstood Number in Statistics

Hypothesis Testing

What P-Values Actually Tell You

The p-value is the probability of observing data at least as extreme as yours if the null hypothesis were true, not the probability that H₀ is true. Interpreting it correctly prevents the most common statistical mistakes.

Scientific Publishing — Understanding why p < 0.05 is not a stamp of truth
Business Decisions — Knowing when statistical significance matters versus practical significance
Legal Settings — Interpreting statistical evidence in forensic and discrimination cases

The p-value answers a specific question — make sure you're asking the right one.

The p-value is simultaneously the most used and most misused concept in statistics. Its correct interpretation requires careful attention to conditional probability.

The Exact Definition

Definition: P-Value

The p-value is the probability, computed under the null hypothesis $H_0$, of obtaining a test statistic as extreme as or more extreme than the value actually observed. Formally, for a two-sided test whose null distribution is symmetric about zero:

p = P\left(|T| \geq |t_{\text{obs}}| \;\Big|\; H_0 \text{ is true}\right)

where $T$ is the test statistic under $H_0$ and $t_{\text{obs}}$ is the observed value.
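As a minimal sketch, assume the test statistic is standard normal under $H_0$ (a z-test with known variance). The function name `two_sided_p_value_z` is hypothetical, and only Python's standard-library `statistics.NormalDist` is used. Because the null distribution is symmetric, $P(|T| \geq |t_{\text{obs}}|)$ equals twice the upper-tail area beyond $|t_{\text{obs}}|$:

```python
from statistics import NormalDist

# Assumption: under H0 the test statistic T follows N(0, 1) (a z-test).
STD_NORMAL = NormalDist(mu=0.0, sigma=1.0)


def two_sided_p_value_z(z_obs: float) -> float:
    """Return P(|T| >= |z_obs| | H0) for T ~ N(0, 1)."""
    # The null distribution is symmetric about 0, so the tails beyond
    # -|z_obs| and +|z_obs| have equal area: compute one tail and double it.
    upper_tail = 1.0 - STD_NORMAL.cdf(abs(z_obs))
    return 2.0 * upper_tail


if __name__ == "__main__":
    for z in (0.5, 1.96, 2.58, -3.0):
        print(f"z_obs = {z:+.2f}  ->  p = {two_sided_p_value_z(z):.4f}")
```

A t-test would use Student's t distribution with $n - 1$ degrees of freedom instead; the standard library does not provide it, which is why this sketch sticks to the normal case.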

Conditional Probability is the Key

The p-value is conditional on $H_0$ being true. It answers: "How surprising is this data, given that $H_0$ is true?" It does not answer: "How likely is $H_0$, given this data?" This distinction is the source of most misinterpretations.

Formal Framework

Theorem: P-Value as a Random Variable

Before the data are collected, the p-value is a random variable $P$ with the following properties:

If $H_0$ is true and the test statistic is continuous: $P \sim \text{Uniform}(0, 1)$
Consequently, $\Pr(P \leq \alpha \mid H_0) = \alpha$ for any significance level $\alpha$: this is the Type I error rate (for a discrete test statistic, $\Pr(P \leq \alpha \mid H_0) \leq \alpha$)
If $H_0$ is false: $P$ tends to be small (concentrated near 0)

The rejection rule is: reject $H_0$ if $p \leq \alpha$.
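These properties can be checked by simulation. The setup below is a hypothetical one, not taken from the text: a two-sided z-test with known $\sigma = 1$, $n = 30$ observations per experiment, and normal data generated with Python's `random` module (function names are illustrative). When the true mean equals $\mu_0$, the fraction of p-values at or below $\alpha$ should land close to $\alpha$; when the true mean is shifted, p-values pile up near 0:

```python
import random
from statistics import NormalDist, fmean

STD_NORMAL = NormalDist()


def z_test_p_value(sample, mu0, sigma):
    """Two-sided z-test p-value, assuming the population SD sigma is known."""
    z = (fmean(sample) - mu0) / (sigma / len(sample) ** 0.5)
    return 2.0 * (1.0 - STD_NORMAL.cdf(abs(z)))


def simulate_p_values(true_mean, mu0=0.0, sigma=1.0, n=30,
                      n_experiments=10_000, seed=2016):
    """Repeat a simulated experiment many times; collect each run's p-value."""
    rng = random.Random(seed)
    p_values = []
    for _ in range(n_experiments):
        sample = [rng.gauss(true_mean, sigma) for _ in range(n)]
        p_values.append(z_test_p_value(sample, mu0, sigma))
    return p_values


def rejection_rate(p_values, alpha):
    """Fraction of experiments with p <= alpha."""
    return sum(p <= alpha for p in p_values) / len(p_values)


if __name__ == "__main__":
    null_ps = simulate_p_values(true_mean=0.0)  # H0 true: mean equals mu0
    alt_ps = simulate_p_values(true_mean=0.5)   # H0 false: mean shifted by 0.5
    for alpha in (0.01, 0.05, 0.10):
        print(f"alpha = {alpha:.2f}: "
              f"H0 true -> {rejection_rate(null_ps, alpha):.3f}, "
              f"H0 false -> {rejection_rate(alt_ps, alpha):.3f}")
```

The fixed seed only makes runs reproducible; any seed should give null rejection rates within simulation noise of $\alpha$.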

P-Value Formula for Two-Sided Test

p = 2 \cdot \min\left(P(T \geq t_{\text{obs}} \mid H_0), \; P(T \leq t_{\text{obs}} \mid H_0)\right)

Here,

$p$ = two-sided p-value
$T$ = test statistic under H₀
$t_{\text{obs}}$ = observed value of the test statistic
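For a continuous null distribution symmetric about zero, this formula reduces to the $|T|$ form in the definition. It also covers discrete or asymmetric statistics, as in this sketch of an exact binomial test (a hypothetical coin-toss example; function names are illustrative). The doubled-tail rule is one common convention for two-sided exact tests; some software instead sums the probabilities of all outcomes no more likely than the observed one, so reported values can differ:

```python
from math import comb


def binom_pmf(k: int, n: int, p: float) -> float:
    """P(X = k) for X ~ Binomial(n, p)."""
    return comb(n, k) * p**k * (1.0 - p) ** (n - k)


def two_sided_p_doubled_tail(k_obs: int, n: int, p0: float = 0.5) -> float:
    """Two-sided p-value as 2 * min(upper tail, lower tail) under H0: p = p0."""
    upper = sum(binom_pmf(k, n, p0) for k in range(k_obs, n + 1))  # P(T >= t_obs | H0)
    lower = sum(binom_pmf(k, n, p0) for k in range(0, k_obs + 1))  # P(T <= t_obs | H0)
    # With a discrete statistic both tails include P(T = t_obs), so doubling
    # the smaller tail can exceed 1; cap the result at 1.
    return min(1.0, 2.0 * min(upper, lower))


if __name__ == "__main__":
    # 9 heads in 10 tosses of a coin that is fair under H0.
    print(two_sided_p_doubled_tail(9, 10))  # prints 0.021484375 (= 22/1024)
```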

What a P-Value IS and IS NOT

Statement	Correct?	Why
"p = 0.03 means there's a 3% chance $H_0$ is true"	WRONG	The p-value is computed assuming $H_0$ is true; it is not $P(H_0 \mid \text{data})$
"p = 0.03 means: if $H_0$ were true, only 3% of samples would yield a result at least this extreme"	CORRECT	This is the definition
"p = 0.03 means the result is practically important"	WRONG	Statistical significance ≠ practical significance
"p = 0.03 means the study will replicate 97% of the time"	WRONG	Replication probability depends on the true effect size and the replication's sample size
"p > 0.05 proves $H_0$ is true"	WRONG	Failure to reject ≠ acceptance of $H_0$

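The first row confuses $P(\text{data} \mid H_0)$ with $P(H_0 \mid \text{data})$. Getting the latter requires a prior probability for $H_0$ and the test's power, neither of which the p-value supplies. The sketch below applies Bayes' theorem with hypothetical inputs (the prior and power values are assumptions, not data), and it conditions on the event $p \leq \alpha$ rather than on an exact value such as $p = 0.03$, which keeps the arithmetic simple:

```python
def prob_h0_given_rejection(prior_h0: float, alpha: float, power: float) -> float:
    """P(H0 true | p <= alpha), by Bayes' theorem."""
    # P(p <= alpha | H0) = alpha (Type I error rate);
    # P(p <= alpha | H1) = power (1 - Type II error rate).
    reject_and_h0 = alpha * prior_h0
    reject_and_h1 = power * (1.0 - prior_h0)
    return reject_and_h0 / (reject_and_h0 + reject_and_h1)


if __name__ == "__main__":
    # Hypothetical scenarios: prior probability of H0 and power are assumed.
    for prior_h0, power in [(0.5, 0.8), (0.9, 0.8), (0.9, 0.2)]:
        risk = prob_h0_given_rejection(prior_h0, alpha=0.05, power=power)
        print(f"P(H0) = {prior_h0}, power = {power}: P(H0 | p <= 0.05) = {risk:.3f}")
```

With the same $\alpha = 0.05$, the chance that $H_0$ is true given a significant result ranges from about 6% to about 69% across these scenarios, which is why $\alpha$ (or $p$) cannot be read as that probability.
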
The P-Value Confounds Effect Size with Sample Size

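Theorem: P-Value Depends on n

For a fixed observed effect (holding $\bar{X} - \mu_0$ and $s$ constant), the p-value decreases monotonically as the sample size $n$ grows, because the test statistic scales with $\sqrt{n}$: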

t = \frac{\bar{X} - \mu_0}{s / \sqrt{n}}

If the true mean differs from $\mu_0$ at all, then $|t| \to \infty$ (in probability) as $n \to \infty$, so the p-value becomes arbitrarily small regardless of the effect's practical importance.

Implication: A tiny, practically meaningless effect will be "statistically significant" with a large enough sample. Conversely, a large effect may not reach significance with a small sample.
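A quick way to see this is to hold a standardized effect $d = (\bar{X} - \mu_0)/s$ fixed and vary only $n$, so that $t = d\sqrt{n}$. This sketch assumes a hypothetical $d = 0.1$ and, as a simplification, uses a standard normal reference distribution instead of Student's t (the function name is illustrative):

```python
from math import erfc, sqrt


def p_value_fixed_effect(d: float, n: int) -> float:
    """Two-sided p-value when the observed standardized effect d is held fixed.

    With d = (xbar - mu0) / s, the statistic is t = d * sqrt(n). A standard
    normal reference distribution is assumed here instead of Student's t.
    """
    z = d * sqrt(n)
    # P(|Z| >= |z|) = erfc(|z| / sqrt(2)); erfc stays accurate far in the
    # tail, where 1 - cdf(z) would round to 0.
    return erfc(abs(z) / sqrt(2))


if __name__ == "__main__":
    d = 0.1  # hypothetical small effect: one tenth of a standard deviation
    for n in (10, 100, 1_000, 10_000):
        print(f"n = {n:>6}: p = {p_value_fixed_effect(d, n):.3g}")
```

The same tenth-of-a-standard-deviation effect is nowhere near significant at $n = 10$ yet overwhelmingly significant at $n = 10{,}000$.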

The ASA Statement on P-Values (2016)

The American Statistical Association issued six principles:

P-values can indicate how incompatible the data are with a specified statistical model, such as one embodying $H_0$.
P-values do not measure the probability that $H_0$ is true.
Scientific conclusions and business or policy decisions should not be based only on whether a p-value passes a specific threshold.
Proper inference requires full reporting and transparency.
A p-value does not measure the size or importance of an effect.
By itself, a p-value does not provide a good measure of evidence.

Recommended Reporting Practice

Best Practice

When reporting results, include all three:

The p-value — for the significance decision
The confidence interval — for the range of plausible effect sizes
The effect size — for practical importance (e.g., Cohen's $d$, $\eta^2$)

This gives readers the complete picture: statistical significance, precision, and magnitude.
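A minimal sketch of reporting all three for a one-sample comparison, assuming a normal (z) approximation that is reasonable for large samples (a t-based interval would be somewhat wider for small $n$). The function name and the data are hypothetical:

```python
from math import erfc, sqrt
from statistics import NormalDist, fmean, stdev


def summarize_one_sample(sample, mu0=0.0, confidence=0.95):
    """Report p-value, confidence interval, and Cohen's d for H0: mu = mu0.

    Assumption: normal (z) approximation, reasonable for large samples.
    """
    n = len(sample)
    mean = fmean(sample)
    s = stdev(sample)                      # sample standard deviation (n - 1)
    se = s / sqrt(n)                       # standard error of the mean
    z = (mean - mu0) / se
    p_value = erfc(abs(z) / sqrt(2))       # two-sided p-value
    z_crit = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    ci = (mean - z_crit * se, mean + z_crit * se)
    cohens_d = (mean - mu0) / s            # effect size in SD units
    return {"n": n, "mean": mean, "p_value": p_value,
            "ci": ci, "cohens_d": cohens_d}


if __name__ == "__main__":
    # Hypothetical measurements (e.g., improvement scores); not real data.
    data = [0.8, -0.3, 1.2, 0.5, 0.9, -0.1, 1.4, 0.2, 0.7, 0.6,
            0.3, 1.1, -0.4, 0.9, 0.5, 0.8, 0.0, 1.3, 0.4, 0.6]
    r = summarize_one_sample(data, mu0=0.0)
    lo, hi = r["ci"]
    print(f"n = {r['n']}, mean = {r['mean']:.3f}, p = {r['p_value']:.4f}, "
          f"95% CI = [{lo:.3f}, {hi:.3f}], Cohen's d = {r['cohens_d']:.2f}")
```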

P-Value and Confidence Intervals

Theorem: Duality Between P-Values and Confidence Intervals

A two-sided test at significance level $\alpha$ rejects $H_0: \mu = \mu_0$ if and only if $\mu_0$ lies outside the $(1-\alpha) \times 100\%$ confidence interval for $\mu$.

\text{Reject } H_0 \text{ at level } \alpha \iff \mu_0 \notin \text{CI}_{1-\alpha}(\mu)

Because of this equivalence, a single confidence interval reports the outcome of the level-$\alpha$ test for every candidate $\mu_0$ at once, and it also conveys the direction and plausible magnitude of the effect, not just whether it differs from one particular $\mu_0$.
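For a z-test with known $\sigma$ (an assumption chosen so the standard library suffices; the function names and summary statistics are hypothetical), the equivalence can be checked numerically over a grid of candidate null values:

```python
from math import erfc, sqrt
from statistics import NormalDist


def z_test_rejects(xbar, mu0, sigma, n, alpha):
    """Two-sided z-test of H0: mu = mu0 (sigma assumed known)."""
    z = (xbar - mu0) / (sigma / sqrt(n))
    return erfc(abs(z) / sqrt(2)) <= alpha


def z_confidence_interval(xbar, sigma, n, alpha):
    """(1 - alpha) * 100% z-interval for mu."""
    half_width = NormalDist().inv_cdf(1 - alpha / 2) * sigma / sqrt(n)
    return xbar - half_width, xbar + half_width


def duality_holds(xbar, sigma, n, alpha, candidate_mu0s):
    """Check: test rejects H0: mu = mu0  <=>  mu0 lies outside the CI."""
    lo, hi = z_confidence_interval(xbar, sigma, n, alpha)
    return all(
        z_test_rejects(xbar, mu0, sigma, n, alpha) == (mu0 < lo or mu0 > hi)
        for mu0 in candidate_mu0s
    )


if __name__ == "__main__":
    # The grid avoids the exact CI endpoints, where p equals alpha and
    # floating-point rounding alone could decide the outcome.
    grid = [i / 100 for i in range(-100, 101)]
    print(z_confidence_interval(0.3, 1.0, 25, 0.05))
    print(duality_holds(0.3, 1.0, 25, 0.05, grid))
```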

Key Takeaways

Summary: P-Values

P-value = P(data at least this extreme | $H_0$ true) — it says nothing directly about $H_1$
$p \leq \alpha$ is the rule for rejection — but $\alpha = 0.05$ is arbitrary, not magical
Statistical significance $\neq$ practical significance — a tiny difference can be "significant" with enough data
Always report effect sizes and confidence intervals alongside p-values
$p > 0.05$ does not mean "no effect" — it means "insufficient evidence against $H_0$"
Pre-register your hypotheses to avoid p-hacking and false discoveries
The duality with confidence intervals means CIs contain more information than p-values alone
