Type I and Type II Errors


The Two Ways to Get It Wrong

Every statistical test carries the risk of a false positive (Type I error) or a false negative (Type II error). Understanding this tradeoff is essential for designing studies and interpreting results responsibly.

Medicine — Balancing the risk of approving ineffective drugs versus withholding effective ones
Criminal Justice — The presumption of innocence mirrors the null hypothesis framework
Manufacturing — Setting inspection criteria that balance reject/accept error rates

There is no free lunch: at a fixed sample size, reducing one error type increases the other.

In hypothesis testing, exactly two kinds of mistake are possible, and how often each occurs depends on the significance level, the sample size, and the size of the effect being tested.

The Decision Matrix

Definition: Hypothesis Testing Outcomes

When we test $H_0$ against $H_1$, four outcomes are possible:

Decision	$H_0$ is true	$H_0$ is false
Reject $H_0$	Type I Error ($\alpha$)	Correct decision (Power = $1-\beta$)
Fail to reject $H_0$	Correct decision ($1-\alpha$)	Type II Error ($\beta$)

Formal Definitions

Definition: Type I Error (False Positive)

A Type I error occurs when we reject $H_0$ even though $H_0$ is true. Its probability is:

$$\alpha = P(\text{Reject } H_0 \mid H_0 \text{ is true})$$

This is the significance level, chosen by the researcher before the data are collected (0.05 is a common convention).

Definition: Type II Error (False Negative)

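A Type II error occurs when we fail to reject $H_0$ even though $H_1$ is true. Unlike $\alpha$, its probability is not a single fixed number: it depends on which specific alternative holds, in particular on how large the true effect is: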

$$\beta = P(\text{Fail to reject } H_0 \mid H_1 \text{ is true})$$

Definition: Statistical Power

Power is the probability of correctly rejecting $H_0$ when $H_1$ is true:

$$\text{Power} = 1 - \beta = P(\text{Reject } H_0 \mid H_1 \text{ is true})$$
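
These definitions can be checked by simulation. The sketch below assumes, purely for illustration, a one-sided one-sample z-test of $H_0: \mu = \mu_0$ against $H_1: \mu > \mu_0$ with known $\sigma$; the helpers `z_test_rejects` and `rejection_rate` are hypothetical names, not part of any library. The rejection rate when $H_0$ is true estimates $\alpha$, and the rejection rate under one specific alternative estimates power, with $\beta$ as its complement.

```python
import math
import random
from statistics import NormalDist


def z_test_rejects(sample, mu0, sigma, z_crit):
    """One-sided z-test of H0: mu = mu0 vs H1: mu > mu0, assuming sigma is known."""
    n = len(sample)
    z = (sum(sample) / n - mu0) / (sigma / math.sqrt(n))
    return z > z_crit


def rejection_rate(true_mu, mu0=0.0, sigma=1.0, n=25, alpha=0.05,
                   trials=10_000, seed=1):
    """Fraction of simulated studies that reject H0 when the population mean is true_mu."""
    rng = random.Random(seed)
    z_crit = NormalDist().inv_cdf(1 - alpha)  # critical value z_{1-alpha}
    rejections = 0
    for _ in range(trials):
        sample = [rng.gauss(true_mu, sigma) for _ in range(n)]
        if z_test_rejects(sample, mu0, sigma, z_crit):
            rejections += 1
    return rejections / trials


# H0 true (mu = mu0): every rejection is a Type I error, so this estimates alpha.
alpha_hat = rejection_rate(true_mu=0.0)
# H1 true (mu = mu0 + 0.5 sigma): every rejection is correct, so this estimates power.
power_hat = rejection_rate(true_mu=0.5)

print(f"estimated alpha = {alpha_hat:.3f}")
print(f"estimated power = {power_hat:.3f}, estimated beta = {1 - power_hat:.3f}")
```

With these settings the true mean sits 2.5 standard errors above $\mu_0$, so the estimated power should come out near 0.80, while the estimated Type I error rate should stay close to the nominal 0.05.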

The Fundamental Tradeoff

Theorem: $\alpha$–$\beta$ Tradeoff

For a fixed sample size $n$, effect size, and standard deviation $\sigma$, decreasing $\alpha$ increases $\beta$ (and vice versa). Reducing both error rates at once requires more information: a larger sample size or less noisy measurements.
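
A short derivation makes the tradeoff concrete. Assume, for illustration only, a one-sided z-test of $H_0: \mu = \mu_0$ against $H_1: \mu = \mu_0 + \delta$ with $\delta > 0$, known $\sigma$, and sample mean $\bar X$ of $n$ observations; the symbols $\delta$, $\Phi$ (standard normal CDF), and $z_{1-\alpha}$ (its $1-\alpha$ quantile) are introduced here. The test rejects when $Z = (\bar X - \mu_0)/(\sigma/\sqrt{n}) > z_{1-\alpha}$, and under $H_1$ the statistic satisfies $Z \sim N(\delta\sqrt{n}/\sigma,\, 1)$, so

$$\beta = P(Z \le z_{1-\alpha} \mid H_1) = \Phi\left(z_{1-\alpha} - \frac{\delta\sqrt{n}}{\sigma}\right), \qquad \text{Power} = 1 - \beta = 1 - \Phi\left(z_{1-\alpha} - \frac{\delta\sqrt{n}}{\sigma}\right)$$

Lowering $\alpha$ raises $z_{1-\alpha}$ and therefore raises $\beta$ when $n$, $\delta$, and $\sigma$ are held fixed; only a larger $\delta\sqrt{n}/\sigma$ (more data, a bigger effect, or less noise) lowers $\beta$ without giving up any $\alpha$.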

Strategy	Effect on $\alpha$	Effect on $\beta$	Effect on Power
Decrease $\alpha$ (e.g., 0.05 to 0.01)	$\downarrow$	$\uparrow$	$\downarrow$
Increase $n$	No change	$\downarrow$	$\uparrow$
Increase effect size	No change	$\downarrow$	$\uparrow$
Decrease $\sigma$	No change	$\downarrow$	$\uparrow$
One-tailed test (vs. two-tailed)	No change	$\downarrow$ (if the effect is in the predicted direction)	$\uparrow$ in the predicted direction; near zero in the opposite direction
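
Each row of the table can be reproduced numerically. The sketch below assumes a one-sample z-test with known $\sigma$ and uses the closed-form power of that test; `z_test_power` and its parameters (`delta` for the true mean minus $\mu_0$, `tails` for one- or two-sided) are hypothetical names chosen for illustration.

```python
import math
from statistics import NormalDist

STD_NORMAL = NormalDist()


def z_test_power(delta, sigma, n, alpha=0.05, tails=2):
    """Power of a one-sample z-test with known sigma when the true mean is mu0 + delta.

    tails=1 tests H1: mu > mu0; tails=2 tests H1: mu != mu0.
    """
    shift = delta * math.sqrt(n) / sigma  # mean of the z statistic under H1
    if tails == 1:
        z_crit = STD_NORMAL.inv_cdf(1 - alpha)
        return 1 - STD_NORMAL.cdf(z_crit - shift)
    z_crit = STD_NORMAL.inv_cdf(1 - alpha / 2)
    # Two-tailed: reject when Z > z_crit or Z < -z_crit.
    return (1 - STD_NORMAL.cdf(z_crit - shift)) + STD_NORMAL.cdf(-z_crit - shift)


base = {"delta": 0.5, "sigma": 1.0, "n": 25, "alpha": 0.05, "tails": 2}
scenarios = [
    ("baseline", {}),
    ("decrease alpha to 0.01", {"alpha": 0.01}),
    ("increase n to 50", {"n": 50}),
    ("increase effect to 0.8", {"delta": 0.8}),
    ("decrease sigma to 0.7", {"sigma": 0.7}),
    ("one-tailed, predicted direction", {"tails": 1}),
    ("one-tailed, wrong direction", {"tails": 1, "delta": -0.5}),
]
for label, change in scenarios:
    power = z_test_power(**{**base, **change})
    print(f"{label:32s} power = {power:.3f}  beta = {1 - power:.3f}")
```

Only the first scenario changes $\alpha$; every other change leaves the significance level where it was and moves $\beta$ alone. The last scenario illustrates the caveat in the table: a one-tailed test has almost no power when the true effect runs opposite to the predicted direction.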

Consequences in Practice

Domain	Type I Error (False Positive)	Type II Error (False Negative)
Medicine	Approving an ineffective drug	Missing a life-saving treatment
Criminal justice	Convicting an innocent person	Letting a guilty person go free
Quality control	Rejecting a good batch	Shipping defective products
Spam filtering	Blocking legitimate email	Allowing spam to reach inbox
Security	False alarm	Missing a real intrusion

Asymmetric Costs

In most real-world settings, the costs of Type I and Type II errors are not equal. In drug approval, a Type I error (approving an ineffective drug) wastes resources and exposes patients to side effects without benefit, while a Type II error (rejecting an effective drug) withholds a treatment that could save lives. The choice of $\alpha$ should reflect which mistake is costlier in the setting at hand.

Effect of Sample Size on Power

Theorem: Power Increases with Sample Size

As $n$ increases, the standard error $\sigma/\sqrt{n}$ decreases, so the sampling distributions of the estimate under $H_0$ and under $H_1$ overlap less. With the rejection threshold set to give the chosen significance level, this:

Keeps $\alpha$ fixed (at the pre-specified level)
Reduces $\beta$ (increases power)

Power is a monotone increasing function of $n$ for any fixed nonzero effect size and fixed $\alpha$.
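
Turning this around gives an a priori power analysis: pick $\alpha$, a target power, and the smallest effect worth detecting, then solve for $n$. The sketch below assumes a one-sample z-test with known $\sigma$ and uses the standard normal-approximation formula $n = \left((z_{1-\alpha/k} + z_{1-\beta})\,\sigma/\delta\right)^2$, where $k$ is the number of tails; `power_one_tailed` and `required_n` are hypothetical helper names.

```python
import math
from statistics import NormalDist

STD_NORMAL = NormalDist()


def power_one_tailed(delta, sigma, n, alpha=0.05):
    """Power of a one-sided z-test (H1: mu > mu0) with known sigma."""
    shift = delta * math.sqrt(n) / sigma
    return 1 - STD_NORMAL.cdf(STD_NORMAL.inv_cdf(1 - alpha) - shift)


def required_n(delta, sigma, alpha=0.05, power=0.80, tails=2):
    """Smallest n reaching the target power for a one-sample z-test with known sigma.

    For tails=1 this matches the one-sided power function exactly; for tails=2 it
    ignores the tiny rejection probability in the opposite tail, so it is very
    slightly conservative.
    """
    z_alpha = STD_NORMAL.inv_cdf(1 - alpha / tails)
    z_beta = STD_NORMAL.inv_cdf(power)
    return math.ceil(((z_alpha + z_beta) * sigma / delta) ** 2)


# Power rises monotonically with n (effect 0.3 sigma, alpha = 0.05).
for n in (10, 25, 50, 100, 200):
    print(f"n = {n:3d}  power = {power_one_tailed(0.3, 1.0, n):.3f}")

# A priori power analysis: n for 80% power to detect 0.5 sigma, two-tailed alpha = 0.05.
print(required_n(delta=0.5, sigma=1.0))  # 32
```

Rounding up with `math.ceil` is deliberate: any smaller integer $n$ would fall short of the target power.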

Effect Size and Practical Significance

Statistical vs. Practical Significance

A very large sample can make a trivially small effect statistically significant. Conversely, a small sample may fail to detect a large, practically important effect. Always report:

The p-value (statistical significance)
The effect size (practical significance)
The confidence interval (precision of the estimate)
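
The sketch below reports all three quantities for two hypothetical scenarios (made-up summary numbers, not real data), assuming a one-sample z-test with known $\sigma$. The effect size is taken as the standardized mean difference (Cohen's $d$ with $\sigma$ known), and `summarize` is a hypothetical helper.

```python
import math
from statistics import NormalDist

STD_NORMAL = NormalDist()


def summarize(mean_diff, sigma, n, conf=0.95):
    """p-value, effect size, and CI for a one-sample z-test with known sigma.

    mean_diff is the observed sample mean minus the null value mu0.
    """
    se = sigma / math.sqrt(n)
    z = mean_diff / se
    p_value = 2 * STD_NORMAL.cdf(-abs(z))  # two-sided p-value
    cohens_d = mean_diff / sigma           # standardized effect size
    z_half = STD_NORMAL.inv_cdf(1 - (1 - conf) / 2)
    ci = (mean_diff - z_half * se, mean_diff + z_half * se)
    return p_value, cohens_d, ci


# Hypothetical scenarios (illustrative numbers only)
for label, diff, n in [("tiny effect, huge n", 0.02, 100_000),
                       ("large effect, small n", 0.6, 8)]:
    p, d, (lo, hi) = summarize(diff, sigma=1.0, n=n)
    print(f"{label}: p = {p:.2g}, d = {d:.2f}, 95% CI = ({lo:.3f}, {hi:.3f})")
```

The first scenario yields a minuscule p-value for an effect of only 0.02 standard deviations, while the second misses significance even though its estimated effect is 30 times larger; the second scenario's wide confidence interval is what shows the study was simply too small to be informative.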

Key Takeaways

Summary: Type I and Type II Errors

Type I error ($\alpha$): rejecting $H_0$ when it is true, a false positive; its probability is set by the researcher
Type II error ($\beta$): failing to reject $H_0$ when it is false, a false negative
Power = $1 - \beta$: the probability of detecting a true effect
Reducing $\alpha$ increases $\beta$: at fixed $n$, effect size, and $\sigma$ there is always a tradeoff
Increasing $n$ reduces $\beta$ at a fixed $\alpha$, and makes it possible to lower both error rates together
The relative costs of each error type should guide the choice of $\alpha$ (in medicine, approving an ineffective drug can harm patients)
Conduct an a priori power analysis to make sure the study is adequately powered before collecting data
