Type I and Type II Errors
Hypothesis Testing
The Two Ways to Get It Wrong
Every statistical test carries risk of false positives (Type I) or false negatives (Type II). Understanding this tradeoff is essential for designing studies and interpreting results responsibly.
- Medicine — Balancing the risk of approving ineffective drugs versus withholding effective ones
- Criminal Justice — The presumption of innocence mirrors the null hypothesis framework
- Manufacturing — Setting inspection criteria that balance reject/accept error rates
There is no free lunch: at a fixed sample size, reducing one error type increases the other.
In hypothesis testing, two types of mistakes are possible. Understanding them is essential for designing studies, choosing sample sizes, and interpreting results.
The Decision Matrix
Definition: Hypothesis Testing Outcomes
When we test $H_0$ against $H_1$, four outcomes are possible:
| Decision | $H_0$ is true | $H_0$ is false |
|---|---|---|
| Reject $H_0$ | Type I Error ($\alpha$) | Correct decision (Power $= 1 - \beta$) |
| Fail to reject $H_0$ | Correct decision ($1 - \alpha$) | Type II Error ($\beta$) |
Formal Definitions
Definition: Type I Error (False Positive)
A Type I error occurs when we reject $H_0$ even though $H_0$ is true. Its probability is:
$$\alpha = P(\text{reject } H_0 \mid H_0 \text{ is true})$$
This is the significance level, set by the researcher before the study begins.
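The claim that the significance level controls the false positive rate can be checked by simulation. The sketch below is illustrative only: it assumes a two-sided z-test of a zero mean with known standard deviation, and the function name `type_i_error_rate` and its parameters are hypothetical choices, not part of the original text.

```python
import random
from statistics import NormalDist, fmean

def type_i_error_rate(alpha=0.05, n=30, sigma=1.0, trials=20_000, seed=0):
    """Estimate how often a two-sided z-test rejects H0 when H0 is true."""
    rng = random.Random(seed)
    z_crit = NormalDist().inv_cdf(1 - alpha / 2)  # two-sided critical value
    rejections = 0
    for _ in range(trials):
        # Data are generated under H0: the true mean is exactly 0.
        sample = [rng.gauss(0.0, sigma) for _ in range(n)]
        z = fmean(sample) / (sigma / n ** 0.5)
        if abs(z) > z_crit:
            rejections += 1  # a false positive, because H0 is true
    return rejections / trials

rate = type_i_error_rate()
print(f"Estimated Type I error rate: {rate:.3f}")
```

The estimated rate should land close to the chosen significance level, up to Monte Carlo noise.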
Definition: Type II Error (False Negative)
A Type II error occurs when we fail to reject $H_0$ even though $H_1$ is true. Its probability is:
$$\beta = P(\text{fail to reject } H_0 \mid H_1 \text{ is true})$$
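Definition: Statistical Power
Power is the probability of correctly rejecting $H_0$ when $H_1$ is true:
$$\text{Power} = 1 - \beta = P(\text{reject } H_0 \mid H_1 \text{ is true})$$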
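As a worked illustration, consider one specific setup that is assumed here rather than taken from the text: a one-sided z-test of $H_0: \mu = \mu_0$ against $H_1: \mu = \mu_0 + \delta$ with $\delta > 0$, known standard deviation $\sigma$, sample size $n$, and significance level $\alpha$. Writing $\Phi$ for the standard normal CDF and $z_{1-\alpha}$ for its upper critical value, the power has a closed form:

```latex
\begin{aligned}
\text{Reject } H_0 &\iff Z = \frac{\bar{X} - \mu_0}{\sigma/\sqrt{n}} > z_{1-\alpha}, \\
\text{Power} &= P\!\left(Z > z_{1-\alpha} \mid \mu = \mu_0 + \delta\right)
  = 1 - \Phi\!\left(z_{1-\alpha} - \frac{\delta\sqrt{n}}{\sigma}\right), \\
\beta &= \Phi\!\left(z_{1-\alpha} - \frac{\delta\sqrt{n}}{\sigma}\right).
\end{aligned}
```

For example, with $\alpha = 0.05$ ($z_{0.95} \approx 1.645$), $\delta = 0.5$, $\sigma = 1$, and $n = 25$, the shift term is $\delta\sqrt{n}/\sigma = 2.5$, so the power is $1 - \Phi(-0.855) \approx 0.80$ and $\beta \approx 0.20$.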
The Fundamental Tradeoff
Theorem: α–β Tradeoff
For a fixed sample size and effect size, decreasing $\alpha$ increases $\beta$ (and vice versa). There is no way to simultaneously minimize both error types without increasing the sample size.
| Strategy | Effect on $\alpha$ | Effect on $\beta$ | Effect on Power |
|---|---|---|---|
| Decrease $\alpha$ (e.g., 0.05 → 0.01) | Decreases | Increases | Decreases |
| Increase $n$ | No change | Decreases | Increases |
| Increase effect size | No change | Decreases | Increases |
| Decrease $\sigma$ | No change | Decreases | Increases |
| One-tailed test (vs two-tailed) | No change | Decreases (in predicted direction) | Increases (in predicted direction) |
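The direction of each effect in the table can be checked numerically. The following is a minimal sketch under assumed conditions (a z-test for a mean shift with known standard deviation); the helper `z_test_power` and the baseline numbers are hypothetical values chosen for illustration.

```python
from statistics import NormalDist

_STD = NormalDist()

def z_test_power(delta, sigma, n, alpha, two_sided=True):
    """Power of a z-test for a mean shift `delta` with known `sigma`."""
    shift = delta * n ** 0.5 / sigma  # mean of the z statistic under H1
    if two_sided:
        z_crit = _STD.inv_cdf(1 - alpha / 2)
        # Reject when Z > z_crit or Z < -z_crit.
        return (1 - _STD.cdf(z_crit - shift)) + _STD.cdf(-z_crit - shift)
    z_crit = _STD.inv_cdf(1 - alpha)
    return 1 - _STD.cdf(z_crit - shift)

# Assumed baseline scenario; each entry below changes exactly one setting.
baseline = dict(delta=0.5, sigma=1.0, n=25, alpha=0.05)
scenarios = {
    "baseline": {},
    "decrease alpha to 0.01": {"alpha": 0.01},
    "increase n to 50": {"n": 50},
    "increase effect size to 0.8": {"delta": 0.8},
    "decrease sigma to 0.7": {"sigma": 0.7},
    "one-tailed test": {"two_sided": False},
}

powers = {}
for name, change in scenarios.items():
    powers[name] = z_test_power(**{**baseline, **change})
    print(f"{name:30s} power = {powers[name]:.3f}  beta = {1 - powers[name]:.3f}")
```

Only the stricter significance level lowers power relative to the baseline; every other change raises it while leaving the significance level untouched.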
Consequences in Practice
| Domain | Type I Error (False Positive) | Type II Error (False Negative) |
|---|---|---|
| Medicine | Approving an ineffective drug | Missing a life-saving treatment |
| Criminal justice | Convicting an innocent person | Letting a guilty person go free |
| Quality control | Rejecting a good batch | Shipping defective products |
| Spam filtering | Blocking legitimate email | Allowing spam to reach inbox |
| Security | False alarm | Missing a real intrusion |
Asymmetric Costs
In most real-world settings, the costs of Type I and Type II errors are not equal. In drug approval, a Type I error (approving a useless drug) wastes resources, while a Type II error (rejecting a useful drug) costs lives. The choice of $\alpha$ should reflect this asymmetry.
Effect of Sample Size on Power
Theorem: Power Increases with Sample Size
As $n$ increases, the standard error decreases, so the sampling distribution of the estimate concentrates more tightly around the true value under $H_1$. This simultaneously:
- Keeps $\alpha$ fixed (at the pre-specified level)
- Reduces $\beta$ (increases power)
Power is a monotonically increasing function of $n$ for any fixed nonzero effect size and $\alpha$.
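A short numerical sketch of this monotone relationship follows. It assumes a one-sided z-test with known standard deviation; the function `one_sided_power` and the values for the effect size, standard deviation, and sample sizes are illustrative assumptions.

```python
from statistics import NormalDist

def one_sided_power(delta, sigma, n, alpha=0.05):
    """Power of a one-sided z-test; alpha stays fixed while n varies."""
    std = NormalDist()
    standard_error = sigma / n ** 0.5
    return 1 - std.cdf(std.inv_cdf(1 - alpha) - delta / standard_error)

delta, sigma = 0.3, 1.0  # assumed effect size and standard deviation
sample_sizes = [10, 20, 40, 80, 160]
power_by_n = [one_sided_power(delta, sigma, n) for n in sample_sizes]

for n, power in zip(sample_sizes, power_by_n):
    se = sigma / n ** 0.5
    print(f"n = {n:3d}  SE = {se:.3f}  power = {power:.3f}  beta = {1 - power:.3f}")
```

Each doubling of the sample size shrinks the standard error by a factor of about 1.41 and raises the power, while the significance level never changes.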
Effect Size and Practical Significance
Statistical vs. Practical Significance
A very large sample can make a trivially small effect statistically significant. Conversely, a small sample may fail to detect a large, practically important effect. Always report:
- The p-value (statistical significance)
- The effect size (practical significance)
- The confidence interval (precision of the estimate)
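To see why all three quantities matter, the sketch below reports them side by side for two made-up scenarios. It assumes a two-sided z-test with known standard deviation; `summarize_z_test` and the input numbers are hypothetical and only meant to illustrate the point.

```python
from statistics import NormalDist

def summarize_z_test(mean_diff, sigma, n, conf_level=0.95):
    """Two-sided z-test summary: p-value, standardized effect size, and CI."""
    std = NormalDist()
    se = sigma / n ** 0.5
    z = mean_diff / se
    p_value = 2 * (1 - std.cdf(abs(z)))
    effect_size = mean_diff / sigma  # standardized mean difference
    half_width = std.inv_cdf(1 - (1 - conf_level) / 2) * se
    return {
        "p_value": p_value,
        "effect_size": effect_size,
        "ci": (mean_diff - half_width, mean_diff + half_width),
    }

# Trivial effect, very large sample (assumed numbers).
huge_sample = summarize_z_test(mean_diff=0.01, sigma=1.0, n=100_000)
# Large effect, very small sample (assumed numbers).
small_sample = summarize_z_test(mean_diff=0.8, sigma=1.0, n=5)

for label, result in [("huge sample", huge_sample), ("small sample", small_sample)]:
    low, high = result["ci"]
    print(f"{label:12s} p = {result['p_value']:.4f}  "
          f"effect size = {result['effect_size']:.2f}  CI = ({low:.3f}, {high:.3f})")
```

In the first scenario the p-value falls below 0.05 even though the effect size is negligible; in the second, a large effect fails to reach significance and the wide confidence interval shows how imprecise the estimate is.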
Key Takeaways
Summary: Type I and Type II Errors
- Type I error ($\alpha$): Reject $H_0$ when it is true; a false positive, with probability set by the researcher
- Type II error ($\beta$): Fail to reject $H_0$ when it is false; a false negative
- Power = $1 - \beta$: the probability of detecting a true effect
- Reducing $\alpha$ increases $\beta$; there is always a tradeoff at fixed $n$
- Increasing $n$ reduces $\beta$ at a fixed $\alpha$, and is what makes it possible to lower both error rates together
- The consequences of each error type should guide the choice of $\alpha$; in medicine, Type I errors can harm patients
- Always conduct an a priori power analysis to ensure your study is adequately powered
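An a priori power analysis can be as simple as solving the power formula for the sample size. The sketch below assumes a one-sided z-test with known standard deviation, for which the required size is $n = \left((z_{1-\alpha} + z_{1-\beta})\,\sigma/\delta\right)^2$ rounded up; the function names and the planning values are hypothetical.

```python
import math
from statistics import NormalDist

def required_sample_size(delta, sigma, alpha=0.05, power=0.80):
    """Smallest n giving the target power for a one-sided z-test."""
    std = NormalDist()
    z_alpha = std.inv_cdf(1 - alpha)  # critical value for the test
    z_beta = std.inv_cdf(power)       # quantile matching the target power
    return math.ceil(((z_alpha + z_beta) * sigma / delta) ** 2)

def achieved_power(delta, sigma, n, alpha=0.05):
    """Power actually obtained at sample size n (same assumptions)."""
    std = NormalDist()
    return 1 - std.cdf(std.inv_cdf(1 - alpha) - delta * n ** 0.5 / sigma)

# Assumed planning values: detect a shift of 0.5 when sigma is 1.
n_needed = required_sample_size(delta=0.5, sigma=1.0)
print(f"Required n: {n_needed}")
print(f"Power at that n: {achieved_power(0.5, 1.0, n_needed):.3f}")
```

For other designs (t-tests, proportions, two-sample comparisons) the formula differs, so this should be read as a template for the reasoning rather than a general-purpose calculator.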