Statistical Power
Hypothesis Testing
The Probability of Finding Real Effects
Statistical power is the probability of correctly rejecting a false null hypothesis, that is, the chance of detecting a real effect when one exists. An underpowered study is likely to miss a real effect, so the resources spent on it are often wasted.
- Clinical Trials — Ensuring studies are large enough to detect meaningful treatment effects
- Grant Applications — Justifying sample sizes with power analysis calculations
- A/B Testing — Avoiding underpowered tests that waste advertising budgets
Power analysis is the difference between good science and expensive guessing.
Definition: Statistical Power
Power is the probability of correctly rejecting the null hypothesis when the alternative is true:
$$\text{Power} = P(\text{reject } H_0 \mid H_1 \text{ true}) = 1 - \beta$$
where $\beta$ is the Type II error rate.
Power and the Four Outcomes
The power framework connects four possible outcomes of a hypothesis test:
| | $H_0$ true | $H_1$ true |
|---|---|---|
| Reject $H_0$ | Type I error ($\alpha$) | Power ($1-\beta$) |
| Fail to reject $H_0$ | Correct ($1-\alpha$) | Type II error ($\beta$) |
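The two "Reject" cells of the table can be estimated by simulation. The sketch below is illustrative only: it assumes an upper-tailed one-sample z-test with known standard deviation, and the function name `rejection_rate` and all parameter values are hypothetical choices, not part of the original material.

```python
import random
from statistics import NormalDist

def rejection_rate(true_mean, mu0=0.0, sigma=1.0, n=25, alpha=0.05,
                   trials=20_000, seed=0):
    """Fraction of simulated upper-tailed z-tests that reject H0: mu = mu0."""
    rng = random.Random(seed)
    z_crit = NormalDist().inv_cdf(1 - alpha)   # critical value z_alpha
    se = sigma / n ** 0.5                      # standard error of the mean
    rejections = 0
    for _ in range(trials):
        # Shortcut (assumption): draw the sample mean directly, since the mean
        # of n normal observations is normal with standard deviation se.
        sample_mean = rng.gauss(true_mean, se)
        if (sample_mean - mu0) / se > z_crit:
            rejections += 1
    return rejections / trials

# H0 true: the rejection rate estimates the Type I error rate (alpha).
type_i_rate = rejection_rate(true_mean=0.0)
# H1 true (hypothetical true mean of 0.5): the rejection rate estimates power.
power = rejection_rate(true_mean=0.5)
print(f"Type I error rate ~ {type_i_rate:.3f}, power ~ {power:.3f}")
```

The complements of these two rates estimate the "Fail to reject" row: one minus the first is the correct-decision rate under $H_0$, and one minus the second is the Type II error rate.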
Factors Affecting Power
Theorem: Five Determinants of Power
Power depends on five quantities:
- Sample size ($n$): Larger $n$ -> larger power (more information)
- Effect size ($\delta$ or $d$): Larger effect -> easier to detect -> larger power
- Significance level ($\alpha$): Larger $\alpha$ -> easier to reject $H_0$ -> larger power (but more Type I errors)
- Population variance ($\sigma^2$): Smaller variance -> less noise -> larger power
- One-sided vs. two-sided test: One-sided tests have more power in the predicted direction
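| Factor | Direction | Effect on Power |
|---|---|---|
| Sample size $n$ | ↑ | ↑ |
| Effect size | ↑ | ↑ |
| Significance level $\alpha$ | ↑ | ↑ (but Type I error ↑) |
| Variance $\sigma^2$ | ↑ | ↓ |
| One-sided test | n/a | ↑ (vs. two-sided) |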
The Power Function
Power Function for One-Sample Z-Test
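For an upper-tailed test of $H_0: \mu = \mu_0$ against $H_1: \mu > \mu_0$ based on $n$ observations:
$$\pi(\delta) = \Phi\left(\frac{\delta\sqrt{n}}{\sigma} - z_{\alpha}\right)$$
Here,
- $\pi(\delta)$ = Power as a function of the true effect $\delta$
- $\delta$ = True difference from $H_0$: $\delta = \mu - \mu_0$
- $z_{\alpha}$ = Critical value for significance level $\alpha$
- $\sigma$ = Population standard deviation
- $\Phi$ = Standard normal cumulative distribution function
For a two-sided test, replace $z_{\alpha}$ with $z_{\alpha/2}$; the rejection probability in the opposite tail is negligible for realistic $\delta > 0$.
For $\delta > 0$, the power is a monotone increasing function of $\delta$, $n$, and $\alpha$, and a monotone decreasing function of $\sigma$.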
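A minimal sketch of a power function for the one-sample z-test, assuming the standard upper-tailed form $\Phi(\delta\sqrt{n}/\sigma - z_{\alpha})$ and, for the two-sided case, the exact sum of both tails. The function name `z_test_power` and the parameter values in the loop are hypothetical; only Python's standard library is used.

```python
from statistics import NormalDist

def z_test_power(delta, n, sigma, alpha=0.05, two_sided=False):
    """Power of a one-sample z-test when the true shift is delta = mu - mu0."""
    std_normal = NormalDist()
    shift = delta * n ** 0.5 / sigma          # standardized true effect
    if two_sided:
        z_crit = std_normal.inv_cdf(1 - alpha / 2)
        # Reject in either tail; the second term is usually tiny for delta > 0.
        return std_normal.cdf(shift - z_crit) + std_normal.cdf(-shift - z_crit)
    z_crit = std_normal.inv_cdf(1 - alpha)    # upper-tailed critical value
    return std_normal.cdf(shift - z_crit)

# Power rises with n for a fixed (hypothetical) effect of 0.5 and sigma of 1.
for n in (10, 25, 50, 100):
    print(n, round(z_test_power(delta=0.5, n=n, sigma=1.0), 3))
```

At $\delta = 0$ the function returns $\alpha$, which is a quick sanity check: with no true effect, the rejection probability is the Type I error rate.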
A Priori Power Analysis
Definition: A Priori Power Analysis
An a priori (prospective) power analysis is conducted before data collection to determine the minimum sample size needed to achieve a desired power for a given effect size and significance level.
Sample Size Formula for One-Sample Z-Test
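$$n = \left(\frac{(z_{\alpha/2} + z_{\beta})\,\sigma}{\delta}\right)^2$$
Here,
- $n$ = Required sample size (round up to the next integer)
- $z_{\alpha/2}$ = Critical value for two-sided $\alpha$
- $z_{\beta}$ = Critical value for desired power $1-\beta$, that is, $\Phi(z_{\beta}) = 1-\beta$
- $\sigma$ = Population standard deviation
- $\delta$ = Minimum detectable effect size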
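As a worked illustration, the sketch below evaluates the standard two-sided formula $n = \left((z_{\alpha/2} + z_{\beta})\sigma/\delta\right)^2$ and rounds up. The function name `required_sample_size` and the example inputs are hypothetical, and a known $\sigma$ is assumed.

```python
import math
from statistics import NormalDist

def required_sample_size(delta, sigma, alpha=0.05, power=0.80):
    """Minimum n for a two-sided one-sample z-test to reach the target power."""
    std_normal = NormalDist()
    z_alpha_half = std_normal.inv_cdf(1 - alpha / 2)  # two-sided critical value
    z_beta = std_normal.inv_cdf(power)                # quantile for power 1 - beta
    n_exact = ((z_alpha_half + z_beta) * sigma / delta) ** 2
    return math.ceil(n_exact)                         # sample sizes are whole numbers

# Hypothetical planning scenarios with sigma = 1 (so delta is in SD units).
for delta in (0.2, 0.5, 0.8):
    print(delta, required_sample_size(delta=delta, sigma=1.0))
```

Because $n$ scales with $1/\delta^2$, halving the minimum detectable effect roughly quadruples the required sample size.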
Common Power Thresholds
| Power | Assessment |
|---|---|
| | Very underpowered: likely to miss real effects |
| | Underpowered: risky |
| 0.80 | Conventional minimum (Cohen, 1992) |
| | Strong: suitable for high-stakes decisions |
| | Very strong: clinical trials often target this |
Cohen's Effect Size Conventions
| Effect Size ($d$) | Interpretation |
|---|---|
| 0.2 | Small |
| 0.5 | Medium |
| 0.8 | Large |
Cohen's d
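Cohen's $d$ measures the standardized difference between two means:
$$d = \frac{\mu_1 - \mu_2}{\sigma}$$
where $\sigma$ is the standard deviation common to both groups, estimated in practice by the pooled sample standard deviation.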
These conventions are guidelines, not absolutes. Always consider the minimum effect size that would be scientifically or practically meaningful.
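A short sketch of estimating Cohen's $d$ from two samples, assuming the usual pooled-standard-deviation estimator. The function name `cohens_d` and the data values are made up for illustration.

```python
from statistics import mean, variance

def cohens_d(group_a, group_b):
    """Standardized mean difference using the pooled sample standard deviation."""
    n_a, n_b = len(group_a), len(group_b)
    # Pooled variance: sample variances weighted by their degrees of freedom.
    pooled_var = ((n_a - 1) * variance(group_a) +
                  (n_b - 1) * variance(group_b)) / (n_a + n_b - 2)
    return (mean(group_a) - mean(group_b)) / pooled_var ** 0.5

# Hypothetical measurements for two groups.
treatment = [5.1, 4.9, 5.6, 5.8, 6.0]
control = [4.8, 4.7, 5.0, 5.2, 5.3]
print(round(cohens_d(treatment, control), 2))
```

An estimate from samples this small is very noisy, so it is better used as a rough input to planning than as a precise effect size.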
Post-Hoc Power Analysis
Theorem: Controversy with Post-Hoc Power
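Post-hoc power analysis (conducted after data collection) is circular and misleading. For a given observed effect size and $p$-value, the post-hoc power of a two-sided z-test, for example, is approximately:
$$\text{Power}_{\text{post-hoc}} \approx \Phi\left(|z_{\text{obs}}| - z_{\alpha/2}\right), \qquad |z_{\text{obs}}| = \Phi^{-1}\left(1 - \frac{p}{2}\right)$$
This is a deterministic function of the $p$-value, so it adds no new information. A non-significant result ($p > \alpha$) will always show post-hoc power below about 50% (because the observed effect was small relative to noise), but this does not mean the study was inherently underpowered.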
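The circularity can be seen numerically. The sketch below assumes a two-sided z-test and treats the observed effect as if it were the true effect; the function name `observed_power` is hypothetical. Its only inputs are the $p$-value and $\alpha$.

```python
from statistics import NormalDist

def observed_power(p_value, alpha=0.05):
    """Post-hoc power of a two-sided z-test, computed from the p-value alone."""
    std_normal = NormalDist()
    z_obs = std_normal.inv_cdf(1 - p_value / 2)   # |z| implied by the p-value
    z_crit = std_normal.inv_cdf(1 - alpha / 2)
    # Plug the observed effect in as if it were the true effect (both tails).
    return std_normal.cdf(z_obs - z_crit) + std_normal.cdf(-z_obs - z_crit)

# p-values must lie in (0, 1]; these example values are arbitrary.
for p in (0.01, 0.05, 0.20, 0.50):
    print(p, round(observed_power(p), 3))
```

No sample size, variance, or effect size enters the calculation, and a result exactly at $p = \alpha$ maps to post-hoc power of about 50%.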
Power and Confidence Intervals
Theorem: Relationship Between Power and Confidence Intervals
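A study with power $1-\beta$ (at least 50%) at significance level $\alpha$ for effect size $\delta$ will have the $(1-\alpha)$ confidence interval contained within the interval $[\hat{\mu} - \delta,\ \hat{\mu} + \delta]$, where $\delta$ is the minimum detectable effect. For the z-test this follows because the interval's half-width is $z_{\alpha/2}\,\sigma/\sqrt{n} = \delta \cdot z_{\alpha/2}/(z_{\alpha/2} + z_{\beta}) \le \delta$.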
Equivalently, the width of the confidence interval determines the precision of the estimate, which directly affects power.
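To make the link between interval width and power concrete, the sketch below compares the confidence-interval half-width with the minimum detectable effect for a two-sided z-test with known $\sigma$. Both function names and the inputs ($n = 32$, $\sigma = 1$) are hypothetical.

```python
from statistics import NormalDist

def ci_half_width(n, sigma, alpha=0.05):
    """Half-width of the (1 - alpha) confidence interval for the mean."""
    return NormalDist().inv_cdf(1 - alpha / 2) * sigma / n ** 0.5

def minimum_detectable_effect(n, sigma, alpha=0.05, power=0.80):
    """Smallest effect a two-sided z-test detects with the given power."""
    std_normal = NormalDist()
    z_sum = std_normal.inv_cdf(1 - alpha / 2) + std_normal.inv_cdf(power)
    return z_sum * sigma / n ** 0.5

half_width = ci_half_width(n=32, sigma=1.0)
mde = minimum_detectable_effect(n=32, sigma=1.0)
print(f"half-width = {half_width:.3f}, minimum detectable effect = {mde:.3f}")
```

Both quantities shrink at the same $1/\sqrt{n}$ rate, so their ratio depends only on $\alpha$ and the target power, not on the sample size.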
Key Takeaways
Summary: Statistical Power
- Power = $1 - \beta$ = probability of detecting a true effect, the complement of the Type II error rate
- Always conduct an a priori power analysis before collecting data: it determines your minimum $n$
- 80% power is the conventional minimum, and many journals require it
- Post-hoc power analysis is controversial: it is circular (just a function of the $p$-value)
- Underpowered studies waste resources and produce false negatives
- Small effect sizes require large $n$, so choosing the target effect size is the key planning decision
- Power depends on $n$, effect size, $\alpha$, variance, and whether the test is one- or two-sided