Statistical Power

Hypothesis Testing

The Probability of Finding Real Effects

Statistical power is the probability of correctly rejecting a false null hypothesis, that is, the chance of detecting a real effect when it exists. An underpowered study is likely to miss such an effect, so the resources spent on it are largely wasted.

Clinical Trials — Ensuring studies are large enough to detect meaningful treatment effects
Grant Applications — Justifying sample sizes with power analysis calculations
A/B Testing — Avoiding underpowered tests that waste advertising budgets

Power analysis is the difference between good science and expensive guessing.

Definition: Statistical Power

Power is the probability of correctly rejecting the null hypothesis when the alternative is true:

\text{Power} = 1 - \beta = P(\text{Reject } H_0 \mid H_1 \text{ is true})

where $\beta = P(\text{Fail to reject } H_0 \mid H_1 \text{ is true})$ is the Type II error rate.

Power and the Four Outcomes

The power framework connects four possible outcomes of a hypothesis test:

	$H_0$ true	$H_1$ true
Reject $H_0$	Type I error ( $\alpha$ )	Power ( $1-\beta$ )
Fail to reject $H_0$	Correct ( $1-\alpha$ )	Type II error ( $\beta$ )
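The table can be checked by simulation. The sketch below is illustrative only: it assumes a one-sided, one-sample z-test with known $\sigma$, and the function name `rejection_rate` and all parameter values are hypothetical choices, not part of the original text. It estimates the top row of the table by counting how often $H_0$ is rejected when $H_0$ is true and when $H_1$ is true.

```python
import random
from statistics import NormalDist

def rejection_rate(true_mean, mu0=0.0, sigma=1.0, n=25, alpha=0.05,
                   trials=20000, seed=0):
    """Fraction of simulated one-sided z-tests (H1: mu > mu0) that reject H0."""
    rng = random.Random(seed)
    z_crit = NormalDist().inv_cdf(1 - alpha)
    # The mean of n normal observations is normal with sd sigma / sqrt(n),
    # so it is drawn directly instead of simulating every observation.
    se = sigma / n ** 0.5
    rejections = 0
    for _ in range(trials):
        xbar = rng.gauss(true_mean, se)
        z = (xbar - mu0) / se
        if z > z_crit:
            rejections += 1
    return rejections / trials

# H0 true (true mean equals mu0): rejections are Type I errors.
type_i_rate = rejection_rate(true_mean=0.0)
# H1 true (true mean is 0.5 above mu0): rejections are correct detections.
power = rejection_rate(true_mean=0.5)
type_ii_rate = 1 - power

print(f"Type I error rate ~ {type_i_rate:.3f}")
print(f"Power             ~ {power:.3f}")
print(f"Type II error rate ~ {type_ii_rate:.3f}")
```

With these assumed settings the estimated Type I error rate should land close to $\alpha = 0.05$, and the estimated power close to the analytic value for the same test.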

Factors Affecting Power

Theorem: Five Determinants of Power

Power depends on five quantities:

Sample size ( $n$ ): Larger $n$ -> larger power (more information)
Effect size ( $d$ or $\delta$ ): Larger effect -> easier to detect -> larger power
Significance level ( $\alpha$ ): Larger $\alpha$ -> easier to reject -> larger power (but more Type I errors)
Population variance ( $\sigma^2$ ): Smaller variance -> less noise -> larger power
One-sided vs. two-sided test: One-sided tests have more power in the predicted direction

Factor	Direction	Effect on Power
$n$	$\uparrow$	$\uparrow$
Effect size	$\uparrow$	$\uparrow$
$\alpha$	$\uparrow$	$\uparrow$ (but $\uparrow$ Type I error)
$\sigma^2$	$\downarrow$	$\uparrow$
One-sided test	—	$\uparrow$ (vs. two-sided)
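Each row of the table can be checked numerically. The following sketch assumes a one-sample z-test with known $\sigma$; the helper `z_test_power` and the baseline values are hypothetical, and the two-sided branch uses the standard two-tail rejection rule, which this text does not write out. It changes one factor at a time and compares the result with a baseline.

```python
from statistics import NormalDist

STD_NORMAL = NormalDist()

def z_test_power(delta, sigma, n, alpha=0.05, two_sided=False):
    """Power of a one-sample z-test for a true shift delta (known sigma)."""
    shift = delta * n ** 0.5 / sigma
    if two_sided:
        z_crit = STD_NORMAL.inv_cdf(1 - alpha / 2)
        # Reject when |Z| > z_crit: add the probabilities of both tails.
        upper = 1 - STD_NORMAL.cdf(z_crit - shift)
        lower = STD_NORMAL.cdf(-z_crit - shift)
        return upper + lower
    z_crit = STD_NORMAL.inv_cdf(1 - alpha)
    return 1 - STD_NORMAL.cdf(z_crit - shift)

# Assumed baseline: delta = 0.5, sigma = 2.0, n = 50, alpha = 0.05, one-sided.
baseline = z_test_power(delta=0.5, sigma=2.0, n=50)

# Change exactly one factor per entry.
variants = {
    "larger n (50 -> 100)": z_test_power(0.5, 2.0, 100),
    "larger effect (0.5 -> 0.8)": z_test_power(0.8, 2.0, 50),
    "larger alpha (0.05 -> 0.10)": z_test_power(0.5, 2.0, 50, alpha=0.10),
    "smaller sigma (2.0 -> 1.5)": z_test_power(0.5, 1.5, 50),
    "two-sided instead of one-sided": z_test_power(0.5, 2.0, 50, two_sided=True),
}

print(f"baseline power: {baseline:.3f}")
for label, value in variants.items():
    print(f"{label}: {value:.3f}")
```

The first four variants should report higher power than the baseline, and the two-sided variant lower power, matching the directions in the table.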

The Power Function

Power Function for One-Sample Z-Test (One-Sided, Upper-Tailed)

\pi(\delta) = P\left(Z > z_{1-\alpha} - \frac{\delta\sqrt{n}}{\sigma}\right) = 1 - \Phi\left(z_{1-\alpha} - \frac{\delta\sqrt{n}}{\sigma}\right)

Here,

$\pi(\delta)$ = Power as a function of the true effect δ
$\delta$ = True difference from H₀: δ = μ − μ₀
$z_{1-\alpha}$ = Critical value for significance level α
$\sigma$ = Population standard deviation
$n$ = Sample size

Because this test is one-sided (upper-tailed), the power is a monotone increasing function of $\delta$ and $\alpha$. For a true effect $\delta > 0$ it is also increasing in $n$ and decreasing in $\sigma$. At $\delta = 0$ the power equals $\alpha$.
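The formula translates directly into code. This is a minimal sketch that evaluates $\pi(\delta)$ on a small grid; the function name `power_function`, the grid of $\delta$ values, and the settings $n = 25$, $\sigma = 1$ are assumptions made for illustration.

```python
from statistics import NormalDist

def power_function(delta, n, sigma, alpha=0.05):
    """pi(delta) = 1 - Phi(z_{1-alpha} - delta * sqrt(n) / sigma)."""
    std_normal = NormalDist()
    z_crit = std_normal.inv_cdf(1 - alpha)
    return 1 - std_normal.cdf(z_crit - delta * n ** 0.5 / sigma)

# Evaluate the power curve at a few assumed true effects.
deltas = [-0.5, 0.0, 0.25, 0.5, 1.0]
curve = [(d, power_function(d, n=25, sigma=1.0)) for d in deltas]

for d, p in curve:
    print(f"delta = {d:+.2f}  power = {p:.4f}")
```

At $\delta = 0$ the function returns $\alpha$, since rejecting a true null is exactly a Type I error. For negative $\delta$ the value falls below $\alpha$, which is why this one-sided curve increases in $\delta$ rather than in $|\delta|$.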

A Priori Power Analysis

Definition: A Priori Power Analysis

An a priori (prospective) power analysis is conducted before data collection to determine the minimum sample size needed to achieve a desired power for a given effect size and significance level.

Sample Size Formula for One-Sample Z-Test

n = \left(\frac{(z_{1-\alpha/2} + z_{1-\beta}) \cdot \sigma}{\delta}\right)^2

Here,

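$n$ = Required sample size (round up to the next integer)
$z_{1-\alpha/2}$ = Critical value for two-sided α
$z_{1-\beta}$ = Critical value for desired power 1−β
$\sigma$ = Population standard deviation
$\delta$ = Minimum detectable effect size

This formula is for a two-sided test. For a one-sided test, such as the power function above, replace $z_{1-\alpha/2}$ with $z_{1-\alpha}$.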
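As a worked example, the sketch below applies the sample size formula. The function name `required_n` and the inputs ($\delta = 5$, $\sigma = 15$) are hypothetical values chosen for illustration. The `two_sided=False` option swaps in $z_{1-\alpha}$, the one-sided critical value.

```python
import math
from statistics import NormalDist

def required_n(delta, sigma, alpha=0.05, power=0.80, two_sided=True):
    """Smallest integer n from n = ((z_alpha + z_{1-beta}) * sigma / delta)^2."""
    std_normal = NormalDist()
    if two_sided:
        z_alpha = std_normal.inv_cdf(1 - alpha / 2)
    else:
        z_alpha = std_normal.inv_cdf(1 - alpha)
    z_beta = std_normal.inv_cdf(power)  # z_{1-beta}, since power = 1 - beta
    n_exact = ((z_alpha + z_beta) * sigma / delta) ** 2
    # Round up: rounding down would leave power slightly below the target.
    return math.ceil(n_exact)

n_80 = required_n(delta=5.0, sigma=15.0)              # 80% power, two-sided
n_90 = required_n(delta=5.0, sigma=15.0, power=0.90)  # 90% power, two-sided
n_one_sided = required_n(delta=5.0, sigma=15.0, two_sided=False)

print(n_80, n_90, n_one_sided)
```

For these assumed inputs the formula gives about 70.6 before rounding, so 71 observations are needed for 80% power. Raising the target to 90% increases the requirement to 95.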

Common Power Thresholds

Power	Assessment
$< 0.50$	Very underpowered — likely to miss real effects
$0.50 - 0.79$	Underpowered — risky
$\geq 0.80$	Conventional minimum (Cohen, 1992)
$\geq 0.90$	Strong — suitable for high-stakes decisions
$\geq 0.95$	Very strong; used when missing a real effect would be especially costly

Cohen's Effect Size Conventions

Effect Size ( $d$ )	Interpretation
0.2	Small
0.5	Medium
0.8	Large

Cohen's d

Cohen's $d$ measures the standardized difference between two means, where $\sigma$ is the common (pooled) standard deviation of the two groups:

d = \frac{\mu_1 - \mu_2}{\sigma}

These conventions are guidelines, not absolutes. Always consider the minimum effect size that would be scientifically or practically meaningful.
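To make the conventions concrete, the sketch below does two things. It estimates $d$ from two samples using a pooled standard deviation, and it computes the approximate per-group sample size each benchmark would need. The names `cohens_d` and `n_per_group` are hypothetical. The sample size formula $n \approx 2\,(z_{1-\alpha/2} + z_{1-\beta})^2 / d^2$ is the usual normal approximation for comparing two equal-sized groups, an assumption that goes beyond the one-sample formula given earlier.

```python
import math
from statistics import NormalDist, mean, variance

def cohens_d(sample_1, sample_2):
    """Standardized mean difference using the pooled sample standard deviation."""
    n1, n2 = len(sample_1), len(sample_2)
    pooled_var = ((n1 - 1) * variance(sample_1)
                  + (n2 - 1) * variance(sample_2)) / (n1 + n2 - 2)
    return (mean(sample_1) - mean(sample_2)) / math.sqrt(pooled_var)

def n_per_group(d, alpha=0.05, power=0.80):
    """Approximate per-group n for a two-sided, two-sample comparison (normal approx.)."""
    std_normal = NormalDist()
    z_sum = std_normal.inv_cdf(1 - alpha / 2) + std_normal.inv_cdf(power)
    return math.ceil(2 * (z_sum / d) ** 2)

# Made-up data, purely for illustration.
d_example = cohens_d([2, 4, 6, 8], [1, 3, 5, 7])

# Per-group sample sizes for Cohen's small, medium, and large benchmarks.
sizes = {d: n_per_group(d) for d in (0.2, 0.5, 0.8)}

print(f"d = {d_example:.3f}")
print(sizes)
```

Under this approximation a small effect ($d = 0.2$) needs several hundred observations per group, while a large effect ($d = 0.8$) needs only a few dozen. Exact calculations based on the t-distribution give slightly larger numbers.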

Post-Hoc Power Analysis

Theorem: Controversy with Post-Hoc Power

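Post-hoc (observed) power analysis, conducted after data collection by substituting the observed effect $\hat{\delta}$ for the true effect, is circular and misleading. For the one-sided z-test above, with observed statistic $z_{\text{obs}} = \hat{\delta}\sqrt{n}/\sigma$:

\text{Observed power} = 1 - \Phi\left(z_{1-\alpha} - \frac{\hat{\delta}\sqrt{n}}{\sigma}\right) = 1 - \Phi\left(z_{1-\alpha} - z_{1-p}\right)

The second equality holds because the one-sided $p$-value is $p = 1 - \Phi(z_{\text{obs}})$, so $z_{\text{obs}} = z_{1-p}$. Observed power is therefore a deterministic function of the $p$-value and adds no new information. Whenever $p > \alpha$, observed power is below 50%, so a non-significant result always shows low post-hoc power. This says nothing about whether the study was adequately powered for the effect it was designed to detect.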
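The circularity is easy to see in code. The sketch below assumes a one-sided z-test, so that $p = 1 - \Phi(z_{\text{obs}})$, and computes observed power from the $p$-value alone. The function name `observed_power` is hypothetical.

```python
from statistics import NormalDist

def observed_power(p_value, alpha=0.05):
    """Post-hoc power of a one-sided z-test, computed from the p-value only."""
    std_normal = NormalDist()
    # Recover the observed z statistic implied by the one-sided p-value.
    z_obs = std_normal.inv_cdf(1 - p_value)
    z_crit = std_normal.inv_cdf(1 - alpha)
    # Plug the observed effect in as if it were the true effect.
    return 1 - std_normal.cdf(z_crit - z_obs)

for p in (0.01, 0.05, 0.20, 0.50):
    print(f"p = {p:.2f}  observed power = {observed_power(p):.3f}")
```

No sample size, effect size, or variance is passed in: the $p$-value and $\alpha$ fully determine the result. A result exactly at $p = \alpha$ always has observed power 0.5, and any larger $p$-value gives less.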

Power and Confidence Intervals

Theorem: Relationship Between Power and Confidence Intervals

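For a two-sided z-test with known $\sigma$, the $(1-\alpha)$ confidence interval for the mean has half-width $z_{1-\alpha/2}\,\sigma/\sqrt{n}$. Solving the sample size formula above for $\delta$ gives the minimum effect detectable with power $1-\beta$: $\delta_{\min} = (z_{1-\alpha/2} + z_{1-\beta})\,\sigma/\sqrt{n}$. The half-width is therefore the fraction $z_{1-\alpha/2}/(z_{1-\alpha/2} + z_{1-\beta})$ of $\delta_{\min}$, about 0.70 when $\alpha = 0.05$ and power is 0.80.

Both quantities shrink in proportion to $\sigma/\sqrt{n}$, so anything that narrows the confidence interval (larger $n$, smaller $\sigma$) also increases power against a fixed effect.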
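One way to make the link concrete is to compute both quantities for the same design. The sketch assumes a two-sided z-test with known $\sigma$; the function names `ci_half_width` and `detectable_effect` and the example values are hypothetical. It compares the confidence interval half-width $z_{1-\alpha/2}\,\sigma/\sqrt{n}$ with the effect detectable at 80% power, $(z_{1-\alpha/2} + z_{1-\beta})\,\sigma/\sqrt{n}$.

```python
from statistics import NormalDist

def ci_half_width(sigma, n, alpha=0.05):
    """Half-width of the (1 - alpha) z-interval for a mean."""
    return NormalDist().inv_cdf(1 - alpha / 2) * sigma / n ** 0.5

def detectable_effect(sigma, n, alpha=0.05, power=0.80):
    """Effect size a two-sided z-test detects with the given power."""
    std_normal = NormalDist()
    z_sum = std_normal.inv_cdf(1 - alpha / 2) + std_normal.inv_cdf(power)
    return z_sum * sigma / n ** 0.5

rows = []
for n in (25, 100, 400):
    half = ci_half_width(sigma=10.0, n=n)
    effect = detectable_effect(sigma=10.0, n=n)
    rows.append((n, half, effect, half / effect))
    print(f"n = {n:4d}  half-width = {half:.3f}  "
          f"detectable effect = {effect:.3f}  ratio = {half / effect:.3f}")
```

Quadrupling $n$ halves both the half-width and the detectable effect, while their ratio stays fixed by $\alpha$ and the target power. In this setting, precision and power are two views of the same quantity, $\sigma/\sqrt{n}$.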

Key Takeaways

Summary: Statistical Power

Power = 1 − β = probability of detecting a true effect — the complement of Type II error
Always conduct a priori power analysis before collecting data — this determines your minimum $n$
80% power is the conventional minimum — many journals require this
Post-hoc power analysis is controversial: it is circular (just a function of the $p$-value)
Underpowered studies waste resources and produce false negatives
Small effect sizes require large $n$ — planning for effect size is the key decision
Power depends on: $n$, effect size, $\alpha$, variance, and whether the test is one- or two-sided
