
Sample Size Determination — How Many Observations Do You Need?

Foundations of Statistics

Planning for Statistical Success

Sample size determination ensures studies have adequate power to detect meaningful effects while avoiding unnecessary data collection. It balances statistical requirements against practical constraints like time, cost, and ethics.

  • Clinical Trials — Ensuring sufficient power to detect clinically meaningful treatment effects
  • Market Research — Optimizing survey costs while maintaining estimate precision
  • Quality Assurance — Determining inspection sample sizes for reliable defect detection

The right sample size is the foundation of trustworthy statistical conclusions.


What Is Sample Size Determination?

Definition: Sample Size Determination

Sample size determination is the process of calculating the number of observations needed to achieve a desired level of precision and power in a statistical study. Too few observations lead to inconclusive results; too many waste resources.


Core Formulas

Sample Size for Estimating a Mean

$$n = \left(\frac{z_{\alpha/2} \cdot \sigma}{E}\right)^2$$

Here,

  • $n$ = Required sample size
  • $z_{\alpha/2}$ = Critical z-value for the desired confidence level
  • $\sigma$ = Population standard deviation (estimated)
  • $E$ = Desired margin of error
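As a quick check of the formula, here is a worked substitution with illustrative values (assumed for this sketch, not taken from a real study): $\sigma = 10$, $E = 2$, and 95% confidence, so $z_{0.025} \approx 1.96$.

```latex
n = \left(\frac{1.96 \times 10}{2}\right)^2 = (9.8)^2 = 96.04
\quad\Longrightarrow\quad n = 97 \text{ after rounding up}
```

Halving the margin of error to $E = 1$ would roughly quadruple the requirement, because $E$ enters the formula squared.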

Sample Size for Estimating a Proportion

$$n = \frac{z_{\alpha/2}^2 \cdot p(1-p)}{E^2}$$

Here,

  • $p$ = Estimated population proportion
  • $E$ = Desired margin of error

Conservative Estimate for p

When $p$ is unknown, use $p = 0.5$ for the most conservative (largest) sample size, since $p(1-p)$ is maximized at $p = 0.5$, where it equals $0.25$.
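To see why $p = 0.5$ is the conservative choice, the sketch below tabulates the required sample size over a few candidate proportions. It uses only the Python standard library; the margin of error `E = 0.03` and the grid of `p` values are illustrative assumptions.

```python
import math
from statistics import NormalDist


def sample_size_proportion(p: float, E: float, alpha: float = 0.05) -> int:
    """Sample size for estimating a proportion p within margin of error E."""
    z = NormalDist().inv_cdf(1 - alpha / 2)  # two-sided critical value
    return math.ceil(z**2 * p * (1 - p) / E**2)


# Illustrative margin of error (an assumption for this sketch): 3 points.
E = 0.03
sizes = {p: sample_size_proportion(p, E) for p in (0.1, 0.2, 0.3, 0.4, 0.5)}

for p, n in sizes.items():
    print(f"p = {p:.1f}: p(1-p) = {p * (1 - p):.2f}, n = {n}")
```

The largest requirement appears at $p = 0.5$ (1068 observations for these inputs), so planning with $p = 0.5$ covers every other value of $p$.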


Derivation: Inverting the Margin of Error

Theorem: Sample Size from Margin of Error

Starting from the margin of error formula $E = z_{\alpha/2}\,\sigma/\sqrt{n}$, solve for $n$:

$$\sqrt{n} = \frac{z_{\alpha/2} \cdot \sigma}{E} \implies n = \left(\frac{z_{\alpha/2} \cdot \sigma}{E}\right)^2$$

Since $n$ must be an integer, always round up to the next whole number: $n = \lceil (z_{\alpha/2}\sigma/E)^2 \rceil$.

Proof sketch: The margin of error is the half-width of the confidence interval. Setting $E$ to the desired precision and solving for $n$ gives the minimum sample size that achieves that precision. Rounding up ensures the actual margin is at most $E$.
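The rounding rule can be checked numerically. The following sketch, assuming illustrative inputs $\sigma = 10$ and $E = 2$ at 95% confidence, compares the margin of error obtained when the exact solution is rounded up versus rounded down.

```python
import math
from statistics import NormalDist


def margin_of_error(n: int, sigma: float, alpha: float = 0.05) -> float:
    """Half-width of the z-based confidence interval for a mean."""
    z = NormalDist().inv_cdf(1 - alpha / 2)
    return z * sigma / math.sqrt(n)


# Illustrative inputs (assumptions for this sketch).
sigma, E, alpha = 10.0, 2.0, 0.05

z = NormalDist().inv_cdf(1 - alpha / 2)
n_exact = (z * sigma / E) ** 2   # exact solution, generally not an integer
n_up = math.ceil(n_exact)        # rounded up, as the derivation prescribes
n_down = math.floor(n_exact)     # rounded down, for comparison only

print(f"exact n = {n_exact:.2f}")
print(f"n = {n_up}: margin = {margin_of_error(n_up, sigma, alpha):.4f}")
print(f"n = {n_down}: margin = {margin_of_error(n_down, sigma, alpha):.4f}")
```

With these inputs, rounding up keeps the margin at or below $E$, while rounding down lets it slip just above $E$.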


Sample Size for Hypothesis Testing (Power Analysis)

Sample Size for Two-Sided Test

$$n = \frac{(z_{\alpha/2} + z_{\beta})^2 \cdot 2\sigma^2}{\delta^2}$$

Here,

  • $n$ = Required sample size per group (two groups of equal size)
  • $\alpha$ = Significance level (Type I error rate)
  • $\beta$ = Type II error rate; power $= 1 - \beta$, and $z_{\beta}$ is the corresponding upper-tail z-value
  • $\sigma$ = Population standard deviation
  • $\delta$ = Minimum detectable effect size

Theorem: Power and Sample Size Trade-off

For a fixed effect size $\delta$ and significance level $\alpha$, the required sample size scales as:

$$n \propto \frac{\sigma^2}{\delta^2}$$

This reveals two critical insights:

  1. Detecting smaller effects requires more data: halving $\delta$ requires $4\times$ the sample.
  2. More variable populations require more data: doubling $\sigma$ requires $4\times$ the sample.
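Both scaling claims can be confirmed with a few lines of Python. This is a minimal sketch using the normal-approximation formula from this lesson; the helper name `n_per_group` and the baseline values ($\delta = 3$, $\sigma = 8$) are assumptions chosen for illustration. The result is left unrounded so the ratios come out exact.

```python
import math
from statistics import NormalDist


def n_per_group(delta: float, sigma: float,
                alpha: float = 0.05, power: float = 0.80) -> float:
    """Unrounded per-group sample size (normal approximation)."""
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)
    z_beta = NormalDist().inv_cdf(power)
    return 2 * sigma**2 * (z_alpha + z_beta) ** 2 / delta**2


base = n_per_group(delta=3, sigma=8)
half_delta_ratio = n_per_group(delta=1.5, sigma=8) / base
double_sigma_ratio = n_per_group(delta=3, sigma=16) / base

print(f"halving delta multiplies n by {half_delta_ratio:.1f}")
print(f"doubling sigma multiplies n by {double_sigma_ratio:.1f}")
```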

Worked Example: Clinical Trial Design

A pharmaceutical company wants to detect a 3 mmHg reduction in blood pressure with 80% power at $\alpha = 0.05$. Prior studies suggest $\sigma = 8$ mmHg.

Step 1: Identify parameters: $\delta = 3$, $\sigma = 8$, $\alpha = 0.05$ ($z_{0.025} = 1.96$), power $= 0.80$ ($\beta = 0.20$, $z_{0.20} = 0.842$).

Step 2: Compute:

$$n = \frac{(1.96 + 0.842)^2 \times 2 \times 64}{9} = \frac{(2.802)^2 \times 128}{9} = \frac{7.851 \times 128}{9} = \frac{1004.9}{9} \approx 111.7$$

Step 3: Round up: $n = 112$ per group, total $N = 224$.
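One way to sanity-check the result is to plug $n$ back into an approximate power formula. The sketch below assumes a two-sided, two-sample z-test with known $\sigma$ and ignores the negligible probability of rejecting in the wrong direction; `approx_power` is a hypothetical helper written for this check, not part of any library.

```python
import math
from statistics import NormalDist


def approx_power(n: int, delta: float, sigma: float,
                 alpha: float = 0.05) -> float:
    """Approximate power of a two-sided, two-sample z-test with n per group."""
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)
    se = sigma * math.sqrt(2 / n)  # standard error of the difference in means
    return NormalDist().cdf(delta / se - z_alpha)


power_111 = approx_power(111, delta=3, sigma=8)
power_112 = approx_power(112, delta=3, sigma=8)

print(f"n = 111 per group: power = {power_111:.4f}")
print(f"n = 112 per group: power = {power_112:.4f}")
```

Under these assumptions, 111 per group falls just short of 80% power and 112 per group just clears it, which is why the result is rounded up.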

Accounting for Attrition

In practice, inflate the required $n$ by the expected dropout rate $d$: $n_{\text{adjusted}} = n/(1-d)$. For a 15% dropout rate: $112/0.85 \approx 132$ per group.
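The adjustment is a one-line calculation. The sketch below wraps it in a hypothetical helper, `inflate_for_attrition`, which rounds up so that the expected number of completers is still at least $n$.

```python
import math


def inflate_for_attrition(n: int, dropout_rate: float) -> int:
    """Enrollment target so that about n participants remain after dropout."""
    if not 0 <= dropout_rate < 1:
        raise ValueError("dropout_rate must be in [0, 1)")
    return math.ceil(n / (1 - dropout_rate))


n_enroll = inflate_for_attrition(112, 0.15)
expected_completers = n_enroll * (1 - 0.15)

print(f"enroll {n_enroll} per group, expect about {expected_completers:.1f} to finish")
```

Note that dividing by $1 - d$ is not the same as multiplying by $1 + d$: the latter would give 129 here, and 85% of 129 is fewer than the 112 completers required.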


The Effect Size Pyramid

Sample Size Dependencies

The required sample size depends on four quantities:

| Factor | Effect on $n$ | Example |
|---|---|---|
| Effect size $\delta$ | $n \propto 1/\delta^2$ | Halving the effect $\to$ $4\times$ sample |
| Standard deviation $\sigma$ | $n \propto \sigma^2$ | Doubling $\sigma$ $\to$ $4\times$ sample |
| Power $1-\beta$ | $n \propto (z_{\alpha/2}+z_\beta)^2$ | 80% to 90% power $\to$ $\sim 1.3\times$ sample (at $\alpha = 0.05$) |
| Significance $\alpha$ | $n \propto z_{\alpha/2}^2$ (precision) or $n \propto (z_{\alpha/2}+z_\beta)^2$ (power) | $0.05$ to $0.01$ $\to$ $\sim 1.7\times$ sample for a fixed margin of error, $\sim 1.5\times$ at 80% power |
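The multipliers for power and significance level follow directly from normal quantiles and can be recomputed as below. This is a minimal sketch using only the standard library; the helper names `z` and `power_factor` are assumptions introduced for illustration.

```python
from statistics import NormalDist


def z(q: float) -> float:
    """Standard normal quantile."""
    return NormalDist().inv_cdf(q)


def power_factor(alpha: float, power: float) -> float:
    """The (z_{alpha/2} + z_beta)^2 term of the power formula."""
    return (z(1 - alpha / 2) + z(power)) ** 2


# Raising power from 80% to 90% at alpha = 0.05.
power_ratio = power_factor(0.05, 0.90) / power_factor(0.05, 0.80)

# Tightening alpha from 0.05 to 0.01 for a fixed margin of error (n ~ z^2).
alpha_ratio_precision = (z(1 - 0.01 / 2) / z(1 - 0.05 / 2)) ** 2

# Tightening alpha from 0.05 to 0.01 while holding power at 80%.
alpha_ratio_power = power_factor(0.01, 0.80) / power_factor(0.05, 0.80)

print(f"80% -> 90% power:              x{power_ratio:.2f}")
print(f"alpha 0.05 -> 0.01, precision: x{alpha_ratio_precision:.2f}")
print(f"alpha 0.05 -> 0.01, 80% power: x{alpha_ratio_power:.2f}")
```

The effect of $\alpha$ depends on which formula applies: the multiplier is larger for a pure precision calculation than for a power calculation, where the fixed $z_\beta$ term dilutes the change.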

Python Implementation

import numpy as np
from scipy import stats

def sample_size_mean(sigma, E, alpha=0.05):
    """Sample size for estimating a mean with margin of error E."""
    z = stats.norm.ppf(1 - alpha / 2)
    return int(np.ceil((z * sigma / E) ** 2))

def sample_size_proportion(p, E, alpha=0.05):
    """Sample size for estimating a proportion with margin of error E."""
    z = stats.norm.ppf(1 - alpha / 2)
    return int(np.ceil(z**2 * p * (1 - p) / E**2))

def sample_size_two_sample(delta, sigma, alpha=0.05, power=0.80):
    """Sample size per group for a two-sided, two-sample comparison of means (normal approximation)."""
    z_alpha = stats.norm.ppf(1 - alpha / 2)
    z_beta = stats.norm.ppf(power)
    return int(np.ceil(2 * sigma**2 * (z_alpha + z_beta)**2 / delta**2))

# Example 1: Mean estimation
print(f"n for σ=10, E=2, 95% CI: {sample_size_mean(10, 2, 0.05)}")
print(f"n for σ=10, E=1, 95% CI: {sample_size_mean(10, 1, 0.05)}")

# Example 2: Proportion estimation
print(f"n for p=0.5, E=0.03, 95% CI: {sample_size_proportion(0.5, 0.03)}")
print(f"n for p=0.2, E=0.03, 95% CI: {sample_size_proportion(0.2, 0.03)}")

# Example 3: Two-sample test
print(f"n per group for δ=3, σ=8, 80% power: {sample_size_two_sample(3, 8, 0.05, 0.80)}")
print(f"n per group for δ=2, σ=8, 80% power: {sample_size_two_sample(2, 8, 0.05, 0.80)}")

Key Takeaways

Summary: Sample Size Determination

  • For precision: $n = (z_{\alpha/2}\sigma/E)^2$ (round up)
  • For power: $n = (z_{\alpha/2}+z_\beta)^2 \cdot 2\sigma^2/\delta^2$ (per group)
  • The $1/\delta^2$ relationship means detecting small effects is expensive
  • Always estimate $\sigma$ from prior studies or pilot data before computing $n$
  • Account for attrition, clustering, and multiple comparisons in your final $n$
