🎉 75% of content is free forever — Unlock Premium from $10/mo →
CW
Search courses…
💼 Servicesℹ️ About✉️ ContactView Pricing Plansfrom $10

Multiple Testing Problem — FWER and False Discovery Rate

Hypothesis TestingAdvanced Topics🟢 Free Lesson

Advertisement

The Multiple Testing Problem

Hypothesis Testing

When Testing More Means Finding More Wrong

The multiple testing problem causes false discovery rates to explode when running many hypothesis tests simultaneously. Without correction, most "significant" findings in large-scale studies are false positives.

  • Genomics — Testing thousands of genes for differential expression in disease studies
  • Neuroimaging — Analyzing millions of brain voxels for activation differences
  • Quality Control — Inspecting multiple product characteristics simultaneously

The more you test, the more you must correct — or drown in false discoveries.


If you run 20 hypothesis tests at α = 0.05 when all H₀s are true, you expect 1 false positive by chance. Run 1000 tests? Expect 50 false discoveries.

DfFamily-Wise Error Rate (FWER)

FWER = P(at least one false rejection) = 1 − (1−α)^m for independent tests


FWER Inflation

Visualizing FWER Inflation

import numpy as np
from scipy import stats
from statsmodels.stats.multitest import multipletests
import matplotlib.pyplot as plt

# Show FWER inflation
alpha = 0.05
m_tests = np.arange(1, 201)
fwer = 1 - (1 - alpha)**m_tests

plt.figure(figsize=(10, 5))
plt.plot(m_tests, fwer, 'b-', linewidth=2)
plt.axhline(0.05, color='red', linestyle='--', label='α=0.05 (desired level)')
plt.axhline(0.50, color='orange', linestyle=':', label='50% false positive rate')
plt.xlabel('Number of Tests (m)')
plt.ylabel('FWER (P at least one false positive)')
plt.title('Multiple Testing Problem: FWER Inflation')
plt.legend()
plt.grid(True, alpha=0.3)
plt.savefig('multiple_testing.png', dpi=150)
plt.show()

print(f"FWER with {5} tests:   {1-(1-0.05)**5:.4f}")
print(f"FWER with {20} tests:  {1-(1-0.05)**20:.4f}")
print(f"FWER with {100} tests: {1-(1-0.05)**100:.4f}")

Comparison of Correction Methods

Bonferroni, Holm, and Benjamini-Hochberg

# ==========================================
# CORRECTIONS
# ==========================================
np.random.seed(42)
n_tests = 20
n_truly_null = 15  # 15 of 20 are truly null
n_true_alt = 5    # 5 truly have effects

# Simulate p-values
null_p = np.random.uniform(0, 1, n_truly_null)       # null: uniform
alt_p = np.random.beta(0.3, 5, n_true_alt)           # alternative: small p-values
all_p = np.concatenate([null_p, alt_p])
np.random.shuffle(all_p)

print(f"\nUncorrected: {(all_p < 0.05).sum()} rejections (some are false)")

# Bonferroni correction
bonf_alpha = 0.05 / n_tests
print(f"\nBonferroni (α/m = {bonf_alpha:.4f}): {(all_p < bonf_alpha).sum()} rejections")

# Holm-Bonferroni (uniformly more powerful than Bonferroni)
reject_holm, pvals_corrected_holm, _, _ = multipletests(all_p, alpha=0.05, method='holm')
print(f"Holm-Bonferroni: {reject_holm.sum()} rejections")

# Benjamini-Hochberg (controls FDR, not FWER — more power for many tests)
reject_bh, pvals_corrected_bh, _, _ = multipletests(all_p, alpha=0.05, method='fdr_bh')
print(f"Benjamini-Hochberg (FDR): {reject_bh.sum()} rejections")

print("\nB-H procedure: controls the EXPECTED proportion of false discoveries, not probability of any")
print("  FDR ≤ α (expected false discovery rate)")
print("  More powerful than Bonferroni for many simultaneous tests")

FWER vs FDR

Choosing a Correction Method

MethodControlsPowerUse When
BonferroniFWER (strong)LowestFew tests, any false positive is catastrophic
HolmFWER (strong)Higher than BonferroniFew-moderate tests
Benjamini-HochbergFDRHighestMany tests (genomics, fMRI), some false positives acceptable

Key Takeaways

Summary: Multiple Testing Problem

  • Run m tests at α -> expect m×α false positives when all nulls are true
  • Bonferroni: divide α by m — simple, conservative, appropriate for few tests
  • Benjamini-Hochberg controls FDR — used in genomics (thousands of tests)
  • Pre-specify your comparisons before seeing data — reduces multiple testing inflation
  • Report all tests conducted — selective reporting is a major source of false discoveries

Premium Content

Multiple Testing Problem — FWER and False Discovery Rate

Unlock this lesson and 900+ advanced tutorials with a Premium plan.

🎯End-to-end Projects
💼Interview Prep
📜Certificates
🤝Community Access

Already a member? Log in

Need Expert Statistics Help?

Get personalized tutoring, project support, or professional consulting.

Advertisement