Hypothesis Testing

Why It Matters

Hypothesis testing is the backbone of scientific discovery and data-driven decision making. Whether validating a clinical trial, tuning a machine learning model, or measuring the impact of a new feature, hypothesis testing provides the formal framework to distinguish real effects from random noise. Without it, every observed difference, no matter how small or how likely to occur by chance, could be mistaken for a meaningful finding.


Overview

Every hypothesis test begins by formulating two competing statements about a population parameter. The null hypothesis ($H_0$) is the default assumption of no effect. The alternative hypothesis ($H_1$) is the claim that an effect exists. A test statistic measures how far the observed data deviate from what $H_0$ predicts. The p-value is the probability of seeing results at least as extreme as those observed, assuming $H_0$ is true. We reject $H_0$ when the p-value is at or below the significance level $\alpha$. Two types of error are possible: Type I (false positive, probability $\alpha$) and Type II (false negative, probability $\beta$). Power ($1 - \beta$) is the probability of detecting a real effect; it increases with effect size, sample size, and $\alpha$.


Key Concepts

Test Statistic (Z-Test)

$$z = \frac{\bar{x} - \mu_0}{\sigma / \sqrt{n}}$$

Here,

  • $\bar{x}$ = Sample mean
  • $\mu_0$ = Hypothesized population mean
  • $\sigma$ = Population standard deviation
  • $n$ = Sample size
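
The z statistic above can be computed directly. The following is a minimal sketch using only the Python standard library; the function name `z_statistic` and the numbers in the usage line are illustrative assumptions, not values from this lesson.

```python
import math


def z_statistic(sample_mean: float, mu0: float, sigma: float, n: int) -> float:
    """Return z = (sample_mean - mu0) / (sigma / sqrt(n)).

    Assumes the population standard deviation sigma is known.
    """
    if sigma <= 0 or n <= 0:
        raise ValueError("sigma and n must be positive")
    standard_error = sigma / math.sqrt(n)
    return (sample_mean - mu0) / standard_error


# Hypothetical numbers, chosen only for illustration.
z = z_statistic(sample_mean=103.0, mu0=100.0, sigma=15.0, n=100)
print(z)  # 2.0
```

When the population standard deviation is unknown and estimated from the sample, the same ratio is compared against a t distribution rather than the standard normal.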

P-Value

$$p = P(\text{data or more extreme} \mid H_0 \text{ is true})$$

Here,

  • $p$ = Probability of observing results at least this extreme under $H_0$
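
For a z-test, "data or more extreme" translates into a tail area of the standard normal distribution. The sketch below assumes a z statistic has already been computed; the function name `p_value_from_z` and the `alternative` labels are illustrative choices, not part of the lesson.

```python
from statistics import NormalDist


def p_value_from_z(z: float, alternative: str = "two-sided") -> float:
    """Tail probability of a standard normal test statistic under H0."""
    std_normal = NormalDist()  # mean 0, standard deviation 1
    if alternative == "two-sided":
        # Both tails: values at least as far from 0 as |z|.
        return 2.0 * (1.0 - std_normal.cdf(abs(z)))
    if alternative == "greater":
        return 1.0 - std_normal.cdf(z)
    if alternative == "less":
        return std_normal.cdf(z)
    raise ValueError("alternative must be 'two-sided', 'greater', or 'less'")


p_two_sided = p_value_from_z(1.96)             # close to 0.05
p_one_sided = p_value_from_z(1.96, "greater")  # close to 0.025
```

The one-sided p-value is half the two-sided one here, which is why a one-tailed test needs a directional prediction made before seeing the data.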

Power of a Test

$$\text{Power} = 1 - \beta = P(\text{Reject } H_0 \mid H_0 \text{ is false})$$

Here,

  • $\beta$ = Type II error probability
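
As one concrete case, the power of a two-sided one-sample z-test has a closed form. The sketch below assumes the effect is expressed as a standardized difference $(\mu_1 - \mu_0)/\sigma$ and that $\sigma$ is known; the function name `z_test_power` is hypothetical.

```python
import math
from statistics import NormalDist


def z_test_power(effect_size: float, n: int, alpha: float = 0.05) -> float:
    """Power of a two-sided one-sample z-test.

    effect_size is the true standardized shift (mu1 - mu0) / sigma.
    """
    std_normal = NormalDist()
    z_crit = std_normal.inv_cdf(1.0 - alpha / 2.0)
    shift = effect_size * math.sqrt(n)  # mean of z under the alternative
    # Probability that z lands in either rejection region.
    return std_normal.cdf(shift - z_crit) + std_normal.cdf(-shift - z_crit)


power_small_n = z_test_power(effect_size=0.5, n=32)
power_large_n = z_test_power(effect_size=0.5, n=64)  # larger n, higher power
```

With a zero effect the function returns $\alpha$, which matches the definition of the Type I error rate.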

Sample Size for Power

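$$n = 2\left(\frac{z_{1-\alpha/2} + z_{1-\beta}}{d}\right)^2$$

Here,

  • $n$ = Required sample size per group (two-sided test comparing two independent groups; drop the factor of 2 for a one-sample test)
  • $d$ = Cohen's d (effect size)
  • $z_{1-\alpha/2}$ = Critical value for significance level
  • $z_{1-\beta}$ = Standard normal quantile for desired power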
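
A sketch of the normal-approximation sample-size calculation follows. For a one-sample test the required size is $\left((z_{1-\alpha/2} + z_{1-\beta})/d\right)^2$; when two independent groups are compared, the per-group size is twice that. The function name `required_sample_size` and its `two_sample` flag are illustrative assumptions.

```python
import math
from statistics import NormalDist


def required_sample_size(d: float, alpha: float = 0.05, power: float = 0.80,
                         two_sample: bool = True) -> int:
    """Sample size from the normal approximation, rounded up.

    With two_sample=True the result is the size of EACH group.
    """
    if d == 0:
        raise ValueError("effect size d must be non-zero")
    std_normal = NormalDist()
    z_alpha = std_normal.inv_cdf(1.0 - alpha / 2.0)  # two-sided critical value
    z_beta = std_normal.inv_cdf(power)               # quantile for desired power
    n = ((z_alpha + z_beta) / d) ** 2
    if two_sample:
        n *= 2.0  # each group contributes its own sampling variance
    return math.ceil(n)


n_per_group = required_sample_size(0.5)                    # 63
n_one_sample = required_sample_size(0.5, two_sample=False)  # 32
```

Because this uses normal quantiles, it slightly understates what an exact t-based calculation requires; dedicated power-analysis tools typically add a participant or two.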

Cohen's d (Effect Size)

$$d = \frac{\bar{x}_1 - \bar{x}_2}{s_p}$$

Here,

  • $s_p$ = Pooled standard deviation
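
The lesson does not spell out $s_p$; the sketch below assumes the standard pooled estimate, $s_p = \sqrt{\frac{(n_1 - 1)s_1^2 + (n_2 - 1)s_2^2}{n_1 + n_2 - 2}}$. The function name `cohens_d` and the sample data are illustrative.

```python
import math
from statistics import mean, variance


def cohens_d(group1: list[float], group2: list[float]) -> float:
    """Cohen's d using the pooled sample standard deviation."""
    n1, n2 = len(group1), len(group2)
    if n1 < 2 or n2 < 2:
        raise ValueError("each group needs at least two observations")
    # statistics.variance is the sample variance (n - 1 denominator).
    pooled_variance = (
        (n1 - 1) * variance(group1) + (n2 - 1) * variance(group2)
    ) / (n1 + n2 - 2)
    return (mean(group1) - mean(group2)) / math.sqrt(pooled_variance)


# Hypothetical data: means 4 and 3, both sample variances 4, so d = 1 / 2.
d = cohens_d([2.0, 4.0, 6.0], [1.0, 3.0, 5.0])
print(d)  # 0.5
```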

Error Matrix

| Decision | $H_0$ is True | $H_0$ is False |
|---|---|---|
| Reject $H_0$ | Type I Error ($\alpha$): false positive | Power ($1-\beta$): true positive |
| Fail to Reject $H_0$ | Correct: true negative | Type II Error ($\beta$): false negative |
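
The two columns of the error matrix can be checked by simulation. The sketch below repeatedly draws normal samples and runs a two-sided z-test: when the true mean equals $\mu_0$ the rejection rate estimates $\alpha$, and when it differs the rejection rate estimates power. All parameter values and the function name `rejection_rate` are assumptions chosen for illustration.

```python
import math
import random
from statistics import NormalDist, fmean


def rejection_rate(true_mean: float, mu0: float = 0.0, sigma: float = 1.0,
                   n: int = 30, alpha: float = 0.05,
                   trials: int = 10_000, seed: int = 0) -> float:
    """Fraction of simulated two-sided z-tests that reject H0: mu = mu0."""
    rng = random.Random(seed)  # fixed seed so the run is reproducible
    z_crit = NormalDist().inv_cdf(1.0 - alpha / 2.0)
    standard_error = sigma / math.sqrt(n)
    rejections = 0
    for _ in range(trials):
        sample = [rng.gauss(true_mean, sigma) for _ in range(n)]
        z = (fmean(sample) - mu0) / standard_error
        if abs(z) > z_crit:
            rejections += 1
    return rejections / trials


type_i_rate = rejection_rate(true_mean=0.0)    # H0 true: should be near alpha
estimated_power = rejection_rate(true_mean=0.5)  # H0 false: estimates 1 - beta
print(type_i_rate, estimated_power)
```

The Type II error rate is one minus the second estimate, so the simulation covers all four cells of the matrix.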

Effect Size Benchmarks (Cohen's d)

| Effect | Cohen's d | Interpretation |
|---|---|---|
| Small | 0.2 | Subtle, hard to detect |
| Medium | 0.5 | Noticeable practical effect |
| Large | 0.8 | Strong, clearly visible |

P-Value Interpretation

| P-Value | Evidence Against $H_0$ |
|---|---|
| $p < 0.01$ | Very strong |
| $0.01 \leq p < 0.05$ | Strong |
| $0.05 \leq p < 0.10$ | Weak |
| $p \geq 0.10$ | Little or none |

Quick Example

One-Sample T-Test

A researcher claims average response time is 200 ms, so $H_0: \mu = 200$ against $H_1: \mu \neq 200$. Sample: $n = 25$, $\bar{x} = 215$, $s = 30$.

$$t = \frac{215 - 200}{30/\sqrt{25}} = \frac{15}{6} = 2.5$$

With $df = 24$ and $\alpha = 0.05$ (two-tailed), the critical value is $t_{0.025, 24} = 2.064$. Since $|t| = 2.5 > 2.064$, reject $H_0$.

There is sufficient evidence at the 5% level that the mean response time differs from 200 ms.
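
The example can be reproduced in a few lines. The sketch below computes the t statistic, compares it with the critical value 2.064 quoted above, and approximates the two-sided p-value by integrating the Student t density with Simpson's rule. The numerical integration is a stand-in for a statistics library routine, and the function names are hypothetical.

```python
import math


def t_statistic(sample_mean: float, mu0: float, s: float, n: int) -> float:
    """t = (sample_mean - mu0) / (s / sqrt(n)) with sample standard deviation s."""
    return (sample_mean - mu0) / (s / math.sqrt(n))


def t_pdf(x: float, df: int) -> float:
    """Density of the Student t distribution with df degrees of freedom."""
    log_norm = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
                - 0.5 * math.log(df * math.pi))
    return math.exp(log_norm - (df + 1) / 2 * math.log1p(x * x / df))


def two_sided_p_value(t: float, df: int, steps: int = 2000) -> float:
    """Two-sided p-value via composite Simpson integration on [0, |t|].

    steps must be even for Simpson's rule.
    """
    upper = abs(t)
    if upper == 0.0:
        return 1.0
    h = upper / steps
    total = t_pdf(0.0, df) + t_pdf(upper, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    central_area = 2.0 * (h / 3.0) * total  # P(-|t| < T < |t|) by symmetry
    return 1.0 - central_area


t = t_statistic(sample_mean=215, mu0=200, s=30, n=25)  # 2.5
critical_value = 2.064  # t_{0.025, 24}, taken from the text above
reject_h0 = abs(t) > critical_value
p = two_sided_p_value(t, df=24)
print(t, reject_h0, p)
```

The p-value falls between 0.01 and 0.05, which agrees with the critical-value decision to reject at $\alpha = 0.05$.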

Power Analysis

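To detect a medium effect ($d = 0.5$) at $\alpha = 0.05$ (two-sided) with power = 0.80 when comparing two independent groups, use $z_{0.975} = 1.96$ and $z_{0.80} = 0.842$. The per-group sample size is twice the one-sample value $\left(\frac{z_{1-\alpha/2} + z_{1-\beta}}{d}\right)^2$:

$$n = 2\left(\frac{1.96 + 0.842}{0.5}\right)^2 = 2\,(5.604)^2 \approx 62.8$$

Rounding up gives 63 participants per group under this normal approximation; an exact t-based calculation gives 64, the figure usually quoted.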


Key Takeaways

Summary: Hypothesis Testing

  • Decision Rule: Reject $H_0$ if p-value $\leq \alpha$. Never say "accept $H_0$"; say "fail to reject."
  • p-value: Probability of results at least this extreme given $H_0$ is true. NOT the probability that $H_0$ is true.
  • Type I vs Type II: Type I = false positive ($\alpha$); Type II = false negative ($\beta$). Reducing one increases the other for fixed $n$ and effect size.
  • Power: Increases with effect size, sample size, and $\alpha$. Always conduct power analysis before collecting data.
  • Effect Size: A tiny effect can be "significant" with large $n$. Always report an effect size such as Cohen's d alongside p-values.
  • One vs Two Tailed: Use two-tailed as the default. One-tailed requires a strong a priori directional prediction.
  • Multiple Comparisons: Running many tests inflates the family-wise error rate. Use Bonferroni or Holm correction to control it, or control the false discovery rate (FDR) instead.
  • Statistical vs Practical: Statistical significance ≠ practical significance. Always consider effect size and context.
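
The multiple-comparisons point can be made concrete with adjusted p-values. The sketch below implements the Bonferroni and Holm adjustments from their standard definitions; the function names and the example p-values are illustrative assumptions.

```python
def bonferroni(p_values: list[float]) -> list[float]:
    """Bonferroni adjustment: multiply each p-value by the number of tests."""
    m = len(p_values)
    return [min(1.0, p * m) for p in p_values]


def holm(p_values: list[float]) -> list[float]:
    """Holm step-down adjustment, returned in the original order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])  # smallest p first
    adjusted = [0.0] * m
    running_max = 0.0
    for rank, idx in enumerate(order):
        candidate = min(1.0, (m - rank) * p_values[idx])
        # Enforce monotonicity: an adjusted p-value never drops below
        # the adjusted value of a smaller raw p-value.
        running_max = max(running_max, candidate)
        adjusted[idx] = running_max
    return adjusted


# Hypothetical raw p-values from four tests.
raw = [0.01, 0.04, 0.03, 0.005]
alpha = 0.05
bonferroni_reject = [p <= alpha for p in bonferroni(raw)]
holm_reject = [p <= alpha for p in holm(raw)]
print(bonferroni(raw), holm(raw))
```

Holm's adjusted p-values are never larger than Bonferroni's, so it rejects at least as many hypotheses while still controlling the family-wise error rate.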

Deep Dive

For detailed explanations, worked examples, and Python implementations, explore the dedicated statistics lessons:

Hypothesis Formulation

Errors and Significance

  • Type I and Type II Errors: Error matrix, trade-off, and real-world consequences
  • P-Values: Calculation, interpretation, and common misinterpretations
  • Significance Levels: Choosing $\alpha$, multiple testing, and when to use 0.01 vs 0.05

Power and Effect Size

  • Power of a Test: Factors affecting power, a priori power analysis, and underpowered studies
  • Effect Size: Cohen's d, Hedges' g, eta-squared, and why practical significance matters
