ANOVA


Why It Matters

When comparing means across three or more groups, running multiple t-tests inflates the Type I error rate. With 3 groups there are 3 pairwise comparisons; if each is tested at $\alpha = 0.05$ and the tests are treated as independent, the family-wise error rate is $1 - 0.95^3 \approx 14\%$. ANOVA (Analysis of Variance) tests whether any group means differ while controlling the overall error rate at $\alpha$. It is the standard method for experiments with multiple treatment conditions, A/B/n testing, and any scenario involving categorical predictors with continuous outcomes.
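
The inflation described above can be reproduced with a few lines of arithmetic. This is a minimal sketch that assumes the pairwise tests are independent, which is only an approximation for tests that share groups; the function name `familywise_error_rate` is made up for illustration.

```python
from math import comb


def familywise_error_rate(n_groups: int, alpha: float = 0.05) -> float:
    """Chance of at least one false positive across all pairwise tests.

    Assumes the tests are independent, which is an approximation.
    """
    n_comparisons = comb(n_groups, 2)  # number of distinct pairs
    return 1 - (1 - alpha) ** n_comparisons


for k in (3, 4, 5, 10):
    rate = familywise_error_rate(k)
    print(f"{k} groups: {comb(k, 2)} comparisons, family-wise error rate ~ {rate:.3f}")
```

The rate grows quickly with the number of groups because the number of pairs grows quadratically.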


Overview

ANOVA partitions total variance in the data into between-group variance (differences attributable to the treatment) and within-group variance (random noise within groups). The F-statistic is the ratio of between-group to within-group variance: $F = MS_{between} / MS_{within}$. A large $F$ indicates that group means are more spread than expected by chance. One-way ANOVA tests means across levels of a single factor. Two-way ANOVA tests two factors and their interaction. ANOVA is an omnibus test: it only tells you that at least one group differs, not which groups differ. Post-hoc tests (Tukey's HSD, Bonferroni) identify specific pairwise differences after a significant F-test.


Key Concepts

F-Statistic (One-Way ANOVA)

$$F = \frac{MS_{between}}{MS_{within}} = \frac{SS_{between} / (k-1)}{SS_{within} / (N-k)}$$

Here,

  • $SS_{between}$ = Sum of squares between groups: $\sum_j n_j(\bar{x}_j - \bar{x})^2$
  • $SS_{within}$ = Sum of squares within groups: $\sum_j\sum_i(x_{ij} - \bar{x}_j)^2$
  • $k$ = Number of groups
  • $N$ = Total number of observations
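
To make the formula concrete, the sketch below computes $F$ directly from the sums of squares and compares it with `scipy.stats.f_oneway`. The helper name `one_way_anova` and the three samples are made up for illustration.

```python
import numpy as np
from scipy import stats


def one_way_anova(*groups):
    """Return (F, df_between, df_within) from the sum-of-squares formulas."""
    groups = [np.asarray(g, dtype=float) for g in groups]
    k = len(groups)                          # number of groups
    n_total = sum(len(g) for g in groups)    # total observations N
    grand_mean = np.concatenate(groups).mean()

    # Between-group SS: group sizes times squared deviations of group means
    ss_between = sum(len(g) * (g.mean() - grand_mean) ** 2 for g in groups)
    # Within-group SS: squared deviations from each group's own mean
    ss_within = sum(((g - g.mean()) ** 2).sum() for g in groups)

    ms_between = ss_between / (k - 1)
    ms_within = ss_within / (n_total - k)
    return ms_between / ms_within, k - 1, n_total - k


# Made-up measurements for three groups (illustrative only)
a = [4.1, 5.0, 4.7, 5.3, 4.4]
b = [5.9, 6.2, 5.1, 6.6, 5.8]
c = [4.9, 5.2, 5.6, 4.8, 5.5]

f_manual, df1, df2 = one_way_anova(a, b, c)
f_scipy, p_scipy = stats.f_oneway(a, b, c)
print(f"manual F = {f_manual:.4f} (df = {df1}, {df2})")
print(f"scipy  F = {f_scipy:.4f}, p = {p_scipy:.4f}")
```

The two F values should agree up to floating-point error, since `f_oneway` implements the same ratio.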

F-Distribution

$$F \sim F_{k-1,\, N-k}$$

Here,

  • $k-1$ = Numerator degrees of freedom (between groups)
  • $N-k$ = Denominator degrees of freedom (within groups)

Effect Size: Eta-Squared

$$\eta^2 = \frac{SS_{between}}{SS_{total}}$$

Here,

  • $SS_{total}$ = Total sum of squares: $SS_{between} + SS_{within}$

Two-Way ANOVA Model

$$y_{ijk} = \mu + \alpha_i + \beta_j + (\alpha\beta)_{ij} + \epsilon_{ijk}$$

Here,

  • $\alpha_i$ = Main effect of factor A
  • $\beta_j$ = Main effect of factor B
  • $(\alpha\beta)_{ij}$ = Interaction effect between factors A and B
  • $\epsilon_{ijk}$ = Random error term

ANOVA Assumptions

  1. Independence: Observations are independent within and across groups. Violations (e.g., repeated measures) require different tests.
  2. Normality: Residuals are approximately normally distributed. Robust to violations with large $n$ (by the central limit theorem).
  3. Homogeneity of variances: Groups have equal population variances. Test with Levene's test. If violated, use Welch's ANOVA.
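
When variances are unequal, Welch's ANOVA replaces the pooled within-group variance with per-group precision weights. The sketch below is a hand-rolled version of the standard Welch formula, not a library routine; the function name `welch_anova` and the simulated samples are assumptions made for illustration.

```python
import numpy as np
from scipy import stats


def welch_anova(*groups):
    """Welch's one-way ANOVA; returns (F, df1, df2, p)."""
    groups = [np.asarray(g, dtype=float) for g in groups]
    k = len(groups)
    n = np.array([len(g) for g in groups])
    means = np.array([g.mean() for g in groups])
    variances = np.array([g.var(ddof=1) for g in groups])

    w = n / variances                          # precision weight per group
    w_sum = w.sum()
    weighted_mean = (w * means).sum() / w_sum  # variance-weighted grand mean

    # Correction term shared by the F denominator and the second df
    lam = (((1 - w / w_sum) ** 2) / (n - 1)).sum()

    numerator = (w * (means - weighted_mean) ** 2).sum() / (k - 1)
    denominator = 1 + 2 * (k - 2) * lam / (k ** 2 - 1)

    f_stat = numerator / denominator
    df1 = k - 1
    df2 = (k ** 2 - 1) / (3 * lam)
    return f_stat, df1, df2, stats.f.sf(f_stat, df1, df2)


# Simulated groups with deliberately unequal spreads (illustrative only)
rng = np.random.default_rng(42)
g1 = rng.normal(10.0, 1.0, 20)
g2 = rng.normal(10.5, 3.0, 25)
g3 = rng.normal(12.0, 6.0, 30)

f_w, d1, d2, p_w = welch_anova(g1, g2, g3)
print(f"Welch F = {f_w:.3f}, df = ({d1}, {d2:.1f}), p = {p_w:.4f}")
```

With only two groups this reduces to Welch's t-test ($F = t^2$), which gives a convenient sanity check against `scipy.stats.ttest_ind(..., equal_var=False)`.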

Sum of Squares Decomposition

$$SS_{total} = SS_{between} + SS_{within}$$
  • $SS_{between}$: Variation due to group differences (explainable)
  • $SS_{within}$: Variation within groups (unexplainable noise)
  • $\eta^2 = SS_{between} / SS_{total}$: Proportion of total variance explained by groups
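
The decomposition can be checked numerically. In this sketch the helper `sum_of_squares` and the three samples are hypothetical; it computes each sum of squares separately, confirms that the parts add up to the total, and then reports $\eta^2$.

```python
import numpy as np


def sum_of_squares(*groups):
    """Return (SS_total, SS_between, SS_within), each computed independently."""
    groups = [np.asarray(g, dtype=float) for g in groups]
    all_values = np.concatenate(groups)
    grand_mean = all_values.mean()

    ss_total = ((all_values - grand_mean) ** 2).sum()
    ss_between = sum(len(g) * (g.mean() - grand_mean) ** 2 for g in groups)
    ss_within = sum(((g - g.mean()) ** 2).sum() for g in groups)
    return ss_total, ss_between, ss_within


# Made-up scores for three groups (illustrative only)
low = [62, 58, 65, 60, 61]
medium = [68, 71, 66, 70, 69]
high = [75, 73, 78, 74, 77]

ss_total, ss_between, ss_within = sum_of_squares(low, medium, high)
eta_squared = ss_between / ss_total

print(f"SS_total   = {ss_total:.2f}")
print(f"SS_between = {ss_between:.2f}")
print(f"SS_within  = {ss_within:.2f}")
print(f"eta^2      = {eta_squared:.3f}")
```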

Post-Hoc Tests

| Test | When to Use | Controls | Conservativeness |
|---|---|---|---|
| Tukey's HSD | All pairwise comparisons | Family-wise error | Moderate |
| Bonferroni | Few planned comparisons | Family-wise error | Conservative |
| Scheffé | All possible contrasts | Family-wise error | Most conservative |
| Dunnett | Each group vs. control | Family-wise error | Moderate |
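
As an illustration of the Bonferroni row, the sketch below runs pairwise two-sample t-tests and multiplies each raw p-value by the number of comparisons (capped at 1). The function name `bonferroni_pairwise`, the group labels, and the data are assumptions, not part of the lesson.

```python
from itertools import combinations

from scipy import stats


def bonferroni_pairwise(groups: dict, alpha: float = 0.05):
    """Pairwise pooled-variance t-tests with Bonferroni-adjusted p-values."""
    pairs = list(combinations(groups, 2))   # all unordered pairs of labels
    m = len(pairs)                          # number of comparisons
    results = []
    for a, b in pairs:
        _, p_raw = stats.ttest_ind(groups[a], groups[b])
        p_adj = min(1.0, p_raw * m)         # Bonferroni adjustment
        results.append((a, b, p_raw, p_adj, p_adj < alpha))
    return results


# Made-up samples (illustrative only)
data = {
    "control": [5.1, 4.8, 5.4, 5.0, 4.9, 5.2],
    "drug_a": [5.9, 6.3, 6.0, 5.7, 6.4, 6.1],
    "drug_b": [5.3, 5.0, 5.6, 5.2, 5.5, 5.1],
}

results = bonferroni_pairwise(data)
for a, b, p_raw, p_adj, significant in results:
    print(f"{a} vs {b}: raw p = {p_raw:.4f}, adjusted p = {p_adj:.4f}, significant = {significant}")
```

Multiplying by the number of comparisons is simple but becomes conservative as that number grows, which is why the table recommends it for a few planned comparisons.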

Quick Example

One-Way ANOVA

Three drugs tested: $F = 4.5$, $df_1 = 2$, $df_2 = 87$.

Using the F-distribution: $p \approx 0.014 < 0.05$. Reject $H_0$: at least one drug differs.

ANOVA is omnibus: we know something differs, but not what. Post-hoc Tukey HSD identifies which specific pairs differ. Without post-hoc, we cannot claim Drug A is better than Drug B.
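
The p-value in this example can be looked up from the F-distribution's survival function. A small sketch using `scipy.stats.f`, with the statistic and degrees of freedom taken from the example above; the 5% critical value is added for comparison.

```python
from scipy import stats

# Values from the three-drug example
f_stat, df1, df2 = 4.5, 2, 87

# Upper-tail probability P(F >= f_stat) under the null hypothesis
p_value = stats.f.sf(f_stat, df1, df2)

# Smallest F that would be significant at alpha = 0.05
f_crit = stats.f.ppf(0.95, df1, df2)

print(f"p-value = {p_value:.4f}")
print(f"critical F at alpha = 0.05: {f_crit:.3f}")
print("reject H0" if p_value < 0.05 else "fail to reject H0")
```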

Two-Way ANOVA

Consider testing the effect of drug (A vs B) and dosage (low vs high) on recovery time. Two-way ANOVA tests three hypotheses simultaneously:

  1. Does drug type matter? (main effect of drug)
  2. Does dosage matter? (main effect of dosage)
  3. Does the effect of drug depend on dosage? (interaction)

If the interaction is significant, the effect of drug depends on dosage level, so you cannot interpret main effects in isolation.
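
The three hypotheses map onto three sums of squares. The sketch below works them out with NumPy for a balanced design (equal replicates in every cell), which is an assumption; unbalanced data needs a regression-based approach instead. The function name `two_way_anova` and the recovery times are made up for illustration.

```python
import numpy as np
from scipy import stats


def two_way_anova(data):
    """Balanced two-way ANOVA with interaction.

    data has shape (levels of A, levels of B, replicates per cell).
    Returns {effect: (SS, df, F, p)}; the error row has F = p = None.
    """
    data = np.asarray(data, dtype=float)
    a, b, n = data.shape
    grand = data.mean()
    mean_a = data.mean(axis=(1, 2))   # marginal means of factor A
    mean_b = data.mean(axis=(0, 2))   # marginal means of factor B
    cell = data.mean(axis=2)          # cell means

    ss = {
        "A": b * n * ((mean_a - grand) ** 2).sum(),
        "B": a * n * ((mean_b - grand) ** 2).sum(),
        "A:B": n * ((cell - mean_a[:, None] - mean_b[None, :] + grand) ** 2).sum(),
        "error": ((data - cell[:, :, None]) ** 2).sum(),
    }
    df = {"A": a - 1, "B": b - 1, "A:B": (a - 1) * (b - 1), "error": a * b * (n - 1)}

    ms_error = ss["error"] / df["error"]
    table = {}
    for effect in ("A", "B", "A:B"):
        f_stat = (ss[effect] / df[effect]) / ms_error
        p = stats.f.sf(f_stat, df[effect], df["error"])
        table[effect] = (ss[effect], df[effect], f_stat, p)
    table["error"] = (ss["error"], df["error"], None, None)
    return table


# Made-up recovery times in days: axis 0 = drug (A, B), axis 1 = dosage (low, high)
recovery = np.array([
    [[12.0, 11.5, 12.5, 12.2], [10.0, 9.6, 10.4, 10.1]],
    [[11.8, 12.4, 12.1, 11.9], [7.9, 8.3, 7.6, 8.1]],
])

table = two_way_anova(recovery)
for effect, (ss_val, df_val, f_val, p_val) in table.items():
    if f_val is None:
        print(f"{effect:>5}: SS = {ss_val:.3f}, df = {df_val}")
    else:
        print(f"{effect:>5}: SS = {ss_val:.3f}, df = {df_val}, F = {f_val:.2f}, p = {p_val:.4f}")
```

Here the `A:B` row corresponds to the third hypothesis: it measures how far the cell means depart from what the two main effects alone would predict.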

Post-Hoc Analysis

ANOVA with 4 groups yields $F = 5.2$, $p = 0.003$. Reject $H_0$: at least one group differs.

Tukey's HSD reveals: Group A vs C ($p = 0.001$), Group A vs D ($p = 0.02$), Group B vs C ($p = 0.004$), Group B vs D ($p = 0.03$). Groups C and D don't differ from each other ($p = 0.91$).
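
A pairwise table like this one can be produced with `scipy.stats.tukey_hsd` (available in SciPy 1.8 and later). The sketch below uses simulated data, so its p-values will not match the numbers quoted above; the group means and sample sizes are assumptions chosen to give a similar pattern.

```python
import numpy as np
from scipy import stats

# Simulated groups: A and B share a low mean, C and D a high mean (illustrative only)
rng = np.random.default_rng(1)
samples = {
    "A": rng.normal(10.0, 1.0, 25),
    "B": rng.normal(10.2, 1.0, 25),
    "C": rng.normal(12.0, 1.0, 25),
    "D": rng.normal(12.1, 1.0, 25),
}
names = list(samples)

# Omnibus test first, then all pairwise comparisons
f_stat, p_omnibus = stats.f_oneway(*samples.values())
print(f"omnibus F = {f_stat:.2f}, p = {p_omnibus:.4g}")

result = stats.tukey_hsd(*samples.values())
for i in range(len(names)):
    for j in range(i + 1, len(names)):
        # result.pvalue is a symmetric matrix of adjusted p-values
        print(f"{names[i]} vs {names[j]}: p = {result.pvalue[i, j]:.4f}")
```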

Assumption Checking in Python

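import numpy as np
from scipy import stats

# Illustrative data (assumed); replace with your own samples
rng = np.random.default_rng(0)
group1, group2, group3 = (rng.normal(m, 1.0, 30) for m in (5.0, 5.5, 6.0))

# Levene's test for equal variances
_, p_levene = stats.levene(group1, group2, group3)
print(f"Levene's test p-value: {p_levene:.3f}")

# Residuals: each observation minus its own group mean
residuals = np.concatenate([g - g.mean() for g in (group1, group2, group3)])

# Shapiro-Wilk test for normality of residuals
_, p_normal = stats.shapiro(residuals)
print(f"Normality test p-value: {p_normal:.3f}")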

If Levene's test is significant ($p < 0.05$), use Welch's ANOVA or the nonparametric Kruskal-Wallis test instead.
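
For the nonparametric route, `scipy.stats.kruskal` runs the Kruskal-Wallis test on the ranks of the pooled data. A minimal sketch with made-up samples, one of which contains an outlier that would distort a mean-based test:

```python
from scipy import stats

# Made-up response times in seconds; the first group contains an outlier
fast = [1.2, 1.5, 1.1, 1.9, 1.4, 9.8]
medium = [2.4, 2.9, 2.2, 3.1, 2.7, 2.5]
slow = [3.8, 4.4, 3.6, 4.9, 4.1, 4.0]

# Kruskal-Wallis compares the groups using ranks instead of raw values
h_stat, p_kw = stats.kruskal(fast, medium, slow)
print(f"Kruskal-Wallis H = {h_stat:.3f}, p = {p_kw:.4f}")
```

Because it works on ranks, the single extreme value in `fast` counts only as the largest observation rather than pulling the group mean upward.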


Key Takeaways

Summary: ANOVA

  • Purpose: Compare means across 3+ groups while controlling family-wise Type I error.
  • F-statistic: Between-group variance / Within-group variance. Large $F$ -> reject $H_0$.
  • Hypotheses: $H_0$: all group means equal. $H_1$: at least one differs.
  • Post-hoc: ANOVA is omnibus, so use Tukey's HSD or Bonferroni to find which groups differ.
  • Assumptions: Normality, equal variances, independence. Levene's test checks homoscedasticity.
  • Effect Size: $\eta^2$ = proportion of variance explained by the group factor.
  • Relationship to t-test: One-way ANOVA with 2 groups is equivalent to a pooled-variance (equal-variance) two-sample t-test ($F = t^2$).
  • Two-Way ANOVA: Tests main effects of two factors and their interaction term.
  • Nonparametric Alternative: Kruskal-Wallis test when ANOVA assumptions are violated.
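
The two-group equivalence from the summary is easy to confirm numerically. This sketch uses made-up samples and compares `scipy.stats.f_oneway` with the pooled-variance `scipy.stats.ttest_ind`:

```python
from scipy import stats

# Made-up samples for two groups (illustrative only)
x = [5.1, 4.8, 5.4, 5.0, 4.9, 5.2, 5.3]
y = [5.9, 6.3, 6.0, 5.7, 6.4, 6.1, 5.8]

# Default ttest_ind pools the variances, matching the ANOVA assumption
t_stat, p_t = stats.ttest_ind(x, y)
f_stat, p_f = stats.f_oneway(x, y)

print(f"t^2 = {t_stat ** 2:.6f}, F = {f_stat:.6f}")
print(f"t-test p = {p_t:.6g}, ANOVA p = {p_f:.6g}")
```

The identity holds only for the pooled-variance t-test; Welch's t-test (`equal_var=False`) will generally give a different statistic.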

Deep Dive

For detailed explanations, worked examples, and Python implementations, explore the dedicated statistics lessons:

ANOVA

  • One-Way ANOVA: Complete derivation, F-distribution, sum of squares decomposition, post-hoc tests, and Python implementation
