
F-Distribution — Ratio of Variances


The Engine Behind ANOVA and F-Tests

The F-distribution emerges as the ratio of two chi-square variables, making it the backbone of analysis of variance and equality-of-variance tests. Since a ratio of variances can never be negative, the distribution is supported on $[0, \infty)$ and has a long right tail.

  • Agriculture — Comparing crop yields across multiple fertilizer treatments
  • Psychology — Analyzing variance in experimental designs with multiple groups
  • Engineering — Testing whether manufacturing processes produce consistent results

The F-distribution turns multiple group comparisons into a single elegant test.


Core Concepts

The F-distribution arises as the ratio of two independent chi-square random variables, each divided by its degrees of freedom. It is the basis for ANOVA and F-tests.

Definition: F-Distribution

If $U \sim \chi^2_{d_1}$ and $V \sim \chi^2_{d_2}$ are independent, then $F = \frac{U/d_1}{V/d_2}$ follows an F-distribution with $d_1$ (numerator) and $d_2$ (denominator) degrees of freedom, written $F \sim F_{d_1, d_2}$.
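To see the definition in action, the sketch below draws F variates directly as a ratio of scaled chi-squares. It assumes Python's standard `random` module and uses the fact that a $\chi^2_d$ variable is a Gamma variable with shape $d/2$ and scale 2; `sample_f` is a hypothetical helper name, and the seed and sample size are arbitrary choices.

```python
import random

def sample_f(d1: int, d2: int, rng: random.Random) -> float:
    """Draw one F(d1, d2) variate from the definition F = (U/d1) / (V/d2)."""
    # A chi-square with d degrees of freedom is Gamma(shape = d/2, scale = 2)
    u = rng.gammavariate(d1 / 2, 2.0)  # U ~ chi^2_{d1}
    v = rng.gammavariate(d2 / 2, 2.0)  # V ~ chi^2_{d2}, drawn independently of U
    return (u / d1) / (v / d2)

rng = random.Random(42)  # fixed seed so the run is reproducible
draws = [sample_f(5, 10, rng) for _ in range(100_000)]

print(min(draws) >= 0)          # a ratio of non-negative variables is never negative
print(sum(draws) / len(draws))  # should be close to d2 / (d2 - 2) = 1.25
```

With $d_1 = 5$ and $d_2 = 10$ the empirical mean should sit near $d_2/(d_2 - 2) = 1.25$, the same distribution shown in the visualization.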

F-Statistic

$$F = \frac{s_1^2 / \sigma_1^2}{s_2^2 / \sigma_2^2} = \frac{s_1^2}{s_2^2} \quad \text{(when } \sigma_1^2 = \sigma_2^2\text{)}$$

Here,

  • $s_1^2, s_2^2$ = Sample variances from two independent samples of sizes $n_1$ and $n_2$
  • $\sigma_1^2, \sigma_2^2$ = The corresponding population variances
  • $d_1 = n_1 - 1$, $d_2 = n_2 - 1$ = Degrees of freedom for numerator and denominator
  • $F$ = F-statistic (ratio of variances), distributed as $F_{d_1, d_2}$ for normal populations

Key Properties

  • Takes only non-negative values and is right-skewed
  • Reciprocal relation: if $X \sim F_{d_1, d_2}$, then $1/X \sim F_{d_2, d_1}$
  • As both degrees of freedom grow, the distribution concentrates around 1
  • For $d_1 > 2$, the mode is $\frac{d_1 - 2}{d_1} \cdot \frac{d_2}{d_2 + 2} < 1$; for $d_1 \le 2$, the mode is at 0
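The properties above can be checked numerically from the density of $F_{d_1, d_2}$, $f(x) = \frac{\Gamma\left(\frac{d_1+d_2}{2}\right)}{\Gamma\left(\frac{d_1}{2}\right)\Gamma\left(\frac{d_2}{2}\right)} \left(\frac{d_1}{d_2}\right)^{d_1/2} x^{d_1/2 - 1}\left(1 + \frac{d_1 x}{d_2}\right)^{-(d_1+d_2)/2}$ for $x > 0$. The Python sketch below (helper names `f_pdf` and `f_mode` are hypothetical) evaluates it on the log scale for stability, locates the peak of $F_{5,10}$ by grid search, and checks the reciprocal relation through the change of variables $y = 1/x$, whose density is $f_{d_1,d_2}(1/y)/y^2$.

```python
import math

def f_pdf(x: float, d1: float, d2: float) -> float:
    """Density of F(d1, d2) at x, computed on the log scale to avoid overflow."""
    if x <= 0:
        return 0.0
    log_norm = (math.lgamma((d1 + d2) / 2) - math.lgamma(d1 / 2) - math.lgamma(d2 / 2)
                + (d1 / 2) * math.log(d1 / d2))
    log_kernel = (d1 / 2 - 1) * math.log(x) - ((d1 + d2) / 2) * math.log1p(d1 * x / d2)
    return math.exp(log_norm + log_kernel)

def f_mode(d1: float, d2: float) -> float:
    """Closed-form mode; the density peaks at 0 when d1 <= 2."""
    if d1 <= 2:
        return 0.0
    return (d1 - 2) / d1 * d2 / (d2 + 2)

# Locate the peak of F(5, 10) on a fine grid over (0, 3] and compare with the formula.
grid = [i / 10_000 for i in range(1, 30_001)]
x_peak = max(grid, key=lambda x: f_pdf(x, 5, 10))
print(f_mode(5, 10), x_peak)  # the formula gives 0.5; the grid peak should match

# Reciprocal relation: if X ~ F(d1, d2), then Y = 1/X has density f(1/y; d1, d2) / y^2,
# which should coincide with the F(d2, d1) density.
for y in (0.3, 1.0, 2.5):
    lhs = f_pdf(1 / y, 5, 10) / y**2
    rhs = f_pdf(y, 10, 5)
    print(y, math.isclose(lhs, rhs, rel_tol=1e-9))
```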

Interactive Visualization

[Figure: F-Distribution Interactive Explorer. Density $f(x)$ of $F(d_1 = 5, d_2 = 10)$ with the mean $\mu = 1.25$ and mode $\text{Mo} = 0.50$ marked. Second panel: Effect of Degrees of Freedom on F-Distribution.]

Mean, Variance, and Moments

F-Distribution Mean

$$E[F_{d_1, d_2}] = \frac{d_2}{d_2 - 2}, \quad d_2 > 2$$

Here,

  • $d_1$ = Numerator degrees of freedom
  • $d_2$ = Denominator degrees of freedom

Variance and Higher Moments

$$\text{Var}(F_{d_1, d_2}) = \frac{2d_2^2(d_1 + d_2 - 2)}{d_1(d_2 - 2)^2(d_2 - 4)}, \quad d_2 > 4$$

The variance exists only when $d_2 > 4$. The F-distribution is always right-skewed, with skewness decreasing as both degrees of freedom increase.

The mean depends only on $d_2$: $E[F] = d_2/(d_2 - 2)$ is always slightly above 1 and approaches 1 as $d_2 \to \infty$. This makes sense: if both sample variances estimate the same $\sigma^2$, their ratio should be near 1.
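The moment formulas translate directly into code. In the sketch below (function names `f_mean` and `f_variance` are hypothetical), exact fractions avoid rounding, `None` marks a moment that does not exist, and evaluating the mean for growing $d_2$ shows it shrinking toward 1.

```python
from fractions import Fraction
from typing import Optional

def f_mean(d1: int, d2: int) -> Optional[Fraction]:
    """E[F] = d2 / (d2 - 2); exists only for d2 > 2 and does not depend on d1."""
    if d2 <= 2:
        return None
    return Fraction(d2, d2 - 2)

def f_variance(d1: int, d2: int) -> Optional[Fraction]:
    """Var(F) = 2 d2^2 (d1 + d2 - 2) / (d1 (d2 - 2)^2 (d2 - 4)); exists only for d2 > 4."""
    if d2 <= 4:
        return None
    return Fraction(2 * d2**2 * (d1 + d2 - 2), d1 * (d2 - 2) ** 2 * (d2 - 4))

print(f_mean(5, 10), f_variance(5, 10))  # 5/4 65/48
print(f_variance(5, 4))                  # None: the variance does not exist
for d2 in (5, 10, 50, 1000):
    print(d2, float(f_mean(5, d2)))      # decreases toward 1 as d2 grows
```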


Derivation: Why the F-Distribution Appears

Theorem: Distribution of the Variance Ratio

If $X_1, \ldots, X_{n_1} \overset{\text{i.i.d.}}{\sim} N(\mu_1, \sigma^2)$ and $Y_1, \ldots, Y_{n_2} \overset{\text{i.i.d.}}{\sim} N(\mu_2, \sigma^2)$ independently, then:

$$F = \frac{s_X^2}{s_Y^2} \sim F_{n_1-1, n_2-1}$$

Proof Sketch

Step 1. By the chi-square result for sample variances: $U = \frac{(n_1-1)s_X^2}{\sigma^2} \sim \chi^2_{n_1-1}$ and $V = \frac{(n_2-1)s_Y^2}{\sigma^2} \sim \chi^2_{n_2-1}$.

Step 2. Since the samples are independent, $U$ and $V$ are independent.

Step 3. Therefore $F = \frac{U/(n_1-1)}{V/(n_2-1)} = \frac{s_X^2}{s_Y^2} \sim F_{n_1-1, n_2-1}$ by definition; the common $\sigma^2$ cancels in the ratio.

The key requirements are: (1) normal populations, (2) equal variances, and (3) independent samples.
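A quick Monte Carlo check of the theorem, sketched in Python under assumed settings (sample sizes 12 and 15, a common $\sigma = 2$, and deliberately different means): the ratio of sample variances should behave like $F_{11, 14}$, whose mean is $14/12 \approx 1.167$. The helper name `variance_ratio` is hypothetical.

```python
import random
import statistics

def variance_ratio(n1, n2, mu1, mu2, sigma, rng):
    """Ratio s_X^2 / s_Y^2 for independent normal samples sharing one sigma."""
    xs = [rng.gauss(mu1, sigma) for _ in range(n1)]
    ys = [rng.gauss(mu2, sigma) for _ in range(n2)]
    # statistics.variance uses the n - 1 denominator, matching s^2 in the theorem;
    # sample variances ignore location, so the unequal means do not matter.
    return statistics.variance(xs) / statistics.variance(ys)

rng = random.Random(7)
n1, n2 = 12, 15
ratios = [variance_ratio(n1, n2, mu1=5.0, mu2=-3.0, sigma=2.0, rng=rng)
          for _ in range(20_000)]

d2 = n2 - 1
mean_ratio = sum(ratios) / len(ratios)
print(mean_ratio)  # should be close to d2 / (d2 - 2) = 14/12
```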


ANOVA Connection

Theorem: F-Test in One-Way ANOVA

In one-way ANOVA with $k$ groups and $N$ total observations, where $\bar{X}_j$ is the mean of group $j$ (of size $n_j$) and $\bar{X}$ is the grand mean, define:

$$\text{MS}_{\text{between}} = \frac{\sum_{j=1}^k n_j(\bar{X}_j - \bar{X})^2}{k-1}, \quad \text{MS}_{\text{within}} = \frac{\sum_{j=1}^k \sum_{i=1}^{n_j}(X_{ij} - \bar{X}_j)^2}{N-k}$$

Under $H_0: \mu_1 = \cdots = \mu_k$:

$$F = \frac{\text{MS}_{\text{between}}}{\text{MS}_{\text{within}}} \sim F_{k-1, N-k}$$

Proof Sketch

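Under $H_0$ (with normal errors and a common variance), all observations come from $N(\mu, \sigma^2)$. Writing $\text{SS}_{\text{between}}$ and $\text{SS}_{\text{within}}$ for the numerators of the two mean squares, $\text{SS}_{\text{between}}/\sigma^2 \sim \chi^2_{k-1}$ and $\text{SS}_{\text{within}}/\sigma^2 \sim \chi^2_{N-k}$, and the two are independent by Cochran's theorem. Dividing each by its degrees of freedom and taking the ratio cancels $\sigma^2$, so $F \sim F_{k-1, N-k}$.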
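The two mean squares can be computed directly from grouped data. The Python sketch below (`one_way_anova_f` is a hypothetical helper) follows the definitions above; the toy groups are made up for illustration and give $\text{MS}_{\text{between}} = 27$ and $\text{MS}_{\text{within}} = 1$, so $F = 27$ on $(2, 6)$ degrees of freedom.

```python
def one_way_anova_f(groups):
    """Return (F, df_between, df_within) for a list of groups of observations."""
    k = len(groups)
    n_total = sum(len(g) for g in groups)
    grand_mean = sum(sum(g) for g in groups) / n_total
    means = [sum(g) / len(g) for g in groups]

    # Between-group sum of squares: weighted spread of group means around the grand mean
    ss_between = sum(len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, means))
    # Within-group sum of squares: spread of observations around their own group mean
    ss_within = sum((x - m) ** 2 for g, m in zip(groups, means) for x in g)

    df_between, df_within = k - 1, n_total - k
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    return f_stat, df_between, df_within

groups = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]  # toy data
print(one_way_anova_f(groups))              # (27.0, 2, 6)
```

Under $H_0$ this statistic would be compared with the upper critical value of $F_{2, 6}$.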


Worked Example

Two methods for measuring blood glucose are compared. Method A ($n_A = 12$) gives $s_A^2 = 4.2$; Method B ($n_B = 15$) gives $s_B^2 = 2.8$. Test $H_0: \sigma_A^2 = \sigma_B^2$ against the two-sided alternative $\sigma_A^2 \neq \sigma_B^2$ at $\alpha = 0.05$.

Step 1. Compute the F-statistic:

$$F = \frac{s_A^2}{s_B^2} = \frac{4.2}{2.8} = 1.50$$

Step 2. Under $H_0$, $F \sim F_{11, 14}$. Splitting $\alpha$ equally between the two tails, the upper critical value is $F_{0.025, 11, 14} = 3.10$.

Step 3. Since $1.50 < 3.10$, we fail to reject $H_0$. There is insufficient evidence that the variances differ.

Step 4. Note the asymmetry: a two-sided test also needs the lower critical value, and it is not simply $1/3.10$. By the reciprocal property, $F_{0.975, 11, 14} = 1/F_{0.025, 14, 11} \approx 1/3.36 \approx 0.298$. We reject if $F < 0.298$ or $F > 3.10$. Since $0.298 < 1.50 < 3.10$, we fail to reject.
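Without printed tables, the critical values in Steps 2 and 4 can be computed from the F CDF, which is a regularized incomplete beta function: $P(F \le x) = I_{d_1 x/(d_1 x + d_2)}(d_1/2, d_2/2)$. The self-contained Python sketch below evaluates $I$ with the standard continued-fraction expansion (the modified Lentz scheme popularized by Numerical Recipes) and inverts the CDF by bisection. All function names are hypothetical; in practice a library routine such as SciPy's `scipy.stats.f.ppf` would normally be used instead.

```python
import math

def _beta_cf(a, b, x, max_iter=300, eps=1e-14):
    """Continued fraction for the incomplete beta function (modified Lentz method)."""
    tiny = 1e-300
    c, d = 1.0, 1.0 - (a + b) * x / (a + 1.0)
    d = 1.0 / (d if abs(d) > tiny else tiny)
    h = d
    for m in range(1, max_iter + 1):
        m2 = 2 * m
        # even step of the continued fraction
        aa = m * (b - m) * x / ((a - 1.0 + m2) * (a + m2))
        d = 1.0 + aa * d
        d = 1.0 / (d if abs(d) > tiny else tiny)
        c = 1.0 + aa / c
        c = c if abs(c) > tiny else tiny
        h *= d * c
        # odd step of the continued fraction
        aa = -(a + m) * (a + b + m) * x / ((a + m2) * (a + 1.0 + m2))
        d = 1.0 + aa * d
        d = 1.0 / (d if abs(d) > tiny else tiny)
        c = 1.0 + aa / c
        c = c if abs(c) > tiny else tiny
        delta = d * c
        h *= delta
        if abs(delta - 1.0) < eps:
            break
    return h

def reg_inc_beta(a, b, x):
    """Regularized incomplete beta function I_x(a, b)."""
    if x <= 0.0:
        return 0.0
    if x >= 1.0:
        return 1.0
    front = math.exp(math.lgamma(a + b) - math.lgamma(a) - math.lgamma(b)
                     + a * math.log(x) + b * math.log1p(-x))
    # Use the symmetry I_x(a, b) = 1 - I_{1-x}(b, a) where the fraction converges faster
    if x < (a + 1.0) / (a + b + 2.0):
        return front * _beta_cf(a, b, x) / a
    return 1.0 - front * _beta_cf(b, a, 1.0 - x) / b

def f_cdf(x, d1, d2):
    """P(F <= x) for F ~ F(d1, d2)."""
    if x <= 0.0:
        return 0.0
    return reg_inc_beta(d1 / 2, d2 / 2, d1 * x / (d1 * x + d2))

def f_ppf(p, d1, d2):
    """Quantile of F(d1, d2), found by bisection on the CDF."""
    lo, hi = 0.0, 1.0
    while f_cdf(hi, d1, d2) < p:  # widen the bracket until it contains the quantile
        hi *= 2.0
    for _ in range(100):
        mid = 0.5 * (lo + hi)
        if f_cdf(mid, d1, d2) < p:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

# Worked example: Method A (n = 12, s^2 = 4.2) vs. Method B (n = 15, s^2 = 2.8)
d1, d2 = 12 - 1, 15 - 1
f_stat = 4.2 / 2.8
alpha = 0.05
upper = f_ppf(1 - alpha / 2, d1, d2)  # upper critical value F_{0.025, 11, 14}
lower = f_ppf(alpha / 2, d1, d2)      # lower critical value F_{0.975, 11, 14}
reject = f_stat < lower or f_stat > upper
print(round(f_stat, 2), round(lower, 3), round(upper, 3), reject)
```

The lower critical value computed this way should agree with `1 / f_ppf(1 - alpha / 2, d2, d1)`, which is the reciprocal property used in Step 4.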

F-Test Sensitivity

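The F-test for equal variances is highly sensitive to non-normality. In practice, Levene's test (or its Brown–Forsythe variant, which centers on group medians) is usually preferred because it is more robust; Bartlett's test also assumes normality and shares the same weakness. With heavy-tailed populations, the F-test can have severely inflated Type I error rates.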
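Levene-type tests reuse the ANOVA F machinery: each observation is replaced by its absolute deviation from its group center, and a one-way ANOVA F-statistic is computed on those deviations. The Python sketch below implements the Brown–Forsythe variant (group medians as centers); the function name and toy data are hypothetical, and the resulting statistic is referred to $F_{k-1, N-k}$ only approximately.

```python
import statistics

def brown_forsythe_w(groups):
    """Levene-type statistic using absolute deviations from each group's median."""
    devs = []
    for g in groups:
        center = statistics.median(g)
        devs.append([abs(x - center) for x in g])

    k = len(devs)
    n_total = sum(len(d) for d in devs)
    grand = sum(sum(d) for d in devs) / n_total
    means = [sum(d) / len(d) for d in devs]

    # One-way ANOVA F-statistic computed on the absolute deviations
    ss_between = sum(len(d) * (m - grand) ** 2 for d, m in zip(devs, means))
    ss_within = sum((z - m) ** 2 for d, m in zip(devs, means) for z in d)
    return (ss_between / (k - 1)) / (ss_within / (n_total - k))

groups = [[1, 2, 3, 4, 5], [2, 4, 6, 8, 10]]  # toy data; the second group is more spread out
w = brown_forsythe_w(groups)
print(w)  # refer to F(k - 1, N - k) = F(1, 8)
```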


Key Takeaways

Summary: F-Distribution

  • Ratio of two independent chi-squares, each divided by its df: $F = (U/d_1)/(V/d_2)$
  • Non-negative and right-skewed; $E[F] = d_2/(d_2 - 2)$ for $d_2 > 2$
  • Used in ANOVA (comparing group means) and F-tests (comparing variances)
  • Reciprocal relation: if $X \sim F_{d_1, d_2}$, then $1/X \sim F_{d_2, d_1}$
  • Two-sample F-test for equality of variances: $F = s_1^2/s_2^2$
  • Requires normality and independence, and is sensitive to departures from normality
