F-Distribution — Ratio of Variances

The Engine Behind ANOVA and F-Tests

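The F-distribution emerges as the ratio of two independent chi-square variables, each scaled by its degrees of freedom, which makes it the backbone of analysis of variance (ANOVA) and equality-of-variance tests. Because such a ratio can never be negative, the distribution lives on the positive half-line and is right-skewed.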

Agriculture — Comparing crop yields across multiple fertilizer treatments
Psychology — Analyzing variance in experimental designs with multiple groups
Engineering — Testing whether manufacturing processes produce consistent results

The F-distribution turns multiple group comparisons into a single elegant test.

Core Concepts

The F-distribution arises as the ratio of two independent chi-square random variables, each divided by its degrees of freedom. It is the basis for ANOVA and F-tests.

Definition: F-Distribution

If $U \sim \chi^2_{d_1}$ and $V \sim \chi^2_{d_2}$ are independent, then $F = \frac{U/d_1}{V/d_2}$ follows an F-distribution with $d_1$ (numerator) and $d_2$ (denominator) degrees of freedom, written $F \sim F_{d_1, d_2}$ .
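The definition can be checked by simulation. The sketch below is illustrative only: it uses Python's standard library, relies on the fact that a chi-square variable with $d$ degrees of freedom is a Gamma variable with shape $d/2$ and scale 2, and picks $d_1 = 5$, $d_2 = 10$ arbitrarily. The helper names `chi_square` and `draw_f` are hypothetical.

```python
import random

def chi_square(df, rng):
    # A chi-square(df) variable is Gamma(shape=df/2, scale=2).
    return rng.gammavariate(df / 2, 2.0)

def draw_f(d1, d2, rng):
    # Ratio of two independent chi-squares, each divided by its df.
    u = chi_square(d1, rng)
    v = chi_square(d2, rng)
    return (u / d1) / (v / d2)

rng = random.Random(42)          # fixed seed so the run is repeatable
d1, d2 = 5, 10                   # assumed degrees of freedom
draws = [draw_f(d1, d2, rng) for _ in range(200_000)]

sample_mean = sum(draws) / len(draws)
print("all draws positive:", min(draws) > 0)
print("sample mean:", round(sample_mean, 3))
```

Every draw is positive, and the sample mean should land close to 1.25, the theoretical mean $d_2/(d_2 - 2)$ for $d_2 = 10$.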

F-Statistic

F = \frac{s_1^2 / \sigma_1^2}{s_2^2 / \sigma_2^2} = \frac{s_1^2}{s_2^2} \quad \text{(when } \sigma_1^2 = \sigma_2^2\text{)}

Here,

$s_1^2, s_2^2$ = Sample variances from two populations
$\sigma_1^2, \sigma_2^2$ = Population variances
$d_1, d_2$ = Degrees of freedom for numerator and denominator (each sample size minus one)
$F$ = F-statistic (ratio of variances)

Key Properties

Always positive and right-skewed
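Reciprocal property: if $F \sim F_{d_1, d_2}$, then $1/F \sim F_{d_2, d_1}$
As both degrees of freedom increase, the distribution concentrates around 1
The mode (for $d_1 > 2$ ) is at $\frac{d_1 - 2}{d_1} \cdot \frac{d_2}{d_2 + 2} < 1$ ; for $d_1 \le 2$ the density is decreasing on $(0, \infty)$ and has no interior mode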
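The mode formula and the reciprocal property can both be checked numerically against the density. The following is a minimal sketch, assuming the standard closed form of the F density and arbitrary values $d_1 = 5$, $d_2 = 10$; the function name `f_pdf` is hypothetical.

```python
import math

def f_pdf(x, d1, d2):
    """Density of F(d1, d2) at x, evaluated on the log scale for stability."""
    if x <= 0:
        return 0.0
    log_beta = (math.lgamma(d1 / 2) + math.lgamma(d2 / 2)
                - math.lgamma((d1 + d2) / 2))
    log_pdf = ((d1 / 2) * math.log(d1 / d2)
               + (d1 / 2 - 1) * math.log(x)
               - ((d1 + d2) / 2) * math.log1p(d1 * x / d2)
               - log_beta)
    return math.exp(log_pdf)

d1, d2 = 5, 10                                   # assumed example values

# Locate the mode by brute force on a fine grid over (0, 5].
grid = [i / 1000 for i in range(1, 5001)]
numeric_mode = max(grid, key=lambda x: f_pdf(x, d1, d2))
formula_mode = (d1 - 2) / d1 * d2 / (d2 + 2)
print("numeric mode:", numeric_mode, "formula mode:", formula_mode)

# Reciprocal property in density form: f_{d1,d2}(x) = f_{d2,d1}(1/x) / x^2.
x = 2.0
print(f_pdf(x, d1, d2), f_pdf(1 / x, d2, d1) / x**2)
```

The grid search is crude but makes no use of the formula it is checking, which is the point of the comparison.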

Interactive Visualization

[Interactive figure: F-Distribution Explorer, showing the effect of the degrees of freedom on the F-distribution.]

Mean, Variance, and Moments

F-Distribution Mean

E[F_{d_1, d_2}] = \frac{d_2}{d_2 - 2}, \quad d_2 > 2

Here,

$d_1$ = Numerator degrees of freedom
$d_2$ = Denominator degrees of freedom

Variance and Higher Moments

\text{Var}(F_{d_1, d_2}) = \frac{2d_2^2(d_1 + d_2 - 2)}{d_1(d_2 - 2)^2(d_2 - 4)}, \quad d_2 > 4

The variance exists only when $d_2 > 4$ . The F-distribution is always right-skewed, with skewness decreasing as both degrees of freedom increase.

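The mean depends only on $d_2$ : by independence, $E[F] = E[U/d_1] \cdot E[d_2/V] = 1 \cdot \frac{d_2}{d_2 - 2}$ . In particular, when $d_1 = d_2 = d$ , $E[F] = d/(d-2)$ , which approaches 1 as $d \to \infty$ . This makes sense: if both variances estimate the same $\sigma^2$ , their ratio should be near 1.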
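The moment formulas above are easy to tabulate. This is a small sketch with hypothetical helper names (`f_mean`, `f_variance`); returning `None` where a moment does not exist is a design choice made here for illustration.

```python
def f_mean(d1, d2):
    """E[F] = d2 / (d2 - 2); does not exist unless d2 > 2."""
    if d2 <= 2:
        return None
    return d2 / (d2 - 2)

def f_variance(d1, d2):
    """Var(F) from the formula above; does not exist unless d2 > 4."""
    if d2 <= 4:
        return None
    return 2 * d2**2 * (d1 + d2 - 2) / (d1 * (d2 - 2)**2 * (d2 - 4))

# With d1 = d2 = d, the mean drifts toward 1 and the variance toward 0.
for d in (5, 10, 50, 500):
    print(d, f_mean(d, d), f_variance(d, d))
```

Note that `f_mean` ignores its first argument, which mirrors the fact that the mean does not depend on $d_1$.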

Derivation: Why the F-Distribution Appears

Theorem: Distribution of the Variance Ratio

If $X_1, \ldots, X_{n_1} \overset{\text{i.i.d.}}{\sim} N(\mu_1, \sigma^2)$ and $Y_1, \ldots, Y_{n_2} \overset{\text{i.i.d.}}{\sim} N(\mu_2, \sigma^2)$ independently, then:

F = \frac{s_X^2}{s_Y^2} \sim F_{n_1-1, n_2-1}

Proof Sketch

Step 1. By the chi-square result: $U = \frac{(n_1-1)s_X^2}{\sigma^2} \sim \chi^2_{n_1-1}$ and $V = \frac{(n_2-1)s_Y^2}{\sigma^2} \sim \chi^2_{n_2-1}$ .

Step 2. Since the samples are independent, $U$ and $V$ are independent.

Step 3. Therefore $F = \frac{U/(n_1-1)}{V/(n_2-1)} = \frac{s_X^2}{s_Y^2} \sim F_{n_1-1, n_2-1}$ by definition.

The key requirements are: (1) normal populations, (2) equal variances, and (3) independent samples.
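The theorem can be illustrated by simulating the variance ratio directly from normal samples. The sketch below assumes sample sizes $n_1 = 8$ and $n_2 = 12$, a common $\sigma = 3$, and deliberately different means; all of these values are arbitrary choices, not taken from the text. It compares the simulated mean of $s_X^2/s_Y^2$ with the mean of $F_{n_1-1, n_2-1}$.

```python
import random

def sample_variance(xs):
    # Unbiased sample variance (divides by n - 1).
    m = sum(xs) / len(xs)
    return sum((x - m) ** 2 for x in xs) / (len(xs) - 1)

rng = random.Random(0)
n1, n2 = 8, 12        # assumed sample sizes
sigma = 3.0           # common standard deviation (requirement 2)

ratios = []
for _ in range(50_000):
    # The means differ on purpose: they do not affect the variance ratio.
    x = [rng.gauss(10.0, sigma) for _ in range(n1)]
    y = [rng.gauss(-4.0, sigma) for _ in range(n2)]
    ratios.append(sample_variance(x) / sample_variance(y))

simulated_mean = sum(ratios) / len(ratios)
theoretical_mean = (n2 - 1) / (n2 - 3)   # d2 / (d2 - 2) with d2 = n2 - 1
print(round(simulated_mean, 3), round(theoretical_mean, 3))
```

Matching the mean is only a partial check of the distribution, but it shows that neither the population means nor the value of $\sigma$ enter the result.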

ANOVA Connection

Theorem: F-Test in One-Way ANOVA

In one-way ANOVA with $k$ groups and $N$ total observations, define:

\text{MS}_{\text{between}} = \frac{\sum_{j=1}^k n_j(\bar{X}_j - \bar{X})^2}{k-1}, \quad \text{MS}_{\text{within}} = \frac{\sum_{j=1}^k \sum_{i=1}^{n_j}(X_{ij} - \bar{X}_j)^2}{N-k}

Under $H_0: \mu_1 = \cdots = \mu_k$ :

F = \frac{\text{MS}_{\text{between}}}{\text{MS}_{\text{within}}} \sim F_{k-1, N-k}

Proof Sketch

Under $H_0$ , and assuming normal errors with a common variance, all observations come from $N(\mu, \sigma^2)$ . Then $(k-1)\,\text{MS}_{\text{between}}/\sigma^2 \sim \chi^2_{k-1}$ and $(N-k)\,\text{MS}_{\text{within}}/\sigma^2 \sim \chi^2_{N-k}$ , and the two are independent by Cochran's theorem. The unknown $\sigma^2$ cancels in the ratio, so $F = \text{MS}_{\text{between}}/\text{MS}_{\text{within}}$ follows $F_{k-1, N-k}$ .
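The two mean squares can be computed directly from their definitions. The following is a minimal sketch with a hypothetical function name (`one_way_anova_f`) and made-up data for three groups, chosen so the arithmetic is easy to follow by hand.

```python
def one_way_anova_f(groups):
    """Return (F, df_between, df_within) for a list of groups of observations."""
    k = len(groups)
    n_total = sum(len(g) for g in groups)
    grand_mean = sum(sum(g) for g in groups) / n_total
    group_means = [sum(g) / len(g) for g in groups]

    # Between-group sum of squares: group sizes times squared mean deviations.
    ss_between = sum(len(g) * (m - grand_mean) ** 2
                     for g, m in zip(groups, group_means))
    # Within-group sum of squares: deviations from each group's own mean.
    ss_within = sum((x - m) ** 2
                    for g, m in zip(groups, group_means) for x in g)

    ms_between = ss_between / (k - 1)
    ms_within = ss_within / (n_total - k)
    return ms_between / ms_within, k - 1, n_total - k

# Illustrative data only (not from the text).
groups = [[1.0, 2.0, 3.0], [2.0, 3.0, 4.0], [5.0, 6.0, 7.0]]
f_stat, df_between, df_within = one_way_anova_f(groups)
print(f_stat, df_between, df_within)
```

For these data the between-group sum of squares is 26 and the within-group sum of squares is 6, so $F = (26/2)/(6/6) = 13$ on $(2, 6)$ degrees of freedom.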

Worked Example

Two methods for measuring blood glucose are compared. Method A ( $n_A = 12$ ) gives $s_A^2 = 4.2$ ; Method B ( $n_B = 15$ ) gives $s_B^2 = 2.8$ . Test $H_0: \sigma_A^2 = \sigma_B^2$ at $\alpha = 0.05$ .

Step 1. Compute the F-statistic:

F = \frac{s_A^2}{s_B^2} = \frac{4.2}{2.8} = 1.50

Step 2. Under $H_0$ , $F \sim F_{11, 14}$ . The upper critical value is $F_{0.025, 11, 14} = 3.10$ .

Step 3. Since $1.50 < 3.10$ , we fail to reject $H_0$ . There is insufficient evidence that the variances differ.

Step 4. Note the asymmetry: a two-sided test also needs a lower critical value, obtained from the reciprocal property as $F_{0.975, 11, 14} = 1/F_{0.025, 14, 11} \approx 1/3.36 \approx 0.298$ . We reject if $F < 0.298$ or $F > 3.10$ . Since $0.298 < 1.50 < 3.10$ , we fail to reject $H_0$ .
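The critical values in this example can be reproduced without tables. In practice a statistics library's F quantile function would be used; the sketch below instead assumes only the standard library, integrating the density with Simpson's rule and inverting the CDF by bisection. The helper names (`f_pdf`, `f_cdf`, `f_quantile`) are hypothetical, and `f_quantile(p, ...)` takes a lower-tail probability, whereas the subscripts in the text are upper-tail areas.

```python
import math

def f_pdf(x, d1, d2):
    """Density of F(d1, d2) at x (log scale for numerical stability)."""
    if x <= 0:
        return 0.0
    log_beta = (math.lgamma(d1 / 2) + math.lgamma(d2 / 2)
                - math.lgamma((d1 + d2) / 2))
    return math.exp((d1 / 2) * math.log(d1 / d2)
                    + (d1 / 2 - 1) * math.log(x)
                    - ((d1 + d2) / 2) * math.log1p(d1 * x / d2)
                    - log_beta)

def f_cdf(x, d1, d2, steps=2000):
    """P(F <= x) by composite Simpson's rule on [0, x]; assumes d1 > 2."""
    h = x / steps
    total = f_pdf(0.0, d1, d2) + f_pdf(x, d1, d2)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * f_pdf(i * h, d1, d2)
    return total * h / 3

def f_quantile(p, d1, d2, hi=50.0):
    """Value q with P(F <= q) = p, found by bisection on [0, hi]."""
    lo = 0.0
    for _ in range(60):
        mid = (lo + hi) / 2
        if f_cdf(mid, d1, d2) < p:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

# Numbers from the worked example.
s2_a, n_a = 4.2, 12
s2_b, n_b = 2.8, 15
alpha = 0.05

f_stat = s2_a / s2_b
d1, d2 = n_a - 1, n_b - 1
lower = f_quantile(alpha / 2, d1, d2)       # 2.5% point of F(11, 14)
upper = f_quantile(1 - alpha / 2, d1, d2)   # 97.5% point of F(11, 14)
reject = f_stat < lower or f_stat > upper
print(round(f_stat, 2), round(lower, 3), round(upper, 3), reject)
```

The lower critical value can also be cross-checked against the reciprocal property by comparing `lower` with `1 / f_quantile(0.975, 14, 11)`.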

F-Test Sensitivity

The F-test for equal variances is highly sensitive to non-normality: if the populations are not normal, it can have severely inflated Type I error rates. Bartlett's test shares this sensitivity, so robust alternatives such as Levene's test or its median-based Brown-Forsythe variant are usually preferred in practice.
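A small simulation makes the sensitivity concrete. The sketch below is an illustration under stated assumptions, not a benchmark: it reuses the sample sizes 12 and 15 from the worked example, hard-codes approximate two-sided 5% critical values for $F_{11, 14}$ (0.30 and 3.10), and uses a Laplace distribution as one arbitrary heavy-tailed alternative. In both scenarios the two populations have equal variances, so every rejection is a Type I error.

```python
import random

def sample_variance(xs):
    m = sum(xs) / len(xs)
    return sum((x - m) ** 2 for x in xs) / (len(xs) - 1)

def draw_normal(rng):
    return rng.gauss(0.0, 1.0)

def draw_laplace(rng):
    # The difference of two independent Exp(1) variables is standard Laplace.
    return rng.expovariate(1.0) - rng.expovariate(1.0)

def rejection_rate(draw, n1, n2, lower, upper, reps, rng):
    """Fraction of two-sided F-tests that reject when H0 is actually true."""
    rejections = 0
    for _ in range(reps):
        x = [draw(rng) for _ in range(n1)]
        y = [draw(rng) for _ in range(n2)]
        f = sample_variance(x) / sample_variance(y)
        if f < lower or f > upper:
            rejections += 1
    return rejections / reps

rng = random.Random(1)
LOWER, UPPER = 0.30, 3.10   # assumed approximate critical values of F(11, 14)

normal_rate = rejection_rate(draw_normal, 12, 15, LOWER, UPPER, 20_000, rng)
laplace_rate = rejection_rate(draw_laplace, 12, 15, LOWER, UPPER, 20_000, rng)
print("normal data: ", normal_rate)
print("Laplace data:", laplace_rate)
```

With normal data the rejection rate should sit near the nominal 5%, while the Laplace samples reject noticeably more often even though their variances are equal.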

Key Takeaways

Summary: F-Distribution

Ratio of two independent chi-squares, each divided by its df: $F = (U/d_1)/(V/d_2)$
Always positive and right-skewed; $E[F] = d_2/(d_2 - 2)$ for $d_2 > 2$
Used in ANOVA (comparing group means) and F-tests (comparing variances)
$F_{d_1, d_2}$ is the reciprocal distribution of $F_{d_2, d_1}$
Two-sample F-test for equality of variances: $F = s_1^2/s_2^2$
Requires normality and independence — sensitive to violations
