Chi-Square Distribution — Sum of Squared Normals
Foundations of Statistics
The Foundation of Variance-Based Inference
The chi-square distribution arises from summing squared normal variables, making it essential for testing variances and independence. Its applications span from quality control to genetics, wherever squared deviations matter.
- Genetics — Testing Hardy-Weinberg equilibrium in population studies
- Manufacturing — Quality control through variance testing and goodness-of-fit
- Market Research — Analyzing survey response patterns against expected distributions
The chi-square distribution connects normal theory to categorical data analysis.
Core Concepts
The chi-square distribution arises as the sum of squared independent standard normal random variables. It is fundamental to tests of variance and independence.
DfChi-Square Distribution
If are independent standard normals, then follows a chi-square distribution with degrees of freedom, written .
PDF of Chi-Square Distribution
Here,
- =Degrees of freedom
- =Gamma function
Special Cases
- = square of a single standard normal
- = exponential distribution with rate (i.e., )
- When is large, by the CLT
Interactive Visualization
Mean, Variance, and Higher Moments
Chi-Square Mean and Variance
Here,
- =Degrees of freedom
Proof
Mean: Since , we have .
Variance: Since are independent and (using the fourth moment of the normal):
The skewness is and excess kurtosis is , both decreasing to 0 as .
Relationship to Normal and Gamma
Connections
- where is the rate parameter
- for normal samples (used in variance tests)
- Sum of independent chi-squares: if independently, then
Derivation: Sample Variance and Chi-Square
ThDistribution of Sample Variance
If , then .
Proof Sketch
Step 1. Standardize: , so .
Step 2. Rewrite using the identity .
Step 3. By Fisher's lemma, and the deviations are independent.
Step 4. , so . Therefore by the reproductive property.
Step 5. Since , we get .
Worked Example
A quality engineer tests whether a filling machine has variance mL. From bottles, mL.
Step 1. Compute the chi-square test statistic:
Step 2. Under , with and , so .
Step 3. The observed value 37.8 is standard deviations above the mean.
Step 4. For a two-sided test at , the critical values are and . Since , we fail to reject .
Step 5. The chi-square distribution is asymmetric, so the test is inherently two-sided in a different way than z/t tests. The p-value is .
Asymmetry Matters
Unlike the normal distribution, the chi-square is right-skewed. The two-sided rejection region is not symmetric about . For , the lower critical value is 12.4 but the upper is 39.4 — far from symmetric about 24.
Normal Approximation
ThWilson–Hilferty Transformation
For large , the chi-square distribution can be approximated by a normal via:
This approximation is remarkably accurate even for as small as 5.
Key Takeaways
Summary: Chi-Square Distribution
- Sum of squared independent standard normals:
- Mean: , Variance: , Skewness:
- Always positive and right-skewed; approaches normal for large
- Used in tests of variance:
- Foundation for chi-square tests of independence and goodness-of-fit
- Reproductive property: sum of independent chi-squares is chi-square