Central Limit Theorem — The Most Important Theorem in Statistics
Foundations of Statistics
The Theorem That Made Statistics Possible
The CLT explains why the normal distribution appears so often in nature and measurement, and it justifies many parametric statistical methods. For i.i.d. data with finite variance, it guarantees that the distribution of the sample mean approaches a normal distribution as the sample size grows, regardless of the shape of the underlying distribution.
- Scientific Research — Justifies using t-tests and confidence intervals for means
- Machine Learning — Enables statistical guarantees for ensemble methods and bagging
- Quality Engineering — Underpins statistical process control and Six Sigma
The CLT is the reason statistics works in practice.
The Central Limit Theorem (CLT) is arguably the single most important result in probability and statistics. It explains why the normal distribution appears universally in nature, measurement, and inference.
The Theorem
Theorem: Central Limit Theorem (Lindeberg–Lévy)
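Let $X_1, X_2, \dots, X_n$ be independent and identically distributed random variables with $\mathbb{E}[X_i] = \mu$ and $\operatorname{Var}(X_i) = \sigma^2$, where $0 < \sigma^2 < \infty$. Then as $n \to \infty$:
$$\frac{\bar{X}_n - \mu}{\sigma / \sqrt{n}} \xrightarrow{d} N(0, 1)$$
where $\bar{X}_n = \frac{1}{n} \sum_{i=1}^{n} X_i$ is the sample mean.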
In words: the standardized sample mean converges in distribution to the standard normal, regardless of the shape of the population distribution.
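The statement can be checked numerically. The following is a minimal simulation sketch, assuming an Exponential(1) population (mean 1, standard deviation 1) as an example of a skewed distribution; the function name, sample size, and seed are illustrative choices, not part of the theorem.

```python
import math
import random
import statistics

def standardized_means(n, reps, rng):
    """Draw `reps` samples of size n from Exponential(1) and standardize each sample mean."""
    mu, sigma = 1.0, 1.0  # mean and standard deviation of Exponential(rate=1)
    out = []
    for _ in range(reps):
        xbar = sum(rng.expovariate(1.0) for _ in range(n)) / n
        out.append((xbar - mu) / (sigma / math.sqrt(n)))
    return out

rng = random.Random(0)  # fixed seed so the run is repeatable
z = standardized_means(n=200, reps=5000, rng=rng)

# If the normal approximation is good, the standardized means have mean near 0,
# standard deviation near 1, and about 95% of them fall in [-1.96, 1.96].
z_mean = statistics.fmean(z)
z_sd = statistics.pstdev(z)
coverage = sum(abs(v) <= 1.96 for v in z) / len(z)
print(f"mean={z_mean:.3f} sd={z_sd:.3f} coverage={coverage:.3f}")
```

Rerunning with a smaller `n` (say 5) should show the coverage drifting away from 0.95, since the skew of the exponential population has not yet averaged out.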
Convergence in Distribution
The notation $\xrightarrow{d}$ means convergence in distribution: the CDF of the standardized statistic converges pointwise to $\Phi(x)$, the standard normal CDF, at every continuity point $x$ of the limit (here every real $x$, since $\Phi$ is continuous). This does not require the underlying variables to be normally distributed, only that they have finite mean and variance.
The CLT Approximation
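For large $n$, the CLT gives the approximation:
CLT Approximation for the Sample Mean
$$\bar{X}_n \overset{\text{approx.}}{\sim} N\left(\mu, \frac{\sigma^2}{n}\right)$$
Here,
- $\bar{X}_n$ = Sample mean of $n$ i.i.d. observations
- $\mu$ = Population mean
- $\sigma^2$ = Population variance
- $n$ = Sample size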
The variance of the sample mean is $\sigma^2/n$; it decreases with sample size, which is why larger samples give more precise estimates.
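As a rough check that the variance of the sample mean is the population variance divided by the sample size, the sketch below simulates sample means from a Uniform(0, 1) population (variance 1/12). The population, the sample sizes, and the helper name are assumptions made for illustration.

```python
import random
import statistics

def variance_of_sample_mean(n, reps, rng):
    """Empirical variance of the mean of n Uniform(0, 1) draws, over `reps` repetitions."""
    means = [sum(rng.random() for _ in range(n)) / n for _ in range(reps)]
    return statistics.pvariance(means)

rng = random.Random(1)   # fixed seed for repeatability
sigma2 = 1 / 12          # variance of Uniform(0, 1)

results = {}
for n in (5, 20, 80):
    results[n] = variance_of_sample_mean(n, reps=4000, rng=rng)
    # Empirical value next to the theoretical sigma^2 / n
    print(f"n={n:3d} empirical={results[n]:.5f} theoretical={sigma2 / n:.5f}")
```

Each fourfold increase in the sample size should cut the variance by about a factor of four, that is, halve the standard error.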
Rate of Convergence: The Berry–Esseen Theorem
Theorem: Berry–Esseen Bound
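The CLT convergence has a quantitative rate. Under the Lindeberg–Lévy conditions with $\rho = \mathbb{E}\lvert X_1 - \mu \rvert^3 < \infty$:
$$\sup_{x \in \mathbb{R}} \left| F_n(x) - \Phi(x) \right| \le \frac{C \rho}{\sigma^3 \sqrt{n}}$$
where $F_n$ is the CDF of the standardized sample mean, $\Phi$ is the standard normal CDF, and $C \le 0.4748$ (Shevtsova, 2011). The error decreases as $O(1/\sqrt{n})$.
This bound tells us:
- Convergence is not instant: the worst-case error is $O(1/\sqrt{n})$
- For example, for $n = 100$ the bound on the maximum error is roughly $0.4748/\sqrt{100} \approx 0.047$ (if $\rho/\sigma^3 = 1$, its smallest possible value)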
- Skewed or heavy-tailed distributions converge more slowly
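The bound can be compared with the true worst-case error in a case where everything is computable exactly. The sketch below assumes a Bernoulli($p$) population, for which $\sigma^2 = p(1-p)$ and $\rho = p(1-p)\left[(1-p)^2 + p^2\right]$, and takes the constant 0.4748 as quoted from Shevtsova (2011); the function names and the choice $p = 0.2$, $n = 100$ are illustrative.

```python
import math
from statistics import NormalDist

C = 0.4748  # constant quoted from Shevtsova (2011) for the i.i.d. case

def berry_esseen_bound(rho, sigma, n):
    """Upper bound on sup_x |F_n(x) - Phi(x)| for the standardized sample mean."""
    return C * rho / (sigma ** 3 * math.sqrt(n))

def exact_sup_error_bernoulli(p, n):
    """Exact sup_x |F_n(x) - Phi(x)| for the standardized mean of n Bernoulli(p) draws."""
    phi = NormalDist().cdf
    sigma = math.sqrt(p * (1 - p))
    worst, cdf = 0.0, 0.0
    for k in range(n + 1):
        z = (k / n - p) / (sigma / math.sqrt(n))  # standardized location of the k-th jump
        pmf = math.comb(n, k) * p ** k * (1 - p) ** (n - k)
        # F_n is a step function, so the largest gap occurs just below or at a jump.
        worst = max(worst, abs(cdf - phi(z)), abs(cdf + pmf - phi(z)))
        cdf += pmf
    return worst

p, n = 0.2, 100
sigma = math.sqrt(p * (1 - p))
rho = p * (1 - p) * ((1 - p) ** 2 + p ** 2)  # E|X - p|^3 for Bernoulli(p)

bound = berry_esseen_bound(rho, sigma, n)
exact = exact_sup_error_bernoulli(p, n)
print(f"exact sup error = {exact:.4f}, Berry-Esseen bound = {bound:.4f}")
```

The exact error sits below the bound, as it must. Moving `p` toward 0 or 1 makes the population more skewed, which raises $\rho/\sigma^3$ and loosens the bound.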
Minimum Sample Size Guidelines
When is $n$ Large Enough?
The required $n$ depends on the population shape:
| Population Shape | Minimum $n$ |
|---|---|
| Symmetric, light-tailed | |
| Moderately skewed | |
| Heavily skewed | |
| Extremely skewed (e.g., exponential) | |
| For proportions: and | — |
Never apply the CLT blindly — always check the shape of your data first with a histogram or Q-Q plot.
The CLT for Proportions
A special and widely used case applies to binary (Bernoulli) data:
Theorem: CLT for Sample Proportions
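If $X_1, \dots, X_n \sim \text{Bernoulli}(p)$ are i.i.d., then $\hat{p} = \frac{1}{n} \sum_{i=1}^{n} X_i$ satisfies:
$$\frac{\hat{p} - p}{\sqrt{p(1-p)/n}} \xrightarrow{d} N(0, 1)$$
Equivalently: $\hat{p} \overset{\text{approx.}}{\sim} N\left(p, \frac{p(1-p)}{n}\right)$ for large $n$.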
This is the basis of confidence intervals for proportions and z-tests for proportions.
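One common interval built on this approximation is the Wald interval, which plugs the sample proportion into the standard error. The sketch below is a minimal version, assuming the sample is large enough for the normal approximation to be reasonable; the function name and the example counts are made up for illustration.

```python
import math
from statistics import NormalDist

def wald_interval(successes, n, confidence=0.95):
    """Normal-approximation (Wald) confidence interval for a proportion."""
    p_hat = successes / n
    z = NormalDist().inv_cdf(0.5 + confidence / 2)        # about 1.96 for 95%
    half_width = z * math.sqrt(p_hat * (1 - p_hat) / n)   # z times the plug-in standard error
    return p_hat - half_width, p_hat + half_width

# Hypothetical data: 540 successes in 1000 trials
low, high = wald_interval(successes=540, n=1000)
print(f"95% CI for p: ({low:.3f}, {high:.3f})")
```

For these hypothetical counts the interval is roughly (0.509, 0.571). The Wald interval is known to behave poorly when the proportion is close to 0 or 1 or when the sample is small, so it is best read as an illustration of the approximation rather than a recommended default.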
The CLT for Sums
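CLT for the Sample Sum
$$S_n = \sum_{i=1}^{n} X_i \overset{\text{approx.}}{\sim} N\left(n\mu, n\sigma^2\right)$$
Here,
- $S_n$ = Sum of $n$ i.i.d. random variables
- $n\mu$ = Expected value of the sum
- $n\sigma^2$ = Variance of the sum
The standard deviation of $S_n$ is $\sigma\sqrt{n}$, so it grows as $\sqrt{n}$ rather than $n$; this is why measurement precision improves with $\sqrt{n}$, not $n$.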
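The link between the spread of the sum and the precision of a measurement can be made explicit in one step. Writing $S_n$ for the sum of $n$ i.i.d. variables with mean $\mu$ and variance $\sigma^2$, and assuming $\mu \neq 0$ so that a relative error is defined:

```latex
\frac{\operatorname{SD}(S_n)}{\lvert \mathbb{E}[S_n] \rvert}
  = \frac{\sigma\sqrt{n}}{n\lvert\mu\rvert}
  = \frac{\sigma}{\lvert\mu\rvert\sqrt{n}},
\qquad
\operatorname{SD}(\bar{X}_n) = \frac{\operatorname{SD}(S_n)}{n} = \frac{\sigma}{\sqrt{n}}.
```

Both the relative spread of the sum and the standard error of the mean shrink like $1/\sqrt{n}$, so halving the error takes four times as many observations.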
Why the CLT Fails Without Finite Variance
Heavy-Tailed Distributions
If the population has infinite variance (e.g., the Cauchy distribution, or Pareto with shape parameter $\alpha \le 2$), the classical CLT does not apply. Instead, for tail index $\alpha < 2$, the suitably normalized sum converges to a stable distribution (not the normal). For the Cauchy distribution, $\bar{X}_n$ has exactly the same distribution as $X_1$: the sample mean never becomes more precise!
This is why checking the assumptions (especially finite variance) before applying the CLT is essential.
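The Cauchy case is easy to see in simulation. The sketch below draws standard Cauchy values by the inverse-CDF transform and measures the interquartile range (IQR) of the sample mean for several sample sizes; the helper names, sample sizes, and seed are illustrative. The IQR is used because the variance of a Cauchy variable does not exist.

```python
import math
import random
import statistics

def cauchy(rng):
    """One standard Cauchy draw via the inverse-CDF transform tan(pi * (U - 1/2))."""
    return math.tan(math.pi * (rng.random() - 0.5))

def iqr_of_sample_means(n, reps, rng):
    """Interquartile range of the mean of n standard Cauchy draws, over `reps` repetitions."""
    means = [sum(cauchy(rng) for _ in range(n)) / n for _ in range(reps)]
    q1, _, q3 = statistics.quantiles(means, n=4)  # the three quartile cut points
    return q3 - q1

rng = random.Random(2)  # fixed seed for repeatability
iqr_by_n = {n: iqr_of_sample_means(n, reps=2000, rng=rng) for n in (1, 10, 100)}

# A standard Cauchy has quartiles at -1 and +1, so its IQR is 2.
for n, iqr in iqr_by_n.items():
    print(f"n={n:3d} IQR of sample mean = {iqr:.2f}")
```

For a finite-variance population the IQR of the sample mean would shrink by a factor of about 10 between $n = 1$ and $n = 100$; here it stays near 2 at every sample size.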
Worked Example: Dice Rolls
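Consider rolling a fair die $n$ times. Each roll has $\mu = 3.5$ and $\sigma^2 = 35/12 \approx 2.917$. The CLT predicts:
$$\bar{X}_n \overset{\text{approx.}}{\sim} N\left(3.5, \frac{35}{12n}\right)$$
For example, for $n = 30$ rolls:
$$\bar{X}_{30} \overset{\text{approx.}}{\sim} N(3.5, 0.0972), \qquad \operatorname{SD}(\bar{X}_{30}) \approx 0.312$$
Despite the uniform (non-normal) population, the CLT approximation is excellent at this sample size, because the distribution of a single roll is symmetric and bounded.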
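The quality of the approximation for dice can be measured exactly, without simulation, by building the distribution of the sum through repeated convolution and comparing it with the normal CDF. In the sketch below, $n = 30$ is an illustrative choice, the function name is hypothetical, and the comparison uses a continuity correction (evaluating the normal CDF at $s + 0.5$), which is an added assumption suited to integer-valued sums.

```python
import math
from statistics import NormalDist

def dice_sum_pmf(n):
    """Exact pmf of the sum of n fair six-sided dice, built by repeated convolution."""
    pmf = {0: 1.0}
    for _ in range(n):
        nxt = {}
        for total, prob in pmf.items():
            for face in range(1, 7):
                nxt[total + face] = nxt.get(total + face, 0.0) + prob / 6
        pmf = nxt
    return pmf

n = 30                      # illustrative number of rolls
mu, var = 3.5, 35 / 12      # mean and variance of a single fair die
pmf = dice_sum_pmf(n)
approx = NormalDist(mu=n * mu, sigma=math.sqrt(n * var))

# Largest gap between the exact CDF of the sum and the continuity-corrected normal CDF.
worst, cdf = 0.0, 0.0
for s in sorted(pmf):
    cdf += pmf[s]
    worst = max(worst, abs(cdf - approx.cdf(s + 0.5)))
print(f"largest CDF gap for n={n}: {worst:.4f}")
```

Because the mean is just the sum divided by $n$, the same gap applies to probabilities about the sample mean.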
Key Takeaways
Summary: Central Limit Theorem
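- Sample means converge to normal: $\bar{X}_n \overset{\text{approx.}}{\sim} N(\mu, \sigma^2/n)$ regardless of population shape
- Variance of $\bar{X}_n$ is $\sigma^2/n$: precision improves with sample size
- Berry–Esseen bound: convergence rate is $O(1/\sqrt{n})$, not instant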
- A common rule of thumb is $n \ge 30$ for most applications; larger for skewed data
- Requires finite variance: fails for Cauchy, Pareto($\alpha \le 2$), and other heavy-tailed distributions
- Justifies large-sample normal-based inference: z-tests, t-tests, ANOVA, confidence intervals
- The reason the normal distribution is ubiquitous: many small independent effects accumulate to produce approximately normal totals