Central Limit Theorem — The Most Important Theorem in Statistics

Foundations of Statistics

The Theorem That Made Statistics Possible

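The CLT explains why approximately normal distributions arise so often in nature and measurement, and it justifies many parametric statistical methods. It guarantees that standardized sample means of independent observations become approximately normal as the sample size grows, whatever the shape of the underlying distribution, provided its variance is finite.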

Scientific Research — Justifies using t-tests and confidence intervals for means
Machine Learning — Enables statistical guarantees for ensemble methods and bagging
Quality Engineering — Underpins statistical process control and Six Sigma

The CLT is the reason statistics works in practice.

The Central Limit Theorem (CLT) is arguably the single most important result in probability and statistics. Quantities that are sums or averages of many small independent effects tend toward normality, which is why approximately normal distributions are so common in nature, measurement, and inference.

The Theorem

Theorem: Central Limit Theorem (Lindeberg–Lévy)

Let $X_1, X_2, \ldots, X_n$ be independent and identically distributed random variables with $E[X_i] = \mu$ and $0 < \text{Var}(X_i) = \sigma^2 < \infty$. Then as $n \to \infty$:

$$\frac{\bar{X}_n - \mu}{\sigma / \sqrt{n}} \xrightarrow{d} Z \sim \mathcal{N}(0, 1)$$

where $\bar{X}_n = \frac{1}{n}\sum_{i=1}^n X_i$ is the sample mean.

In words: the standardized sample mean converges in distribution to the standard normal, regardless of the shape of the population distribution.

Convergence in Distribution

The notation $\xrightarrow{d}$ means convergence in distribution: the CDF of the standardized statistic converges pointwise to $\Phi(z)$ at every $z$ (every point is a continuity point, because $\Phi$ is continuous). This does not require the underlying variables to be normally distributed, only that they have a finite mean and a finite, nonzero variance.
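Convergence in distribution can be checked numerically. The sketch below is an illustration only: the Exponential(1) population, the seed, the sample sizes, and the helper names `standardized_means` and `max_cdf_gap` are all assumptions chosen for the example. It simulates standardized sample means and measures the largest gap between their empirical CDF and $\Phi$.

```python
import math
import random
from statistics import NormalDist


def standardized_means(n, reps, rng):
    """Draw `reps` standardized sample means of n Exponential(1) variables."""
    mu, sigma = 1.0, 1.0  # mean and standard deviation of Exponential(1)
    out = []
    for _ in range(reps):
        xbar = sum(rng.expovariate(1.0) for _ in range(n)) / n
        out.append((xbar - mu) / (sigma / math.sqrt(n)))
    return out


def max_cdf_gap(values):
    """Largest gap between the empirical CDF of `values` and Phi."""
    phi = NormalDist().cdf
    xs = sorted(values)
    m = len(xs)
    gap = 0.0
    for i, x in enumerate(xs):
        # The empirical CDF jumps from i/m to (i+1)/m at x; check both sides.
        gap = max(gap, abs((i + 1) / m - phi(x)), abs(i / m - phi(x)))
    return gap


rng = random.Random(0)  # fixed seed so the run is repeatable
gap_small = max_cdf_gap(standardized_means(1, 4000, rng))
gap_large = max_cdf_gap(standardized_means(50, 4000, rng))
print(f"n=1:  max CDF gap = {gap_small:.3f}")
print(f"n=50: max CDF gap = {gap_large:.3f}")
```

The gap for $n = 50$ should come out much smaller than for $n = 1$, although part of what remains is simulation noise from using a finite number of replications.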

The CLT Approximation

For large $n$, the CLT gives the approximation:

CLT Approximation for the Sample Mean

$$\bar{X}_n \;\dot{\sim}\; \mathcal{N}\left(\mu, \; \frac{\sigma^2}{n}\right)$$

Here,

$\bar{X}_n$ = Sample mean of $n$ i.i.d. observations
$\mu$ = Population mean
$\sigma^2$ = Population variance
$n$ = Sample size

The variance of the sample mean is $\sigma^2/n$ — it decreases with sample size, which is why larger samples give more precise estimates.
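The $\sigma^2/n$ scaling is easy to confirm by simulation. The following is a minimal sketch, assuming a Uniform(0, 1) population (variance $1/12$); the function name `variance_of_sample_mean`, the seed, and the replication count are illustrative choices.

```python
import random
import statistics

rng = random.Random(1)  # fixed seed so the run is repeatable
sigma2 = 1.0 / 12.0     # variance of a Uniform(0, 1) draw


def variance_of_sample_mean(n, reps=5000):
    """Estimate Var(X-bar_n) from `reps` simulated sample means."""
    means = [statistics.fmean(rng.random() for _ in range(n)) for _ in range(reps)]
    return statistics.pvariance(means)


estimates = {}
for n in (5, 20, 80):
    estimates[n] = variance_of_sample_mean(n)
    print(f"n={n:3d}: simulated {estimates[n]:.6f}   theory {sigma2 / n:.6f}")
```

Each fourfold increase in $n$ should cut the simulated variance by roughly a factor of four, so the standard error $\sigma/\sqrt{n}$ is only halved.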

Rate of Convergence: The Berry–Esseen Theorem

Theorem: Berry–Esseen Bound

The CLT convergence has a quantitative rate. Under the Lindeberg–Lévy conditions, with finite third absolute central moment $\rho = E[|X_i - \mu|^3] < \infty$:

$$\sup_{z \in \mathbb{R}} \left| P\left(\frac{\bar{X}_n - \mu}{\sigma/\sqrt{n}} \leq z\right) - \Phi(z) \right| \leq \frac{C \cdot \rho}{\sigma^3 \sqrt{n}}$$

where $C \leq 0.4748$ (Shevtsova, 2011). The error decreases as $O(1/\sqrt{n})$.

This bound tells us:

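Convergence is not instant: the worst-case error shrinks only as $O(1/\sqrt{n})$, so halving it takes four times as much data
For $n = 100$, the bound on the maximum error is roughly $0.05$ when $\rho/\sigma^3 \approx 1$ (the smallest value this ratio can take)
Skewed or heavy-tailed distributions have a larger $\rho/\sigma^3$, so the bound is looser and convergence is typically slower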
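To make the bound concrete, here is a small sketch that evaluates it for Bernoulli($p$) data, where $\sigma^2 = p(1-p)$ and $\rho = p(1-p)\left[p^2 + (1-p)^2\right]$. The function names are illustrative, and the constant is the value quoted above.

```python
import math

C_BE = 0.4748  # constant quoted in the text (Shevtsova, 2011)


def berry_esseen_bound(rho, sigma, n, c=C_BE):
    """Upper bound on the sup-distance between the standardized mean's CDF and Phi."""
    return c * rho / (sigma ** 3 * math.sqrt(n))


def bernoulli_moments(p):
    """Return (rho, sigma) for a Bernoulli(p) variable."""
    sigma = math.sqrt(p * (1 - p))
    rho = p * (1 - p) * (p ** 2 + (1 - p) ** 2)  # E|X - p|^3
    return rho, sigma


bounds = {}
for p in (0.5, 0.05):
    rho, sigma = bernoulli_moments(p)
    bounds[p] = berry_esseen_bound(rho, sigma, n=100)
    print(f"p={p}: rho/sigma^3 = {rho / sigma ** 3:.3f}, bound at n=100 = {bounds[p]:.4f}")
```

For the symmetric case $p = 0.5$ the ratio $\rho/\sigma^3$ equals 1, which reproduces the roughly $0.05$ figure; the skewed case $p = 0.05$ gives a noticeably looser bound at the same $n$.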

Minimum Sample Size Guidelines

When is $n$ Large Enough?

The required $n$ depends on the population shape. The figures below are common rules of thumb, not guarantees:

Population Shape	Minimum $n$
Symmetric, light-tailed	$n \geq 15$
Moderately skewed	$n \geq 30$
Heavily skewed	$n \geq 50$
Extremely skewed (e.g., exponential)	$n \geq 100$
Proportions (Bernoulli data)	$np \geq 10$ and $n(1-p) \geq 10$

Never apply the CLT blindly — always check the shape of your data first with a histogram or Q-Q plot.

The CLT for Proportions

A special and widely used case applies to binary (Bernoulli) data:

Theorem: CLT for Sample Proportions

If $X_i \sim \text{Bernoulli}(p)$ are i.i.d. with $0 < p < 1$, then $\hat{p} = \frac{1}{n}\sum_{i=1}^n X_i$ satisfies:

$$\frac{\hat{p} - p}{\sqrt{p(1-p)/n}} \xrightarrow{d} \mathcal{N}(0, 1)$$

Equivalently: $\hat{p} \;\dot{\sim}\; \mathcal{N}(p, \; p(1-p)/n)$ for large $n$.

This is the basis of confidence intervals for proportions and z-tests for proportions.
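As an illustration, the sketch below builds the usual normal-approximation (Wald) interval for a proportion, plugging $\hat{p}$ in for the unknown $p$ in the standard error. That plug-in step, the function names, and the counts in the example are assumptions made for this example, not part of the theorem.

```python
import math
from statistics import NormalDist


def proportion_interval(successes, n, level=0.95):
    """Wald interval: p_hat +/- z * sqrt(p_hat * (1 - p_hat) / n)."""
    p_hat = successes / n
    z = NormalDist().inv_cdf(0.5 + level / 2.0)
    se = math.sqrt(p_hat * (1.0 - p_hat) / n)
    return p_hat - z * se, p_hat + z * se


def normal_approx_ok(successes, n, threshold=10):
    """Rule of thumb with p replaced by p_hat: n*p_hat and n*(1 - p_hat) both >= threshold."""
    return successes >= threshold and (n - successes) >= threshold


# Hypothetical data: 130 successes in 400 trials.
lo, hi = proportion_interval(130, 400)
print(f"95% interval: ({lo:.4f}, {hi:.4f}), rule of thumb met: {normal_approx_ok(130, 400)}")
```

When the rule of thumb fails (for example, 3 successes in 400 trials), the normal approximation is unreliable and an exact or score-based interval is usually a safer choice.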

The CLT for Sums

CLT for the Sample Sum

$$S_n = \sum_{i=1}^n X_i \;\dot{\sim}\; \mathcal{N}(n\mu, \; n\sigma^2)$$

Here,

$S_n$ = Sum of $n$ i.i.d. random variables
$n\mu$ = Expected value of the sum
$n\sigma^2$ = Variance of the sum

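The standard deviation of $S_n$ grows as $\sqrt{n}$ while its mean grows as $n$, so the relative spread $\sqrt{n}\,\sigma/(n\mu) = \sigma/(\mu\sqrt{n})$ shrinks as $1/\sqrt{n}$ (for $\mu \neq 0$). This is why measurement precision improves with $\sqrt{n}$, not $n$.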
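A short sketch of how the sum approximation is used in practice. The scenario (50 i.i.d. terms, each with mean 2 and standard deviation 2) and the function name `sum_tail_probability` are hypothetical, chosen only to show the arithmetic.

```python
import math
from statistics import NormalDist


def sum_tail_probability(n, mu, sigma, threshold):
    """CLT approximation to P(S_n > threshold) for a sum of n i.i.d. terms."""
    mean_sum = n * mu
    sd_sum = math.sqrt(n) * sigma
    return 1.0 - NormalDist(mean_sum, sd_sum).cdf(threshold)


# Hypothetical example: 50 terms, each with mean 2 and standard deviation 2.
p_over = sum_tail_probability(n=50, mu=2.0, sigma=2.0, threshold=120.0)
print(f"P(S_50 > 120) is approximately {p_over:.4f}")

# Relative spread sd(S_n) / E[S_n] shrinks like 1 / sqrt(n).
for n in (10, 100, 1000):
    rel = (math.sqrt(n) * 2.0) / (n * 2.0)
    print(f"n={n:5d}: sd/mean = {rel:.4f}")
```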

Why the CLT Fails Without Finite Variance

Heavy-Tailed Distributions

If the population has infinite variance (e.g., the Cauchy distribution, or Pareto with $\alpha \leq 2$), the classical CLT does not apply. For tail index $\alpha < 2$, suitably normalized sums converge instead to a non-normal stable distribution. For the standard Cauchy distribution, $\bar{X}_n$ has exactly the same distribution as $X_1$, so the sample mean never becomes more precise!

This is why checking the assumptions (especially finite variance) before applying the CLT is essential.
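The Cauchy behaviour is striking enough to be worth simulating. This sketch compares the interquartile range (IQR) of sample means for standard Cauchy and standard normal data; the IQR is used because the Cauchy has no variance to estimate. The seed, sample sizes, and helper names are illustrative assumptions.

```python
import math
import random
import statistics

rng = random.Random(2)  # fixed seed so the run is repeatable


def draw_cauchy(rng):
    """Standard Cauchy draw via the inverse-CDF method."""
    return math.tan(math.pi * (rng.random() - 0.5))


def draw_normal(rng):
    """Standard normal draw."""
    return rng.gauss(0.0, 1.0)


def iqr_of_sample_means(draw, n, reps=3000):
    """Interquartile range of `reps` simulated sample means of size n."""
    means = [statistics.fmean(draw(rng) for _ in range(n)) for _ in range(reps)]
    q1, _, q3 = statistics.quantiles(means, n=4)
    return q3 - q1


cauchy_iqr = {n: iqr_of_sample_means(draw_cauchy, n) for n in (1, 100)}
normal_iqr = {n: iqr_of_sample_means(draw_normal, n) for n in (1, 100)}
for n in (1, 100):
    print(f"n={n:3d}: Cauchy IQR = {cauchy_iqr[n]:.3f}   normal IQR = {normal_iqr[n]:.3f}")
```

The normal IQR should shrink by about a factor of ten between $n = 1$ and $n = 100$, while the Cauchy IQR should stay near 2 (the IQR of a single standard Cauchy draw) at both sample sizes.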

Worked Example: Dice Rolls

Consider rolling a fair die $n$ times. Each roll has $\mu = 3.5$ and $\sigma^2 = 35/12 \approx 2.917$. The CLT predicts:

$$\bar{X}_n \;\dot{\sim}\; \mathcal{N}(3.5, \; 2.917/n)$$

For $n = 36$ rolls:

$$P(3.0 \leq \bar{X} \leq 4.0) \approx P\left(\frac{3.0 - 3.5}{\sqrt{2.917/36}} \leq Z \leq \frac{4.0 - 3.5}{\sqrt{2.917/36}}\right)$$

$$= P(-1.76 \leq Z \leq 1.76) = 2\Phi(1.76) - 1 \approx 0.921$$

Despite the uniform (non-normal) population, the CLT approximation is excellent for $n = 36$ .
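The quality of the approximation can be checked directly, because the exact distribution of a sum of dice is computable. The sketch below recomputes the normal approximation and compares it with the exact probability obtained by repeated convolution; the function name `dice_sum_pmf` is illustrative, and no continuity correction is applied.

```python
import math
from statistics import NormalDist


def dice_sum_pmf(n):
    """Exact distribution of the sum of n fair dice, by repeated convolution."""
    pmf = {0: 1.0}
    for _ in range(n):
        nxt = {}
        for total, prob in pmf.items():
            for face in range(1, 7):
                nxt[total + face] = nxt.get(total + face, 0.0) + prob / 6.0
        pmf = nxt
    return pmf


n = 36
mu, sigma2 = 3.5, 35.0 / 12.0
se = math.sqrt(sigma2 / n)           # standard error of the sample mean
z = (4.0 - mu) / se                  # standardized upper limit
clt_prob = 2.0 * NormalDist().cdf(z) - 1.0

# The mean lies in [3.0, 4.0] exactly when the sum lies in [108, 144].
pmf = dice_sum_pmf(n)
exact_prob = sum(p for total, p in pmf.items() if 3.0 * n <= total <= 4.0 * n)

print(f"z = {z:.4f}")
print(f"CLT approximation: {clt_prob:.4f}")
print(f"Exact probability: {exact_prob:.4f}")
```

Because the sum is discrete, a continuity correction (widening the interval for the sum by 0.5 on each side) would typically bring the normal approximation even closer to the exact value.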

Key Takeaways

Summary: Central Limit Theorem

Sample means converge to normal: $\frac{\bar{X} - \mu}{\sigma/\sqrt{n}} \xrightarrow{d} \mathcal{N}(0,1)$ regardless of population shape, provided the variance is finite
Variance of $\bar{X}$ is $\sigma^2/n$: precision improves with sample size
Berry–Esseen bound: convergence rate is $O(1/\sqrt{n})$, not instant
Rule of thumb: $n \geq 30$ is adequate for many applications; skewed data need more
Requires finite variance: fails for Cauchy, Pareto($\alpha \leq 2$), and other distributions with infinite variance
Justifies large-sample normal-based inference for means and proportions: z-tests, t-tests, ANOVA, confidence intervals
The reason the normal distribution is ubiquitous: many small independent effects accumulate to produce approximately normal totals
