
Central Limit Theorem — The Most Important Theorem in Statistics


The Theorem That Made Statistics Possible

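The CLT explains why the normal distribution appears so often in nature and measurement, and it justifies many parametric statistical methods. It guarantees that, for i.i.d. data with finite variance, the distribution of the sample mean approaches a normal distribution as the sample size grows, regardless of the shape of the underlying distribution.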

  • Scientific Research — Justifies using t-tests and confidence intervals for means
  • Machine Learning — Enables statistical guarantees for ensemble methods and bagging
  • Quality Engineering — Underpins statistical process control and Six Sigma

The CLT is the reason statistics works in practice.


The Central Limit Theorem (CLT) is arguably the single most important result in probability and statistics. It explains why the normal distribution appears so widely in nature, measurement, and inference.


The Theorem

Theorem: Central Limit Theorem (Lindeberg–Lévy)

Let $X_1, X_2, \ldots, X_n$ be independent and identically distributed random variables with $E[X_i] = \mu$ and $\text{Var}(X_i) = \sigma^2 < \infty$. Then as $n \to \infty$:

$$\frac{\bar{X}_n - \mu}{\sigma / \sqrt{n}} \xrightarrow{d} Z \sim \mathcal{N}(0, 1)$$

where $\bar{X}_n = \frac{1}{n}\sum_{i=1}^n X_i$ is the sample mean.

In words: the standardized sample mean converges in distribution to the standard normal, regardless of the shape of the population distribution.

Convergence in Distribution

The notation $\xrightarrow{d}$ means convergence in distribution: the CDF of the standardized statistic converges pointwise to $\Phi(z)$ at every continuity point $z$ (and since $\Phi$ is continuous, that is every real $z$). This does not require the underlying variables to be normally distributed, only that they have finite mean and variance.
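To make convergence in distribution concrete, the sketch below simulates standardized sample means and compares their empirical CDF with $\Phi$ on a small grid. The Exponential(1) population (so $\mu = \sigma = 1$), the grid points, the replication count, and the helper names are all illustrative assumptions, not part of the theorem.

```python
import math
import random
from statistics import NormalDist

def standardized_means(n, reps, rng):
    """Draw `reps` standardized sample means from an Exponential(1) population."""
    mu, sigma = 1.0, 1.0  # mean and standard deviation of Exponential(1)
    out = []
    for _ in range(reps):
        xbar = sum(rng.expovariate(1.0) for _ in range(n)) / n
        out.append((xbar - mu) / (sigma / math.sqrt(n)))
    return out

def empirical_cdf(values, z):
    """Fraction of simulated values that are <= z."""
    return sum(v <= z for v in values) / len(values)

rng = random.Random(0)  # fixed seed so the run is reproducible
phi = NormalDist().cdf
gaps = {}
for n in (2, 30, 200):
    zs = standardized_means(n, reps=5000, rng=rng)
    # Largest |empirical CDF - Phi| over a few grid points (not the true sup)
    gaps[n] = max(abs(empirical_cdf(zs, z) - phi(z)) for z in (-2, -1, 0, 1, 2))
    print(f"n={n:4d}  max CDF gap on grid = {gaps[n]:.3f}")
```

The gap should shrink as $n$ grows, up to Monte Carlo noise from the finite number of replications.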


The CLT Approximation

For large $n$, the CLT gives the approximation:

CLT Approximation for the Sample Mean

$$\bar{X}_n \;\dot{\sim}\; \mathcal{N}\left(\mu, \; \frac{\sigma^2}{n}\right)$$

Here,

  • $\bar{X}_n$ = Sample mean of $n$ i.i.d. observations
  • $\mu$ = Population mean
  • $\sigma^2$ = Population variance
  • $n$ = Sample size

The variance of the sample mean is $\sigma^2/n$. It decreases with sample size, which is why larger samples give more precise estimates.
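The $\sigma^2/n$ scaling can be checked numerically. The following is a minimal sketch, assuming a Uniform(0, 1) population (variance $1/12$) and an arbitrary replication count; the function name `variance_of_means` is hypothetical.

```python
import random
from statistics import pvariance

rng = random.Random(1)  # fixed seed for reproducibility
sigma2 = 1.0 / 12.0     # variance of a Uniform(0, 1) population

def variance_of_means(n, reps=10000):
    """Simulate `reps` sample means of size n and return their variance."""
    means = [sum(rng.random() for _ in range(n)) / n for _ in range(reps)]
    return pvariance(means)

results = {n: variance_of_means(n) for n in (4, 16, 64)}
for n, simulated in results.items():
    print(f"n={n:3d}  simulated={simulated:.5f}  sigma^2/n={sigma2 / n:.5f}")
```

Each fourfold increase in $n$ should cut the simulated variance by roughly a factor of four.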


Rate of Convergence: The Berry–Esseen Theorem

Theorem: Berry–Esseen Bound

The CLT convergence has a quantitative rate. Under the Lindeberg–Lévy conditions with $\rho = E[|X_i - \mu|^3] < \infty$:

$$\sup_{z \in \mathbb{R}} \left| P\left(\frac{\bar{X}_n - \mu}{\sigma/\sqrt{n}} \leq z\right) - \Phi(z) \right| \leq \frac{C \cdot \rho}{\sigma^3 \sqrt{n}}$$

where $C \leq 0.4748$ (Shevtsova, 2011). The error decreases as $O(1/\sqrt{n})$.

This bound tells us:

  • Convergence is not instant: the worst-case error is $O(1/\sqrt{n})$
  • For $n = 100$, the maximum error is at most roughly $0.05$ (if $\rho/\sigma^3 \approx 1$)
  • Skewed or heavy-tailed distributions converge more slowly
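The bound is simple enough to evaluate directly. The sketch below computes it for a Bernoulli($p$) population, chosen here only because its third absolute central moment has a closed form, $\rho = p(1-p)\left[(1-p)^2 + p^2\right]$; the function names are hypothetical.

```python
import math

C_SHEVTSOVA = 0.4748  # constant quoted above (Shevtsova, 2011)

def berry_esseen_bound(rho, sigma, n, c=C_SHEVTSOVA):
    """Upper bound on sup_z |P(standardized mean <= z) - Phi(z)|."""
    return c * rho / (sigma ** 3 * math.sqrt(n))

def bernoulli_moments(p):
    """Return (rho, sigma) for a Bernoulli(p) variable."""
    sigma = math.sqrt(p * (1 - p))
    # E|X - p|^3 = p(1-p)^3 + (1-p)p^3 = p(1-p)[(1-p)^2 + p^2]
    rho = p * (1 - p) * ((1 - p) ** 2 + p ** 2)
    return rho, sigma

for p in (0.5, 0.1):
    rho, sigma = bernoulli_moments(p)
    print(f"p={p}: rho/sigma^3={rho / sigma**3:.3f}, "
          f"bound at n=100 is {berry_esseen_bound(rho, sigma, 100):.4f}")
```

For $p = 0.5$ the ratio $\rho/\sigma^3$ equals 1, which reproduces the "roughly 0.05 at $n = 100$" figure; the skewed case $p = 0.1$ gives a noticeably larger bound.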

Minimum Sample Size Guidelines

When is $n$ Large Enough?

The required $n$ depends on the population shape:

| Population Shape | Minimum $n$ |
|---|---|
| Symmetric, light-tailed | $n \geq 15$ |
| Moderately skewed | $n \geq 30$ |
| Heavily skewed | $n \geq 50$ |
| Extremely skewed (e.g., exponential) | $n \geq 100$ |

For proportions: $np \geq 10$ and $n(1-p) \geq 10$

Never apply the CLT blindly — always check the shape of your data first with a histogram or Q-Q plot.
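A Q-Q check can be sketched without a plotting library by pairing sorted observations with standard-normal quantiles. Summarizing the straightness of those points with a correlation coefficient is an assumption of this example, as are the plotting positions $(i + 0.5)/n$ and the helper names; it is a rough numeric stand-in for looking at the plot, not a formal test.

```python
import random
from statistics import NormalDist, mean, pstdev

def qq_points(data):
    """Pair each sorted observation with the matching standard-normal quantile."""
    n = len(data)
    std_normal = NormalDist()
    theoretical = [std_normal.inv_cdf((i + 0.5) / n) for i in range(n)]
    return list(zip(theoretical, sorted(data)))

def qq_correlation(data):
    """Pearson correlation of the Q-Q points; values near 1 suggest a near-normal shape."""
    pts = qq_points(data)
    xs = [x for x, _ in pts]
    ys = [y for _, y in pts]
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in pts) / len(pts)
    return cov / (pstdev(xs) * pstdev(ys))

rng = random.Random(2)  # fixed seed for reproducibility
normal_like = [rng.gauss(0, 1) for _ in range(500)]
skewed = [rng.expovariate(1.0) for _ in range(500)]
print("normal sample:", round(qq_correlation(normal_like), 3))
print("skewed sample:", round(qq_correlation(skewed), 3))
```

The skewed sample should score visibly lower, which is the numeric counterpart of a Q-Q plot bending away from a straight line.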


The CLT for Proportions

A special and widely used case applies to binary (Bernoulli) data:

Theorem: CLT for Sample Proportions

If $X_i \sim \text{Bernoulli}(p)$ are i.i.d., then $\hat{p} = \frac{1}{n}\sum_{i=1}^n X_i$ satisfies:

$$\frac{\hat{p} - p}{\sqrt{p(1-p)/n}} \xrightarrow{d} \mathcal{N}(0, 1)$$

Equivalently: $\hat{p} \;\dot{\sim}\; \mathcal{N}(p, \; p(1-p)/n)$ for large $n$.

This is the basis of confidence intervals for proportions and z-tests for proportions.
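As an illustration, here is a minimal sketch of a normal-approximation confidence interval for a proportion, together with the $np \geq 10$, $n(1-p) \geq 10$ rule of thumb. Plugging $\hat{p}$ into the standard error in place of the unknown $p$ is an assumption of this sketch (the usual Wald interval), and the function names and the counts in the usage lines are hypothetical.

```python
import math
from statistics import NormalDist

def proportion_ci(successes, n, level=0.95):
    """Normal-approximation (Wald) interval for a proportion, using p_hat in the SE."""
    p_hat = successes / n
    z = NormalDist().inv_cdf(0.5 + level / 2)  # two-sided critical value
    half_width = z * math.sqrt(p_hat * (1 - p_hat) / n)
    return p_hat - half_width, p_hat + half_width

def normal_approx_ok(n, p, threshold=10):
    """Rule-of-thumb check: np and n(1-p) both at least `threshold`."""
    return n * p >= threshold and n * (1 - p) >= threshold

low, high = proportion_ci(successes=120, n=400)  # hypothetical counts
print(f"95% CI: ({low:.4f}, {high:.4f})")
print(normal_approx_ok(400, 0.3), normal_approx_ok(50, 0.05))
```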


The CLT for Sums

CLT for the Sample Sum

$$S_n = \sum_{i=1}^n X_i \;\dot{\sim}\; \mathcal{N}(n\mu, \; n\sigma^2)$$

Here,

  • $S_n$ = Sum of $n$ i.i.d. random variables
  • $n\mu$ = Expected value of the sum
  • $n\sigma^2$ = Variance of the sum

The standard deviation of $S_n$ grows as $\sqrt{n}$ while its mean grows as $n$, so the spread relative to the mean shrinks like $1/\sqrt{n}$. This is why measurement precision improves with $\sqrt{n}$, not $n$.
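The sum approximation can be wrapped in a small helper. This is a sketch only: the function name `sum_approx` and the population values $\mu = 2.0$, $\sigma^2 = 0.25$ are hypothetical, chosen to show how the standard deviation and the relative spread change with $n$.

```python
import math
from statistics import NormalDist

def sum_approx(mu, sigma2, n):
    """Normal approximation N(n*mu, n*sigma2) for the sum of n i.i.d. variables."""
    return NormalDist(mu=n * mu, sigma=math.sqrt(n * sigma2))

for n in (10, 100, 1000):
    dist = sum_approx(mu=2.0, sigma2=0.25, n=n)
    print(f"n={n:5d}  mean={dist.mean:8.1f}  sd={dist.stdev:7.3f}  "
          f"sd/mean={dist.stdev / dist.mean:.4f}")

# Approximate probability that 100 such variables sum to at most 210
print(sum_approx(2.0, 0.25, 100).cdf(210))
```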


Why the CLT Fails Without Finite Variance

Heavy-Tailed Distributions

If the population has infinite variance (e.g., the Cauchy distribution, or Pareto with $\alpha \leq 2$), the classical CLT does not apply. Instead, suitably rescaled sums typically converge to a non-normal stable distribution (for Pareto, when $\alpha < 2$). For the Cauchy distribution, $\bar{X}_n$ has exactly the same distribution as $X_1$, so the sample mean never becomes more precise!

This is why checking the assumptions (especially finite variance) before applying the CLT is essential.
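The Cauchy case is easy to see in simulation. The sketch below draws standard Cauchy variables by inverse-CDF sampling and tracks the interquartile range (IQR) of the sample mean; using the IQR as the spread measure, the replication count, and the helper names are assumptions of this example. For a standard Cauchy the quartiles are $\pm 1$, so the IQR is 2.

```python
import math
import random
from statistics import quantiles

rng = random.Random(3)  # fixed seed for reproducibility

def cauchy():
    """Standard Cauchy draw via the inverse CDF: tan(pi * (U - 1/2))."""
    return math.tan(math.pi * (rng.random() - 0.5))

def iqr_of_means(n, reps=2000):
    """Interquartile range of `reps` simulated sample means of size n."""
    means = [sum(cauchy() for _ in range(n)) / n for _ in range(reps)]
    q1, _, q3 = quantiles(means, n=4)
    return q3 - q1

iqrs = {n: iqr_of_means(n) for n in (1, 10, 100)}
for n, iqr in iqrs.items():
    print(f"n={n:4d}  IQR of sample mean = {iqr:.2f}")
```

Unlike the finite-variance case, the spread should stay near 2 for every $n$ instead of shrinking like $1/\sqrt{n}$.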


Worked Example: Dice Rolls

Consider rolling a fair die $n$ times. Each roll has $\mu = 3.5$ and $\sigma^2 = 35/12 \approx 2.917$. The CLT predicts:

$$\bar{X}_n \;\dot{\sim}\; \mathcal{N}(3.5, \; 2.917/n)$$

For $n = 36$ rolls, the standard error is $\sqrt{2.917/36} \approx 0.285$, so:

$$P(3.0 \leq \bar{X} \leq 4.0) \approx P\left(\frac{3.0 - 3.5}{\sqrt{2.917/36}} \leq Z \leq \frac{4.0 - 3.5}{\sqrt{2.917/36}}\right)$$
$$= P(-1.76 \leq Z \leq 1.76) = 2\Phi(1.76) - 1 \approx 0.92$$

Despite the discrete uniform (non-normal) population, the CLT approximation works well for $n = 36$.
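The calculation can be recomputed directly from $\mu$, $\sigma^2$, and $n$, and compared with the exact probability. The exact value below comes from convolving the single-roll distribution 36 times, which is an addition of this sketch rather than part of the worked example; no continuity correction is applied to the normal approximation.

```python
import math
from statistics import NormalDist

n = 36
mu, sigma2 = 3.5, 35 / 12          # mean and variance of one fair die roll
se = math.sqrt(sigma2 / n)         # standard error of the sample mean
z = (4.0 - mu) / se                # z-score of the upper limit
approx = 2 * NormalDist().cdf(z) - 1

# Exact distribution of the sum of n rolls, built by repeated convolution
pmf = {0: 1.0}
for _ in range(n):
    nxt = {}
    for total, prob in pmf.items():
        for face in range(1, 7):
            nxt[total + face] = nxt.get(total + face, 0.0) + prob / 6
    pmf = nxt

# The mean lies in [3.0, 4.0] exactly when the sum lies in [3n, 4n]
exact = sum(p for total, p in pmf.items() if 3 * n <= total <= 4 * n)

print(f"z = {z:.4f}")
print(f"normal approximation = {approx:.4f}")
print(f"exact probability    = {exact:.4f}")
```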


Key Takeaways

Summary: Central Limit Theorem

  • Sample means converge to normal: $\frac{\bar{X} - \mu}{\sigma/\sqrt{n}} \xrightarrow{d} \mathcal{N}(0,1)$ regardless of population shape, provided the variance is finite
  • Variance of $\bar{X}$ is $\sigma^2/n$: precision improves with sample size
  • Berry–Esseen bound: convergence rate is $O(1/\sqrt{n})$, not instant
  • Rule of thumb: $n \geq 30$ for most applications; larger for skewed data
  • Requires finite variance: fails for Cauchy, Pareto($\alpha \leq 2$), and other infinite-variance distributions
  • Justifies large-sample normal-based inference: z-tests, t-tests, ANOVA, confidence intervals
  • The reason the normal distribution is ubiquitous: many small independent effects accumulate to produce approximately normal totals
