The Empirical Rule — 68-95-99.7 for Normal Distributions
Foundations of Statistics
The 68-95-99.7 Rule Every Analyst Needs
The empirical rule provides instant intuition about data spread in normal distributions, enabling quick assessments without complex calculations. This simple framework is the foundation for outlier detection, quality control, and rapid data analysis.
- Quality Assurance — Six Sigma methodologies use the rule to identify process deviations
- Risk Management — Financial institutions apply it to estimate VaR and expected loss ranges
- Clinical Research — Researchers quickly assess whether patient measurements fall within expected ranges
Three numbers that capture the essence of normal variability.
Core Concepts
The empirical rule provides exact percentages for how probability mass concentrates around the mean in a normal distribution. It is a direct consequence of the Gaussian pdf's structure.
Definition: Empirical Rule
For a normal distribution $X \sim \mathcal{N}(\mu, \sigma^2)$:
- Approximately 68.27% of data falls within $\mu \pm \sigma$
- Approximately 95.45% of data falls within $\mu \pm 2\sigma$
- Approximately 99.73% of data falls within $\mu \pm 3\sigma$
Empirical Rule (Exact Form)
$$P(\mu - k\sigma \le X \le \mu + k\sigma) = 2\Phi(k) - 1$$
Here,
- $\mu$ = Mean of the distribution
- $\sigma$ = Standard deviation
- $k$ = Number of standard deviations from mean
- $\Phi(k)$ = Standard normal CDF evaluated at $k$
Rigorous Derivation
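Theorem: Proof of the 68-95-99.7 Rule
For $X \sim \mathcal{N}(\mu, \sigma^2)$, standardize: $Z = (X - \mu)/\sigma \sim \mathcal{N}(0, 1)$.
$$P(|X - \mu| \le k\sigma) = P(|Z| \le k) = \Phi(k) - \Phi(-k)$$
By symmetry of the standard normal, $\Phi(-k) = 1 - \Phi(k)$:
$$P(|Z| \le k) = 2\Phi(k) - 1$$
Numerical evaluation:
- $k = 1$: $2\Phi(1) - 1 = 2(0.84134) - 1 \approx 0.6827$
- $k = 2$: $2\Phi(2) - 1 = 2(0.97725) - 1 \approx 0.9545$
- $k = 3$: $2\Phi(3) - 1 = 2(0.99865) - 1 \approx 0.9973$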
Connection to the Error Function
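The normal CDF can be written using the error function:
$$\Phi(x) = \frac{1}{2}\left[1 + \operatorname{erf}\!\left(\frac{x}{\sqrt{2}}\right)\right]$$
So the empirical rule becomes: $P(|Z| \le k) = 2\Phi(k) - 1 = \operatorname{erf}(k/\sqrt{2})$. For $k = 1$: $\operatorname{erf}(1/\sqrt{2}) \approx 0.6827$.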
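The relation $P(|Z| \le k) = \operatorname{erf}(k/\sqrt{2})$ can be checked numerically with the Python standard library. This is a minimal sketch; the function names `phi` and `coverage` are illustrative choices, not part of any library.

```python
import math

def phi(x: float) -> float:
    """Standard normal CDF expressed through the error function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def coverage(k: float) -> float:
    """P(|Z| <= k) for a standard normal Z, i.e. erf(k / sqrt(2))."""
    return math.erf(k / math.sqrt(2.0))

for k in (1, 2, 3):
    # The two columns should agree up to floating-point rounding.
    print(k, round(coverage(k), 4), round(2 * phi(k) - 1, 4))
```

Both columns are the same quantity computed two ways, so any disagreement beyond rounding would point to a mistake in the algebra.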
Chebyshev's Inequality (The Universal Bound)
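Chebyshev's Inequality
$$P(|X - \mu| \ge k\sigma) \le \frac{1}{k^2}$$
Here,
- $k$ = Number of standard deviations (the bound is informative only for $k > 1$)
- $\mu, \sigma$ = Mean and std dev of $X$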
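Theorem: Proof of Chebyshev's Inequality
Let $Y = (X - \mu)^2$. We want $P(Y \ge k^2\sigma^2)$, which equals $P(|X - \mu| \ge k\sigma)$.
By Markov's inequality applied to $Y$ (since $Y \ge 0$):
$$P(Y \ge k^2\sigma^2) \le \frac{E[Y]}{k^2\sigma^2}$$
Since $E[Y] = \operatorname{Var}(X) = \sigma^2$, we have $P(|X - \mu| \ge k\sigma) \le \sigma^2/(k^2\sigma^2) = 1/k^2$.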
Empirical Rule vs. Chebyshev
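The empirical rule gives tight bounds for normal data, while Chebyshev gives loose but universal bounds for any distribution with finite variance. For $k = 2$:
- Normal: $P(|X - \mu| \le 2\sigma) = 2\Phi(2) - 1 \approx 0.9545$ (exact)
- Chebyshev: $P(|X - \mu| < 2\sigma) \ge 1 - 1/4 = 0.75$ (guaranteed for any distribution)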
The gap between 75% and 95.45% shows how much structure the normal distribution provides.
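To see the universal bound at work on non-normal data, the sketch below draws a skewed sample and compares the observed coverage with the Chebyshev floor $1 - 1/k^2$. The exponential distribution, the sample size, the seed, and the helper name `fraction_within` are all assumptions chosen for illustration.

```python
import random
import statistics

random.seed(0)  # fixed seed so the run is repeatable

# Assumption for illustration: a skewed, clearly non-normal sample.
sample = [random.expovariate(1.0) for _ in range(10_000)]
mean = statistics.fmean(sample)
std = statistics.pstdev(sample)  # population form, so Chebyshev holds on the sample itself

def fraction_within(data, center, scale, k):
    """Fraction of points strictly closer than k * scale to center."""
    return sum(abs(x - center) < k * scale for x in data) / len(data)

for k in (1.5, 2, 3):
    observed = fraction_within(sample, mean, std, k)
    chebyshev_floor = 1 - 1 / k**2
    print(f"k={k}: observed {observed:.4f}, Chebyshev floor {chebyshev_floor:.4f}")
```

The observed coverage can never fall below the floor, but it also need not match the 68-95-99.7 figures, because those depend on normality.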
Higher-Order Concentration: The 6σ Rule
Six Sigma Quality Control
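In Six Sigma methodology, "6σ" means $k = 6$. For a normal distribution:
$$P(|X - \mu| > 6\sigma) = 2\,(1 - \Phi(6)) \approx 1.97 \times 10^{-9}$$
This is approximately 2 defects per billion opportunities for a perfectly centered process, an idealized benchmark for manufacturing quality. The widely quoted Six Sigma figure of 3.4 defects per million opportunities additionally assumes a 1.5σ long-term shift in the process mean.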
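Tail probabilities this small are awkward to compute as $2(1 - \Phi(k))$ in floating point, because $\Phi(6)$ is extremely close to 1 and the subtraction loses precision. A minimal sketch using the complementary error function instead (the function name `two_sided_tail` is an illustrative choice):

```python
import math

def two_sided_tail(k: float) -> float:
    """P(|Z| > k) for a standard normal Z, computed as erfc(k / sqrt(2))."""
    return math.erfc(k / math.sqrt(2.0))

p = two_sided_tail(6)
print(f"{p:.3e} per opportunity, about {p * 1e9:.2f} per billion")
```

The same function reproduces the 0.27% figure for $k = 3$, which makes it a convenient single entry point for any sigma level.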
Generalization: Higher-Order Concentration Inequalities
Theorem: Vysochanskii-Petunin Inequality (Unimodal Case)
For a unimodal distribution with mean $\mu$ and finite variance $\sigma^2$, and $k > \sqrt{8/3}$:
$$P(|X - \mu| \ge k\sigma) \le \frac{4}{9k^2}$$
This is tighter than Chebyshev's bound for unimodal distributions, but still looser than the empirical rule for normals.
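The ordering of the three results can be tabulated directly. The sketch below assumes the standard forms of the bounds ($1/k^2$ for Chebyshev, $4/(9k^2)$ for Vysochanskii-Petunin with $k > \sqrt{8/3}$) and the exact normal tail; the function names are illustrative.

```python
import math

def chebyshev_bound(k: float) -> float:
    """Upper bound on P(|X - mu| >= k*sigma) for any finite-variance distribution."""
    return 1.0 / k**2

def vp_bound(k: float) -> float:
    """Vysochanskii-Petunin upper bound, valid for unimodal distributions."""
    if k <= math.sqrt(8.0 / 3.0):
        raise ValueError("this form of the bound requires k > sqrt(8/3)")
    return 4.0 / (9.0 * k**2)

def normal_tail(k: float) -> float:
    """Exact P(|Z| >= k) for a standard normal Z."""
    return math.erfc(k / math.sqrt(2.0))

for k in (2, 3, 4):
    print(f"k={k}: Chebyshev {chebyshev_bound(k):.4f}, "
          f"VP {vp_bound(k):.4f}, normal {normal_tail(k):.6f}")
```

For every valid $k$ the Vysochanskii-Petunin bound is 4/9 of the Chebyshev bound, while the normal tail shrinks far faster than either.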
Worked Example
Example: Exam Scores
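Suppose exam scores are $X \sim \mathcal{N}(75, 100)$ (mean 75, variance 100, so $\sigma = 10$).
Q: What fraction of students score between 65 and 85?
$$P(65 \le X \le 85) = P(-1 \le Z \le 1) = 2\Phi(1) - 1 \approx 0.6827$$
So about 68.3% of students score within one standard deviation of the mean.
Q: What fraction score above 95?
$$P(X > 95) = P\!\left(Z > \frac{95 - 75}{10}\right) = 1 - \Phi(2) \approx 0.0228$$
About 2.3% score above 95.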
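The two answers can be reproduced with `statistics.NormalDist` from the Python standard library (available from Python 3.8). This is a small sketch of the same calculation; the variable names are illustrative.

```python
from statistics import NormalDist

# sigma is the standard deviation (square root of the variance 100).
scores = NormalDist(mu=75, sigma=10)

within_one_sd = scores.cdf(85) - scores.cdf(65)  # P(65 <= X <= 85)
above_95 = 1 - scores.cdf(95)                    # P(X > 95)

print(f"P(65 <= X <= 85) = {within_one_sd:.4f}")
print(f"P(X > 95)        = {above_95:.4f}")
```

Note that `NormalDist` takes the standard deviation, not the variance, so passing 100 instead of 10 would silently give the wrong answer.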
Relationship to the Normal Distribution Family
The empirical rule is a special property of the Gaussian family. Other distributions have their own concentration behavior:
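- Laplace (scale $b$, so $\sigma = \sqrt{2}\,b$): $P(|X - \mu| \le k\sigma) = 1 - e^{-\sqrt{2}\,k}$
- Uniform (on $[a, b]$, so $\sigma = (b - a)/\sqrt{12}$): $P(|X - \mu| \le k\sigma) = \min(1, k/\sqrt{3})$, which is exactly 100% for $k \ge \sqrt{3}$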
- Cauchy: No finite variance exists, so the empirical rule doesn't apply at all
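For a side-by-side view, the sketch below evaluates the coverage within $k$ standard deviations for three families. It assumes the standard parameterizations: a Laplace distribution with scale $b$ (so $\sigma = \sqrt{2}\,b$) and a continuous uniform distribution on an interval $[a, b]$ (so $\sigma = (b - a)/\sqrt{12}$). The function names are illustrative.

```python
import math

def normal_coverage(k: float) -> float:
    """P(|X - mu| <= k*sigma) for a normal distribution."""
    return math.erf(k / math.sqrt(2.0))

def laplace_coverage(k: float) -> float:
    """Same probability for a Laplace distribution, where sigma = sqrt(2) * b."""
    return 1.0 - math.exp(-math.sqrt(2.0) * k)

def uniform_coverage(k: float) -> float:
    """Same probability for a continuous uniform; the half-width is sqrt(3) * sigma."""
    return min(1.0, k / math.sqrt(3.0))

for k in (1, 2, 3):
    print(f"k={k}: normal {normal_coverage(k):.4f}, "
          f"Laplace {laplace_coverage(k):.4f}, uniform {uniform_coverage(k):.4f}")
```

The Laplace distribution puts more mass within one standard deviation than the normal but less within three, which is one way its heavier tails show up.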
Specific Applications
- Six Sigma manufacturing — Defect rates are computed using the tail probability $P(|X - \mu| > k\sigma) = 2(1 - \Phi(k))$ from the empirical rule.
- Process capability indices — $C_p$ and $C_{pk}$ are defined in terms of $\pm 3\sigma$ tolerance limits.
- Outlier detection — Values beyond $\mu \pm 3\sigma$ are flagged as potential outliers (0.27% false positive rate for normal data).
- Standardized testing — IQ scores ($\mathcal{N}(100, 15^2)$): 68% score between 85–115, 95% between 70–130.
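The outlier-detection use case can be sketched in a few lines. The data below are made up for illustration, and `flag_outliers` is a hypothetical helper, not a library function; it estimates the mean and standard deviation from the sample and flags values more than $k$ standard deviations away.

```python
import statistics

def flag_outliers(values, k=3.0):
    """Return the values lying more than k sample standard deviations from the mean."""
    mean = statistics.fmean(values)
    std = statistics.stdev(values)  # sample standard deviation (n - 1 denominator)
    return [x for x in values if abs(x - mean) > k * std]

# Hypothetical sensor readings: 19 values near 10 and one suspicious reading.
readings = [10.1, 9.8, 10.3, 9.9, 10.0, 10.2, 9.7, 10.4, 9.6, 10.0,
            10.1, 9.9, 10.2, 9.8, 10.0, 10.3, 9.7, 10.1, 9.9, 25.0]

print(flag_outliers(readings))
```

Because the suspicious value itself inflates the estimated mean and standard deviation, this simple rule can miss outliers in small samples; the 0.27% false positive rate applies only when the data are actually normal.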
Key Takeaways
Summary: Empirical Rule
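- 68.27% within $\mu \pm \sigma$, 95.45% within $\mu \pm 2\sigma$, 99.73% within $\mu \pm 3\sigma$
- Exact formula: $P(|X - \mu| \le k\sigma) = 2\Phi(k) - 1 = \operatorname{erf}(k/\sqrt{2})$
- Applies only to normal distributions; for arbitrary distributions use Chebyshev ($P(|X - \mu| \ge k\sigma) \le 1/k^2$) or Vysochanskii-Petunin ($\le 4/(9k^2)$ for unimodal)
- Six Sigma: $2(1 - \Phi(6)) \approx 2 \times 10^{-9}$ defects per opportunity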
- Foundation for outlier detection, process control, and standardized testing