
Standard Deviation — Formula, Empirical Rule, and Coefficient of Variation

How Far Are Data Points From the Mean?

Standard deviation translates variance back into the original units of your data — making spread actually meaningful.

Understanding standard deviation helps you:

  • Interpret data — compare observations directly to the mean in real units
  • Apply the empirical rule — know what percentage of data falls within 1, 2, or 3 standard deviations
  • Detect outliers — flag unusual observations with z-scores
  • Compare variability — use the coefficient of variation across different scales

If variance is the theory, standard deviation is the practice.


What is Standard Deviation?

Definition

The standard deviation is the square root of variance. It returns the spread to the original units of the data, making it directly interpretable as a measure of typical distance from the mean.

Population Standard Deviation

$$\sigma = \sqrt{\frac{1}{N}\sum_{i=1}^N (x_i - \mu)^2}$$

Here,

  • $\sigma$ = Population standard deviation
  • $\mu$ = Population mean
  • $N$ = Population size

Sample Standard Deviation

$$s = \sqrt{\frac{1}{n-1}\sum_{i=1}^n (x_i - \bar{x})^2}$$

Here,

  • $s$ = Sample standard deviation
  • $\bar{x}$ = Sample mean
  • $n-1$ = Degrees of freedom (Bessel's correction)
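
The two formulas differ only in the divisor. The sketch below computes both by hand and checks them against Python's standard library; the five exam scores are hypothetical values chosen for illustration.

```python
import math
import statistics

# Hypothetical sample of five exam scores (illustrative values only).
scores = [70, 75, 80, 85, 90]

n = len(scores)
mean = sum(scores) / n
squared_deviations = [(x - mean) ** 2 for x in scores]

# Population formula: divide the sum of squared deviations by N.
sigma = math.sqrt(sum(squared_deviations) / n)

# Sample formula: divide by n - 1 (Bessel's correction).
s = math.sqrt(sum(squared_deviations) / (n - 1))

# The standard library implements both definitions.
assert math.isclose(sigma, statistics.pstdev(scores))
assert math.isclose(s, statistics.stdev(scores))

print(f"population SD = {sigma:.4f}")  # 7.0711
print(f"sample SD     = {s:.4f}")      # 7.9057
```

With NumPy, `np.std(x)` uses the population divisor by default (`ddof=0`); passing `ddof=1` gives the sample version.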

Why Square Root?

Units and Interpretability

Variance has squared units (e.g., $\text{dollars}^2$), making it hard to interpret directly. The standard deviation has the same units as the original data, so it can be compared directly to the mean and to individual observations. For example, if exam scores have mean 75 and standard deviation 10, then a score of 85 is exactly one standard deviation above the mean.


The Empirical Rule (68-95-99.7)

Theorem: Empirical Rule for Normal Distributions

For $X \sim \mathcal{N}(\mu, \sigma^2)$:

| Range | Exact Probability | Approximation |
| --- | --- | --- |
| $\mu \pm 1\sigma$ | $2\Phi(1) - 1 = 0.6827$ | ≈ 68% |
| $\mu \pm 2\sigma$ | $2\Phi(2) - 1 = 0.9545$ | ≈ 95% |
| $\mu \pm 3\sigma$ | $2\Phi(3) - 1 = 0.9973$ | ≈ 99.7% |

This rule underlies a common outlier heuristic: under normality, observations beyond $\pm 3\sigma$ are rare (about 0.27%, or roughly 1 in 370).
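
The exact probabilities can be reproduced with the error function, since $\Phi(k) = \tfrac{1}{2}\left(1 + \operatorname{erf}(k/\sqrt{2})\right)$ and therefore $2\Phi(k) - 1 = \operatorname{erf}(k/\sqrt{2})$. A minimal sketch using only the standard library (the helper name `prob_within` is our own):

```python
import math


def prob_within(k: float) -> float:
    """P(|X - mu| <= k * sigma) for a normal X, i.e. 2 * Phi(k) - 1."""
    return math.erf(k / math.sqrt(2))


for k in (1, 2, 3):
    print(f"within {k} sigma: {prob_within(k):.4f}")
# within 1 sigma: 0.6827
# within 2 sigma: 0.9545
# within 3 sigma: 0.9973
```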


Standardized Scores (Z-Scores)

Z-Score

$$z_i = \frac{x_i - \bar{x}}{s}$$

Here,

  • $z_i$ = Standardized value of $x_i$
  • $\bar{x}$ = Sample mean
  • $s$ = Sample standard deviation

The z-score tells you how many standard deviations an observation is from the mean. It is dimensionless and enables comparison across different scales.
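
As a rough illustration, the sketch below standardizes a small dataset and flags values whose z-score exceeds 3 in absolute value. The data and the cutoff of 3 are assumptions made for this example, not a prescription.

```python
import statistics

# Hypothetical measurements; the last value is deliberately unusual.
values = [48, 50, 51, 49, 52, 50, 47, 53, 50, 49, 51,
          50, 48, 52, 49, 51, 50, 50, 49, 51, 75]

mean = statistics.mean(values)
s = statistics.stdev(values)  # sample standard deviation (divides by n - 1)

# z_i = (x_i - mean) / s for every observation.
z_scores = [(x - mean) / s for x in values]

THRESHOLD = 3.0  # assumed cutoff, motivated by the empirical rule
flagged = [x for x, z in zip(values, z_scores) if abs(z) > THRESHOLD]

print(f"mean = {mean:.2f}, s = {s:.2f}")
print(f"flagged observations: {flagged}")
```

Note that the unusual value inflates both the mean and $s$, which shrinks its own z-score. On small samples this masking effect can hide outliers from a z-score rule.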


Coefficient of Variation (CV)

Coefficient of Variation

$$CV = \frac{s}{\bar{x}} \times 100\%$$

Here,

  • $s$ = Standard deviation
  • $\bar{x}$ = Mean

The CV enables comparison of variability across datasets with different units or vastly different means. A lower CV indicates less relative variability.
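
A small sketch of that comparison, using two hypothetical datasets measured in different units. The function name `coefficient_of_variation` is our own, and it uses the sample standard deviation.

```python
import statistics


def coefficient_of_variation(data):
    """Return s / mean * 100 (percent), using the sample standard deviation."""
    mean = statistics.mean(data)
    if mean == 0:
        raise ValueError("CV is undefined when the mean is zero")
    return statistics.stdev(data) / mean * 100


# Hypothetical datasets on very different scales.
heights_cm = [160, 165, 170, 175, 180]
salaries_usd = [40_000, 45_000, 50_000, 55_000, 60_000]

cv_heights = coefficient_of_variation(heights_cm)
cv_salaries = coefficient_of_variation(salaries_usd)

print(f"CV of heights:  {cv_heights:.2f}%")   # 4.65%
print(f"CV of salaries: {cv_salaries:.2f}%")  # 15.81%
```

The raw standard deviations (about 7.9 cm versus about 7,906 dollars) cannot be compared directly, but the CVs show that the salaries vary more relative to their mean.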


Chebyshev's Inequality

Theorem: Chebyshev's Inequality

For any distribution (not just normal) with finite mean $\mu$ and variance $\sigma^2$:

$$P(|X - \mu| \geq k\sigma) \leq \frac{1}{k^2}$$

This holds for all $k > 0$, but it is informative only for $k > 1$: for $k \leq 1$ the bound $1/k^2 \geq 1$ says nothing, because every probability is at most 1.

| $k$ | Upper bound on $P(\lvert X-\mu \rvert \geq k\sigma)$ |
| --- | --- |
| 2 | $\leq 25\%$ |
| 3 | $\leq 11.1\%$ |
| 4 | $\leq 6.25\%$ |
| 5 | $\leq 4\%$ |
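
To see the bound at work on non-normal data, the sketch below draws a right-skewed sample and compares the observed tail fractions with $1/k^2$. The exponential distribution, the sample size, and the seed are arbitrary choices made for this illustration.

```python
import random
import statistics

random.seed(0)  # arbitrary seed for reproducibility

# Hypothetical right-skewed data: 5,000 draws from an exponential distribution.
data = [random.expovariate(1.0) for _ in range(5_000)]

mu = statistics.fmean(data)
sigma = statistics.pstdev(data, mu)  # population SD of this dataset

# Fraction of observations at least k standard deviations from the mean.
tails = {
    k: sum(abs(x - mu) >= k * sigma for x in data) / len(data)
    for k in (2, 3, 4, 5)
}

for k, tail in tails.items():
    print(f"k={k}: observed tail = {tail:.4f}, Chebyshev bound = {1 / k**2:.4f}")
```

Because the mean and population SD are computed from the dataset itself, Chebyshev's inequality applies to its empirical distribution, so the observed fractions cannot exceed the bounds. In practice they sit well below them.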

Chebyshev vs. Empirical Rule

Chebyshev's inequality gives a universal upper bound that holds for any distribution with finite variance. The empirical rule gives approximate percentages that are valid only for normal distributions. For a normal distribution, $P(|X-\mu| \geq 2\sigma) \approx 4.55\%$, far below the Chebyshev bound of 25%.
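
The gap between the two can be tabulated directly. For a normal distribution the two-sided tail is $P(|X-\mu| \geq k\sigma) = \operatorname{erfc}(k/\sqrt{2})$; the sketch below (with a helper name of our own, `normal_tail`) prints it next to the Chebyshev bound.

```python
import math


def normal_tail(k: float) -> float:
    """Two-sided tail P(|X - mu| >= k * sigma) for a normal X."""
    return math.erfc(k / math.sqrt(2))


def chebyshev_bound(k: float) -> float:
    """Distribution-free upper bound on the same tail probability."""
    return 1 / k**2


for k in (2, 3, 4, 5):
    print(f"k={k}: normal tail = {normal_tail(k):.6f}, "
          f"Chebyshev bound = {chebyshev_bound(k):.4f}")
```

For $k = 2$ this gives a normal tail of about 0.0455 against a bound of 0.25, matching the figures quoted above.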


Relationship to Other Measures

| Measure | Formula | Units | Robust? |
| --- | --- | --- | --- |
| Variance $\sigma^2$ | $\frac{1}{N}\sum(x_i - \mu)^2$ | Squared units | No |
| Standard deviation $\sigma$ | $\sqrt{\sigma^2}$ | Original units | No |
| IQR | $Q_3 - Q_1$ | Original units | Yes |
| MAD | $\text{median}(\lvert X - \text{median}(X) \rvert)$ | Original units | Yes |
| Range | $\max - \min$ | Original units | No |
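
The "Robust?" column can be checked empirically by corrupting a single value and watching how each measure reacts. This is a sketch on hypothetical data; the helpers `iqr` and `mad` are our own, and quartile conventions differ between libraries, so IQR values may not match other tools exactly.

```python
import statistics


def iqr(data):
    """Interquartile range using statistics.quantiles (default 'exclusive' method)."""
    q1, _, q3 = statistics.quantiles(data, n=4)
    return q3 - q1


def mad(data):
    """Median absolute deviation from the median (unscaled)."""
    med = statistics.median(data)
    return statistics.median([abs(x - med) for x in data])


# Hypothetical clean data, and a copy where the last value is replaced by 110.
clean = [10, 12, 11, 13, 12, 11, 10, 12, 13, 11]
contaminated = clean[:-1] + [110]

for name, data in (("clean", clean), ("contaminated", contaminated)):
    print(f"{name:>12}: SD = {statistics.stdev(data):6.2f}, "
          f"IQR = {iqr(data):5.2f}, MAD = {mad(data):4.2f}, "
          f"range = {max(data) - min(data)}")
```

One bad value moves the standard deviation from about 1.1 to about 31 and the range from 3 to 100, while the IQR and MAD each change by less than one unit.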

Standard Deviation in Machine Learning

| ML Application | Std Dev Usage | Why |
| --- | --- | --- |
| StandardScaler | x_scaled = (x - μ) / σ | Neural networks train faster |
| Confidence intervals | x̄ ± 1.96σ/√n | Model uncertainty quantification |
| Weight initialization | Xavier/Glorot: std = √(2/(fan_in + fan_out)) | Prevents vanishing/exploding gradients |
| Batch normalization | Normalize to mean=0, std=1 | Stabilizes deep learning |
| Anomaly detection | z-scores | Flags unusual observations |


```python
import numpy as np
from sklearn.preprocessing import StandardScaler

np.random.seed(42)  # fixed seed so the printed values are reproducible

# StandardScaler centers each column and divides by its standard deviation.
# The three columns are generated on very different scales (std near 10, 1, 100).
data = np.random.randn(100, 3) * [10, 1, 100]
scaler = StandardScaler()
scaled = scaler.fit_transform(data)
print(f"Original std: {data.std(axis=0).round(1)}")
print(f"Scaled std:   {scaled.std(axis=0).round(3)}")

# Weight initialization (Xavier/Glorot normal):
# std = sqrt(2 / (fan_in + fan_out)) for each layer.
layer_sizes = [784, 256, 128, 10]
for i in range(len(layer_sizes) - 1):
    fan_in, fan_out = layer_sizes[i], layer_sizes[i + 1]
    std_xavier = np.sqrt(2.0 / (fan_in + fan_out))
    weights = np.random.randn(fan_in, fan_out) * std_xavier
    print(f"Layer {i}: {fan_in}→{fan_out}, std={std_xavier:.4f}, "
          f"weight range=[{weights.min():.3f}, {weights.max():.3f}]")
```
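
The confidence-interval row of the table can be sketched in the same spirit. The example below is a minimal illustration on simulated values: the "validation errors" are hypothetical, the sample standard deviation $s$ stands in for the unknown $\sigma$, and the 1.96 multiplier assumes the sample is large enough for the normal approximation to be reasonable.

```python
import math
import random
import statistics

random.seed(42)  # arbitrary seed for reproducibility

# Hypothetical per-run validation errors for a model (simulated values).
errors = [random.gauss(0.25, 0.05) for _ in range(50)]

n = len(errors)
mean = statistics.fmean(errors)
s = statistics.stdev(errors)        # sample SD, used in place of sigma
standard_error = s / math.sqrt(n)   # SD of the sample mean

# Approximate 95% interval: mean ± 1.96 * s / sqrt(n).
half_width = 1.96 * standard_error
ci_low, ci_high = mean - half_width, mean + half_width

print(f"mean error = {mean:.4f}")
print(f"approx. 95% CI = [{ci_low:.4f}, {ci_high:.4f}]")
```

For small samples, a t-distribution multiplier would be the more careful choice than 1.96.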

Key Takeaways

  • Standard deviation is in the same units as the data, so it reads directly as a typical distance from the mean.
  • The 68-95-99.7 rule applies only to (approximately) normal distributions; never apply it blindly.
  • CV = SD / mean allows variability comparison across different scales and units.
  • Chebyshev's inequality provides universal bounds for any distribution with finite variance: P(|X−μ| ≥ kσ) ≤ 1/k².

"Standard deviation is the measuring stick of uncertainty — without it, you are guessing in the dark."
