Standard Deviation

How Far Are Data Points From the Mean?

Standard deviation translates variance back into the original units of your data, so spread can be read on the same scale as the observations themselves.

Understanding standard deviation helps you:

Interpret data — compare observations directly to the mean in real units
Apply the empirical rule — know what percentage of data falls within 1, 2, or 3 standard deviations
Detect outliers — flag unusual observations with z-scores
Compare variability — use the coefficient of variation across different scales

If variance is the theory, standard deviation is the practice.

What is Standard Deviation?

Definition

The standard deviation is the square root of variance. It returns the spread to the original units of the data, making it directly interpretable as a measure of typical distance from the mean.

Population Standard Deviation

\sigma = \sqrt{\frac{1}{N}\sum_{i=1}^N (x_i - \mu)^2}

Here,

$\sigma$ = Population standard deviation
$\mu$ = Population mean
$N$ = Population size

Sample Standard Deviation

s = \sqrt{\frac{1}{n-1}\sum_{i=1}^n (x_i - \bar{x})^2}

Here,

$s$ = Sample standard deviation
$\bar{x}$ = Sample mean
$n-1$ = Degrees of freedom (Bessel's correction)
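The two formulas above differ only in the divisor. The sketch below computes both by hand using Python's standard library; the `scores` list is hypothetical data chosen so the arithmetic is easy to follow, and the helper names are illustrative, not part of any library.

```python
import math
import statistics

# Hypothetical exam scores, chosen only for illustration.
scores = [60, 70, 75, 80, 90]

def population_sd(xs):
    """sigma: divide the sum of squared deviations by N."""
    mu = sum(xs) / len(xs)
    return math.sqrt(sum((x - mu) ** 2 for x in xs) / len(xs))

def sample_sd(xs):
    """s: divide the sum of squared deviations by n - 1 (Bessel's correction)."""
    xbar = sum(xs) / len(xs)
    return math.sqrt(sum((x - xbar) ** 2 for x in xs) / (len(xs) - 1))

sigma = population_sd(scores)  # squared deviations sum to 500, so sqrt(500 / 5) = 10
s = sample_sd(scores)          # sqrt(500 / 4), slightly larger than sigma

# The standard library offers the same two quantities directly.
assert math.isclose(sigma, statistics.pstdev(scores))
assert math.isclose(s, statistics.stdev(scores))
```

For the same data, the sample version is always the larger of the two, and the gap shrinks as $n$ grows.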

Why Square Root?

Units and Interpretability

Variance has squared units (e.g., $\text{dollars}^2$), making it hard to interpret directly. The standard deviation has the same units as the original data, so it can be compared directly to the mean and to individual observations. For example, if exam scores have mean 75 and standard deviation 10, then a score of 85 is exactly one standard deviation above the mean.

The Empirical Rule (68-95-99.7)

Empirical Rule for Normal Distributions

For $X \sim \mathcal{N}(\mu, \sigma^2)$:

Range	Exact Probability	Approximation
$\mu \pm 1\sigma$	$2\Phi(1) - 1 = 0.6827$	≈ 68%
$\mu \pm 2\sigma$	$2\Phi(2) - 1 = 0.9545$	≈ 95%
$\mu \pm 3\sigma$	$2\Phi(3) - 1 = 0.9973$	≈ 99.7%

This rule underlies a common outlier heuristic: under normality, observations beyond $\pm 3\sigma$ are rare (about 0.27%, often rounded to 0.3%).
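The probabilities in the table can be reproduced from the identity $\Phi(z) = \tfrac{1}{2}\left(1 + \operatorname{erf}(z/\sqrt{2})\right)$. The following is a minimal sketch using only `math.erf`; the function names `normal_cdf` and `prob_within` are illustrative.

```python
import math

def normal_cdf(z):
    """Standard normal CDF Phi(z), written in terms of the error function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def prob_within(k):
    """P(|X - mu| <= k * sigma) for a normal X, i.e. 2 * Phi(k) - 1."""
    return 2.0 * normal_cdf(k) - 1.0

for k in (1, 2, 3):
    print(f"within {k} sigma: {prob_within(k):.4f}")
# within 1 sigma: 0.6827
# within 2 sigma: 0.9545
# within 3 sigma: 0.9973
```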

Standardized Scores (Z-Scores)

Z-Score

z_i = \frac{x_i - \bar{x}}{s}

Here,

$z_i$ = Standardized value of $x_i$
$\bar{x}$ = Sample mean
$s$ = Sample standard deviation

The z-score tells you how many standard deviations an observation is from the mean. It is dimensionless and enables comparison across different scales.
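As a sketch of how z-scores can be used to flag unusual observations, the example below standardizes a small sample and reports values with $|z| > 3$. The `readings` data, the helper names, and the threshold of 3 are assumptions made for illustration; the threshold follows the $\pm 3\sigma$ convention from the empirical rule.

```python
import statistics

def z_scores(xs):
    """Standardize each value using the sample mean and sample SD."""
    xbar = statistics.mean(xs)
    s = statistics.stdev(xs)
    return [(x - xbar) / s for x in xs]

def flag_outliers(xs, threshold=3.0):
    """Return the values whose |z| exceeds the (assumed) threshold."""
    return [x for x, z in zip(xs, z_scores(xs)) if abs(z) > threshold]

# Hypothetical readings with one suspicious value at the end.
readings = [50, 52, 48, 51, 49, 50, 53, 47, 50, 51, 49, 52, 48, 50, 95]

print(flag_outliers(readings))  # [95]

# The exam example from earlier: mean 75, SD 10, score 85.
print((85 - 75) / 10)  # 1.0
```

One caveat on this design: the outlier itself inflates $\bar{x}$ and $s$, and with the sample standard deviation no $|z|$ can exceed $(n-1)/\sqrt{n}$, so a $|z| > 3$ rule cannot flag anything in samples of 10 or fewer points.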

Coefficient of Variation (CV)

Coefficient of Variation

CV = \frac{s}{\bar{x}} \times 100\%

Here,

$s$ = Standard deviation
$\bar{x}$ = Mean

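The CV enables comparison of variability across datasets with different units or vastly different means. A lower CV indicates less relative variability. It is only meaningful for ratio-scale data (a true zero, non-negative values) whose mean is not close to zero.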
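A short sketch of the CV in practice, assuming two hypothetical samples measured in different units. The function name and the choice to reject a non-positive mean are illustrative assumptions.

```python
import statistics

def coefficient_of_variation(xs):
    """CV as a percentage: sample SD divided by the mean, times 100."""
    xbar = statistics.mean(xs)
    if xbar <= 0:
        # Assumption for this sketch: treat a non-positive mean as an error,
        # because the ratio is not interpretable there.
        raise ValueError("CV requires a positive mean")
    return statistics.stdev(xs) / xbar * 100.0

# Hypothetical data in different units.
heights_cm = [160, 165, 170, 175, 180]
weights_kg = [50, 60, 70, 80, 90]

cv_height = coefficient_of_variation(heights_cm)  # about 4.65
cv_weight = coefficient_of_variation(weights_kg)  # about 22.59

# Rescaling the units (cm to m) leaves the CV unchanged.
heights_m = [h / 100 for h in heights_cm]
cv_height_m = coefficient_of_variation(heights_m)
```

Here the weights vary more relative to their mean than the heights do, even though the raw standard deviations (in kg and cm) cannot be compared directly.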

Chebyshev's Inequality

Chebyshev's Inequality

For any distribution (not just normal) with finite mean $\mu$ and variance $\sigma^2$:

P(|X - \mu| \geq k\sigma) \leq \frac{1}{k^2}

This holds for all $k > 0$, but it is only informative for $k > 1$: for $k \leq 1$ the bound $1/k^2$ is at least 1, which every probability already satisfies.

$k$	Upper bound on $P(|X-\mu| \geq k\sigma)$
2	$\leq 25\%$
3	$\leq 11.1\%$
4	$\leq 6.25\%$
5	$\leq 4\%$
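The bounds in the table can be checked empirically on clearly non-normal data. The sketch below draws a skewed sample from an exponential distribution (an arbitrary choice for illustration, with a fixed seed so runs are repeatable) and compares the observed tail fractions with $1/k^2$. Because the mean and the population-formula standard deviation are computed from the sample itself, the inequality is guaranteed to hold for the sample's empirical distribution.

```python
import random
import statistics

random.seed(0)  # fixed seed so the run is repeatable

# Hypothetical skewed sample: exponential with rate 1 (far from normal).
data = [random.expovariate(1.0) for _ in range(10_000)]

mu = statistics.fmean(data)
sigma = statistics.pstdev(data, mu)  # population formula (divide by N)

def tail_fraction(xs, center, spread, k):
    """Fraction of observations at least k spreads away from the center."""
    return sum(abs(x - center) >= k * spread for x in xs) / len(xs)

for k in (2, 3, 4, 5):
    observed = tail_fraction(data, mu, sigma, k)
    bound = 1 / k**2
    print(f"k={k}: observed {observed:.4f}, Chebyshev bound {bound:.4f}")
```

The observed fractions sit well below the bounds, which is typical: Chebyshev's inequality is a worst-case guarantee, not an estimate.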

Chebyshev vs. Empirical Rule

Chebyshev's inequality gives a universal upper bound that holds for any distribution with finite variance. The empirical rule gives much sharper, approximate percentages, but only for normal distributions. For a normal distribution, $P(|X-\mu| \geq 2\sigma) \approx 4.55\%$, much less than the Chebyshev bound of 25%.
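To see the gap side by side, the sketch below tabulates the Chebyshev bound against the exact normal two-sided tail, using the identity $P(|X-\mu| \geq k\sigma) = \operatorname{erfc}(k/\sqrt{2})$ for normal $X$. The function names are illustrative.

```python
import math

def chebyshev_bound(k):
    """Upper bound 1 / k^2, capped at 1 since it bounds a probability."""
    if k <= 0:
        raise ValueError("k must be positive")
    return min(1.0, 1.0 / k**2)

def normal_tail(k):
    """P(|X - mu| >= k * sigma) for a normal X, via erfc."""
    return math.erfc(k / math.sqrt(2.0))

for k in (2, 3, 4, 5):
    print(f"k={k}: Chebyshev <= {chebyshev_bound(k):.4f}, normal = {normal_tail(k):.6f}")
```

At $k = 2$ this reproduces the comparison in the text: a bound of 0.25 against a normal tail of about 0.0455.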

Relationship to Other Measures

Measure	Formula	Units	Robust?
Variance $\sigma^2$	$\frac{1}{N}\sum(x_i - \mu)^2$	Squared units	No
Standard deviation $\sigma$	$\sqrt{\sigma^2}$	Original units	No
IQR	$Q_3 - Q_1$	Original units	Yes
MAD	$\text{median}(|X - \text{median}(X)|)$	Original units	Yes
Range	$\max - \min$	Original units	No
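The "Robust?" column can be illustrated by adding a single extreme value to a small sample and watching which measures move. This is a sketch on hypothetical data; `iqr` and `mad` are illustrative helpers, and the IQR relies on the default quartile method of `statistics.quantiles`, so other quartile conventions may give slightly different numbers.

```python
import statistics

def iqr(xs):
    """Q3 - Q1 using the default method of statistics.quantiles."""
    q1, _, q3 = statistics.quantiles(xs, n=4)
    return q3 - q1

def mad(xs):
    """Median absolute deviation from the median (unscaled)."""
    med = statistics.median(xs)
    return statistics.median(abs(x - med) for x in xs)

# Hypothetical clean sample, then the same sample with one extreme value.
clean = [46, 47, 48, 49, 50, 51, 52, 53, 54]
contaminated = clean + [500]

for name, measure in [
    ("SD", statistics.stdev),
    ("IQR", iqr),
    ("MAD", mad),
    ("Range", lambda xs: max(xs) - min(xs)),
]:
    print(f"{name}: clean={measure(clean):.2f}, contaminated={measure(contaminated):.2f}")
```

In this example the standard deviation and range grow by well over an order of magnitude, while the IQR and MAD change only slightly.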