Variance of a Random Variable
Probability Theory
Measuring Spread — How Far Values Deviate from the Mean
Variance quantifies the average squared deviation of a random variable from its mean. It is the most widely used measure of dispersion in statistics.
- Foundation: variance underpins standard deviation, covariance, correlation, and many statistical tests
- Chebyshev's inequality: bounds tail probabilities using only mean and variance
- Portfolio theory: in mean-variance analysis, the variance of returns serves as the measure of risk, and investors minimize it for a given expected return
- Quality control: Six Sigma reduces process variance to push defect rates close to zero
Without variance, we cannot quantify uncertainty — and without uncertainty, statistics has no purpose.
What is Variance?
Definition
Variance is the expected squared deviation of a random variable from its mean. It measures the average spread of the distribution around its center.
"The variance is the moment of inertia of the probability distribution about its center of mass." — Persi Diaconis
Mathematical Formulation
Definition of Variance
$$\operatorname{Var}(X) = E\left[(X - \mu)^2\right]$$
Here,
- $X$ = Random variable
- $\mu = E[X]$ = Mean (expected value) of X
- $(X - \mu)^2$ = Squared deviation from the mean
Computational Formula
$$\operatorname{Var}(X) = E[X^2] - (E[X])^2$$
Here,
- $E[X^2]$ = Second raw moment of X
- $(E[X])^2$ = Square of the first moment
Derivation of the Computational Formula
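Expand the definition directly, writing $\mu = E[X]$:
$$\operatorname{Var}(X) = E\left[(X - \mu)^2\right] = E\left[X^2 - 2\mu X + \mu^2\right] = E[X^2] - 2\mu E[X] + \mu^2 = E[X^2] - (E[X])^2$$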
This identity is essential for computation: you only need the first two raw moments.
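One practical caveat when using the identity in floating-point code: if the mean is large relative to the spread, the two raw moments are nearly equal and their difference can lose most of its significant digits. The sketch below compares the raw-moment form with the definition on illustrative data. The data values and the function names `variance_raw_moments` and `variance_two_pass` are assumptions made for this example.

```python
# Illustrative data (an assumption): small spread around a very large mean.
data = [1e9 + 4.0, 1e9 + 7.0, 1e9 + 13.0, 1e9 + 16.0]


def variance_raw_moments(xs):
    """Population variance via E[X^2] - (E[X])^2 in floating point."""
    mean = sum(xs) / len(xs)
    mean_of_squares = sum(x * x for x in xs) / len(xs)
    return mean_of_squares - mean * mean


def variance_two_pass(xs):
    """Population variance via the definition E[(X - mu)^2]."""
    mean = sum(xs) / len(xs)
    return sum((x - mean) ** 2 for x in xs) / len(xs)


# The deviations from the mean are -6, -3, 3, 6, so the exact answer is 22.5.
print("raw moments:", variance_raw_moments(data))
print("two pass:   ", variance_two_pass(data))
```

Both forms are algebraically identical. The difference here comes only from rounding, so the identity remains the convenient choice for hand calculation and for exact or well-scaled data.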
Properties of Variance
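Theorem: Properties of Variance
Let $X$ be a random variable with $\operatorname{Var}(X) < \infty$, and let $a, b$ be constants. Then:
(i) $\operatorname{Var}(aX + b) = a^2 \operatorname{Var}(X)$
(ii) $\operatorname{Var}(X) = 0$ if and only if $X$ is a.s. constant
(iii) If $X$ and $Y$ are independent, $\operatorname{Var}(X + Y) = \operatorname{Var}(X) + \operatorname{Var}(Y)$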
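Proof Sketch of (i)
$$\operatorname{Var}(aX + b) = E\left[(aX + b - (a\mu + b))^2\right] = E\left[a^2 (X - \mu)^2\right] = a^2 \operatorname{Var}(X)$$
The shift $b$ cancels inside the squared deviation: translation does not affect spread.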
Proof Sketch of (iii)
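Expand: $E[(X+Y)^2] = E[X^2] + 2E[XY] + E[Y^2] = E[X^2] + 2E[X]E[Y] + E[Y^2]$ (using independence, $E[XY] = E[X]E[Y]$). Subtracting $(E[X+Y])^2 = (E[X])^2 + 2E[X]E[Y] + (E[Y])^2$ yields $\operatorname{Var}(X+Y) = \operatorname{Var}(X) + \operatorname{Var}(Y)$.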
Non-Independence Case
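For dependent $X, Y$:
$$\operatorname{Var}(X + Y) = \operatorname{Var}(X) + \operatorname{Var}(Y) + 2\operatorname{Cov}(X, Y)$$
Independence implies $\operatorname{Cov}(X, Y) = 0$, but the converse is false: zero covariance does not imply independence.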
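A standard counterexample for the converse can be checked exactly. The sketch below assumes $X$ is uniform on $\{-1, 0, 1\}$ and $Y = X^2$, so $Y$ is completely determined by $X$ and yet the covariance is zero. The helper names `expect` and `var_and_cov` are hypothetical, introduced only for this illustration.

```python
from fractions import Fraction


def expect(joint, g):
    """E[g(X, Y)] for a joint PMF given as {(x, y): probability}."""
    return sum(g(x, y) * p for (x, y), p in joint.items())


def var_and_cov(joint):
    """Return (Var(X), Var(Y), Cov(X, Y), Var(X + Y)), all computed exactly."""
    ex = expect(joint, lambda x, y: x)
    ey = expect(joint, lambda x, y: y)
    var_x = expect(joint, lambda x, y: (x - ex) ** 2)
    var_y = expect(joint, lambda x, y: (y - ey) ** 2)
    cov = expect(joint, lambda x, y: (x - ex) * (y - ey))
    var_sum = expect(joint, lambda x, y: (x + y - ex - ey) ** 2)
    return var_x, var_y, cov, var_sum


# Assumed example: X uniform on {-1, 0, 1} and Y = X^2 (fully dependent on X).
joint = {(x, x * x): Fraction(1, 3) for x in (-1, 0, 1)}
var_x, var_y, cov, var_sum = var_and_cov(joint)

print("Cov(X, Y) =", cov)
print("Var(X+Y) =", var_sum, "and Var(X) + Var(Y) + 2Cov(X, Y) =", var_x + var_y + 2 * cov)

# Independence would require P(X=0, Y=0) == P(X=0) * P(Y=0).
p_x0 = sum(p for (x, _), p in joint.items() if x == 0)
p_y0 = sum(p for (_, y), p in joint.items() if y == 0)
print("P(X=0, Y=0) =", joint[(0, 0)], "but P(X=0) * P(Y=0) =", p_x0 * p_y0)
```

Because the covariance vanishes, the variances still add in this example even though the variables are dependent. Zero covariance is all that additivity needs.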
Standard Deviation
The standard deviation restores the original scale of measurement, making it interpretable in the same units as $X$.
Standard Deviation
$$\sigma = \sqrt{\operatorname{Var}(X)}$$
Here,
- $\sigma$ = Standard deviation of X
- $\sigma^2$ = Variance (square of standard deviation)
Chebyshev's Inequality
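Theorem: Chebyshev's Inequality
For any random variable $X$ with finite mean $\mu$ and variance $\sigma^2$, and for any $k > 0$:
$$P(|X - \mu| \ge k\sigma) \le \frac{1}{k^2}$$
Proof Sketch
Let $Y = (X - \mu)^2$. Then $E[Y] = \sigma^2$. Apply Markov's inequality to $Y$ with threshold $k^2\sigma^2$:
$$P(Y \ge k^2\sigma^2) \le \frac{E[Y]}{k^2\sigma^2} = \frac{1}{k^2}$$
But $Y \ge k^2\sigma^2$ is equivalent to $|X - \mu| \ge k\sigma$.
This inequality is remarkably general: it requires no assumption about the shape of the distribution. For $k = 2$, it says at most $1/4$ (25%) of the probability mass lies beyond 2 standard deviations from the mean.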
| $k$ | Maximum $P$ beyond $k\sigma$ | Practical Meaning |
|---|---|---|
| 1 | 100% | Trivial bound |
| 2 | 25% | At least 75% within 2 SD |
| 3 | 11.1% | At least 89% within 3 SD |
| 4 | 6.25% | At least 93.75% within 4 SD |
| 5 | 4% | At least 96% within 5 SD |
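Without further assumptions on the distribution, these bounds cannot be improved: for each $k \ge 1$ there is a distribution that attains $1/k^2$ exactly. A minimal sketch, assuming a three-point distribution on $\{-k, 0, k\}$ with mean 0 and variance 1 (the function name `tight_distribution` is hypothetical):

```python
from fractions import Fraction


def tight_distribution(k):
    """Three-point PMF on {-k, 0, k} with mean 0 and variance 1 (needs k >= 1)."""
    tail = Fraction(1, 2 * k * k)
    return {-k: tail, 0: 1 - 2 * tail, k: tail}


for k in (2, 3, 4, 5):
    pmf = tight_distribution(k)
    mu = sum(x * p for x, p in pmf.items())
    var = sum((x - mu) ** 2 * p for x, p in pmf.items())
    # With sigma = 1, the event |X - mu| >= k*sigma is just |X - mu| >= k.
    tail_prob = sum(p for x, p in pmf.items() if abs(x - mu) >= k)
    print(f"k={k}: Var={var}, P(|X-mu| >= k*sigma)={tail_prob}, bound={Fraction(1, k * k)}")
```

For familiar distributions such as the normal, the actual tail mass is far smaller than the bound. That looseness is the price of making no assumption about shape.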
Worked Example: Discrete Random Variable
Example: Finding Variance from a PMF
Let $X$ have PMF: , , .
Step 1: Compute $E[X]$:
Step 2: Compute $E[X^2]$:
Step 3: Apply the computational formula $\operatorname{Var}(X) = E[X^2] - (E[X])^2$:
Step 4: Standard deviation: .
Verification via definition: ✓
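The four steps and the verification can be sketched in Python with exact fractions. The PMF below is a hypothetical stand-in chosen for illustration and is not necessarily the one used in this example.

```python
import math
from fractions import Fraction

# Hypothetical PMF (an assumption for illustration): value -> probability.
pmf = {1: Fraction(1, 5), 2: Fraction(1, 2), 3: Fraction(3, 10)}
assert sum(pmf.values()) == 1  # probabilities must sum to one

# Step 1: first moment E[X]
e_x = sum(x * p for x, p in pmf.items())
# Step 2: second raw moment E[X^2]
e_x2 = sum(x ** 2 * p for x, p in pmf.items())
# Step 3: computational formula Var(X) = E[X^2] - (E[X])^2
variance = e_x2 - e_x ** 2
# Step 4: standard deviation (converted to float by math.sqrt)
std_dev = math.sqrt(variance)
# Verification via the definition E[(X - mu)^2]
variance_def = sum((x - e_x) ** 2 * p for x, p in pmf.items())

print(f"E[X] = {e_x}, E[X^2] = {e_x2}, Var(X) = {variance}, SD = {std_dev:.4f}")
print("Definition agrees:", variance == variance_def)
```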
Worked Example: Continuous Random Variable
Example: Variance of Uniform(a,b)
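Let $X \sim \text{Uniform}(a, b)$ with $f(x) = \frac{1}{b - a}$ for $a \le x \le b$.
Step 1: $E[X] = \frac{a + b}{2}$.
Step 2:
$$E[X^2] = \int_a^b \frac{x^2}{b - a}\,dx = \frac{b^3 - a^3}{3(b - a)} = \frac{a^2 + ab + b^2}{3}$$
Step 3:
$$\operatorname{Var}(X) = \frac{a^2 + ab + b^2}{3} - \frac{(a + b)^2}{4} = \frac{a^2 - 2ab + b^2}{12} = \frac{(b - a)^2}{12}$$
This elegant result shows variance depends only on the width $b - a$, not the location, which is consistent with the translation invariance property.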
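The closed form $(b - a)^2 / 12$ can be sanity-checked numerically. The sketch below assumes a simple midpoint Riemann sum and two hypothetical intervals of equal width at different locations. The function name `uniform_variance_numeric` is illustrative.

```python
def uniform_variance_numeric(a, b, steps=100_000):
    """Approximate Var(X) for X ~ Uniform(a, b) with a midpoint Riemann sum."""
    width = (b - a) / steps
    density = 1.0 / (b - a)
    mids = [a + (i + 0.5) * width for i in range(steps)]
    # Mean first, then the expected squared deviation from it.
    mean = sum(x * density * width for x in mids)
    return sum((x - mean) ** 2 * density * width for x in mids)


# Two hypothetical intervals with the same width but different locations.
for a, b in [(0.0, 6.0), (100.0, 106.0)]:
    print(f"[{a}, {b}]: numeric = {uniform_variance_numeric(a, b):.6f}, "
          f"formula = {(b - a) ** 2 / 12:.6f}")
```

Both intervals have width 6, so both should give a variance close to 3 regardless of where the interval sits.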
Worked Example: Real Data — Exam Scores
Example: Variance of Exam Scores
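A class of 10 students scored: $72, 85, 91, 68, 78, 94, 82, 76, 88, 80$.
Step 1: Compute the mean:
$$\bar{x} = \frac{1}{10}\sum_{i=1}^{10} x_i = \frac{814}{10} = 81.4$$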
Step 2: Compute squared deviations:
| Score $x_i$ | $x_i - \bar{x}$ | $(x_i - \bar{x})^2$ |
|---|---|---|
| 72 | -9.4 | 88.36 |
| 85 | 3.6 | 12.96 |
| 91 | 9.6 | 92.16 |
| 68 | -13.4 | 179.56 |
| 78 | -3.4 | 11.56 |
| 94 | 12.6 | 158.76 |
| 82 | 0.6 | 0.36 |
| 76 | -5.4 | 29.16 |
| 88 | 6.6 | 43.56 |
| 80 | -1.4 | 1.96 |
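Step 3: Compute variance (sample, using $n - 1$):
$$s^2 = \frac{1}{n - 1}\sum_{i=1}^{n}(x_i - \bar{x})^2 = \frac{618.40}{9} \approx 68.71$$
Step 4: Standard deviation:
$$s = \sqrt{68.71} \approx 8.29$$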
Interpretation: Scores typically deviate about 8.3 points from the class average of 81.4.
Python Implementation
import numpy as np
np.random.seed(42)
# Demonstrate variance properties with a Bernoulli(p) random variable
p = 0.6
n = 10000
samples = np.random.binomial(1, p, size=n)
# Empirical variance vs theoretical
empirical_var = np.var(samples, ddof=0)
theoretical_var = p * (1 - p)
print(f"Bernoulli(p={p}): empirical Var = {empirical_var:.4f}, theoretical Var = {theoretical_var:.4f}")
# Show Var(aX + b) = a^2 Var(X)
a, b_const = 3, 5
transformed = a * samples + b_const
print(f"Var({a}X + {b_const}) = {np.var(transformed, ddof=0):.4f}")
print(f"{a}^2 * Var(X) = {a**2 * empirical_var:.4f}")
# Sum of independent RVs: Var(X+Y) = Var(X) + Var(Y)
samples_y = np.random.binomial(1, 0.3, size=n)
sum_var = np.var(samples + samples_y, ddof=0)
print(f"Var(X+Y) = {sum_var:.4f}")
print(f"Var(X) + Var(Y) = {np.var(samples, ddof=0) + np.var(samples_y, ddof=0):.4f}")
Python Implementation: Chebyshev Verification
import numpy as np
np.random.seed(42)
# Use an exponential distribution (skewed, not normal) to test Chebyshev
lam = 1.0
n = 100000
samples = np.random.exponential(1/lam, size=n)
mu = np.mean(samples)
sigma = np.std(samples)
# Empirical P(|X - mu| >= k*sigma) vs Chebyshev bound 1/k^2
for k in [1.5, 2, 3, 4]:
    # Fraction of samples at least k standard deviations from the mean
    empirical = np.mean(np.abs(samples - mu) >= k * sigma)
    bound = 1 / k**2
    print(f"k={k}: empirical P = {empirical:.4f}, Chebyshev bound = {bound:.4f}")
Python Implementation: Real Data Example
import numpy as np
# Exam scores from worked example
scores = np.array([72, 85, 91, 68, 78, 94, 82, 76, 88, 80])
# Population variance (divide by n)
pop_var = np.var(scores)
# Sample variance (divide by n-1)
sample_var = np.var(scores, ddof=1)
print(f"Mean: {np.mean(scores):.1f}")
print(f"Population variance: {pop_var:.2f}")
print(f"Sample variance: {sample_var:.2f}")
print(f"Standard deviation: {np.std(scores, ddof=1):.2f}")
# Manual computation for verification
mean = np.mean(scores)
manual_var = np.sum((scores - mean)**2) / (len(scores) - 1)
print(f"\nManual computation: {manual_var:.2f}")
Variance of Common Distributions
Reference Table
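| Distribution | PMF/PDF | $E[X]$ | $\operatorname{Var}(X)$ |
|---|---|---|---|
| Bernoulli($p$) | $p^x (1-p)^{1-x},\ x \in \{0, 1\}$ | $p$ | $p(1-p)$ |
| Binomial($n, p$) | $\binom{n}{x} p^x (1-p)^{n-x},\ x = 0, \dots, n$ | $np$ | $np(1-p)$ |
| Geometric($p$) | $(1-p)^{x-1} p,\ x = 1, 2, \dots$ | $\frac{1}{p}$ | $\frac{1-p}{p^2}$ |
| Poisson($\lambda$) | $\frac{\lambda^x e^{-\lambda}}{x!},\ x = 0, 1, \dots$ | $\lambda$ | $\lambda$ |
| Uniform($a, b$) | $\frac{1}{b-a},\ a \le x \le b$ | $\frac{a+b}{2}$ | $\frac{(b-a)^2}{12}$ |
| Normal($\mu, \sigma^2$) | $\frac{1}{\sigma\sqrt{2\pi}} e^{-(x-\mu)^2 / (2\sigma^2)}$ | $\mu$ | $\sigma^2$ |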
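Standard closed forms such as mean $np$ and variance $np(1-p)$ for Binomial($n, p$), or mean and variance both equal to $\lambda$ for Poisson($\lambda$), can be checked by summing the PMF directly. A minimal sketch using only the standard library follows. The parameter values and the helper name `moments_from_pmf` are assumptions for illustration.

```python
import math


def moments_from_pmf(pmf, support):
    """Return (mean, variance) by summing a PMF over a finite support."""
    mean = sum(x * pmf(x) for x in support)
    var = sum((x - mean) ** 2 * pmf(x) for x in support)
    return mean, var


n, p, lam = 12, 0.3, 4.0  # hypothetical parameter values


def binomial_pmf(x):
    return math.comb(n, x) * p ** x * (1 - p) ** (n - x)


def poisson_pmf(x):
    return math.exp(-lam) * lam ** x / math.factorial(x)


print("Binomial:", moments_from_pmf(binomial_pmf, range(n + 1)),
      "closed form:", (n * p, n * p * (1 - p)))
# The Poisson support is infinite; truncating at 100 leaves a negligible tail for lam = 4.
print("Poisson: ", moments_from_pmf(poisson_pmf, range(100)),
      "closed form:", (lam, lam))
```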
Variance in Machine Learning
| ML Application | Variance Usage | Why It Matters |
|---|---|---|
| Bias-variance tradeoff | Variance of model predictions | High variance = overfitting |
| Feature selection | Variance threshold | Remove low-variance features |
| Ensemble methods | Reduce variance via averaging | Bagging, random forests |
| Regularization | Shrink coefficients to lower the variance of the fitted model | Ridge, Lasso regression |
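The ensemble row rests on a direct consequence of the additivity property: the mean of $m$ independent estimates, each with variance $\sigma^2$, has variance $\sigma^2 / m$. The simulation below is a minimal sketch under an idealized assumption of fully independent Gaussian predictors. Real bagged models are correlated, so the reduction in practice is smaller. The function name `ensemble_prediction` and all parameter values are hypothetical.

```python
import random
import statistics

random.seed(0)


def ensemble_prediction(m, sigma=2.0):
    """Average of m independent noisy predictions of a true value of 0."""
    return statistics.fmean([random.gauss(0.0, sigma) for _ in range(m)])


trials = 10_000
for m in (1, 5, 25):
    preds = [ensemble_prediction(m) for _ in range(trials)]
    empirical = statistics.pvariance(preds)
    print(f"m={m:2d}: empirical Var = {empirical:.3f}, sigma^2/m = {4.0 / m:.3f}")
```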
Key Takeaways
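Variance measures spread: $\operatorname{Var}(X) = E[(X - \mu)^2] = E[X^2] - (E[X])^2$
Translation invariant: $\operatorname{Var}(X + b) = \operatorname{Var}(X)$; scale equivariant: $\operatorname{Var}(aX) = a^2 \operatorname{Var}(X)$
Independence additivity: $\operatorname{Var}(X + Y) = \operatorname{Var}(X) + \operatorname{Var}(Y)$ when $X$ and $Y$ are independent
$\operatorname{Var}(X) = 0 \iff X$ is a.s. constant
Chebyshev's inequality bounds tail probabilities using only $\mu$ and $\sigma^2$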
Standard deviation returns to original units; variance is in squared units
"Variance is the price we pay for uncertainty." — Harry Markowitz