Variance of a Random Variable — Formula and Properties

Measuring Spread — How Far Values Deviate from the Mean

Variance quantifies the average squared deviation of a random variable from its mean. It is one of the most widely used measures of dispersion in statistics.

  • Foundation: variance underpins standard deviation, covariance, correlation, and many statistical tests
  • Chebyshev's inequality: bounds tail probabilities using only the mean and variance
  • Portfolio theory: in mean-variance analysis, the variance of returns serves as the measure of risk, which investors minimize for a given expected return
  • Quality control: Six Sigma reduces process variance to push defect rates toward zero

Without variance, we cannot quantify uncertainty — and without uncertainty, statistics has no purpose.


What is Variance?

Definition

Variance is the expected squared deviation of a random variable from its mean. It measures the average spread of the distribution around its center.

"The variance is the moment of inertia of the probability distribution about its center of mass." — Persi Diaconis


Mathematical Formulation

Definition of Variance

$$\text{Var}(X) = E\!\left[(X - \mu)^2\right] \quad \text{where } \mu = E[X]$$

Here,

  • $X$ = Random variable
  • $\mu$ = Mean (expected value) of X
  • $(X - \mu)^2$ = Squared deviation from the mean

Computational Formula

$$\text{Var}(X) = E[X^2] - (E[X])^2$$

Here,

  • $E[X^2]$ = Second raw moment of X
  • $(E[X])^2$ = Square of the first moment

Derivation of the Computational Formula

Expand the definition directly:

$$E[(X-\mu)^2] = E[X^2 - 2\mu X + \mu^2] = E[X^2] - 2\mu\,E[X] + \mu^2 = E[X^2] - 2\mu^2 + \mu^2 = E[X^2] - \mu^2.$$

This identity is essential for computation: you only need the first two raw moments.
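
As a quick check of this identity, the sketch below computes the variance of a discrete distribution both ways using exact rational arithmetic. The fair six-sided die and the helper names `variance_by_definition` and `variance_by_moments` are illustrative assumptions, not part of the lesson's own material.

```python
from fractions import Fraction

def variance_by_definition(pmf):
    """E[(X - mu)^2] for a PMF given as {value: probability}."""
    mu = sum(x * p for x, p in pmf.items())
    return sum((x - mu) ** 2 * p for x, p in pmf.items())

def variance_by_moments(pmf):
    """E[X^2] - (E[X])^2 for the same PMF."""
    first = sum(x * p for x, p in pmf.items())
    second = sum(x ** 2 * p for x, p in pmf.items())
    return second - first ** 2

# Hypothetical example: a fair six-sided die, stored with exact fractions
die = {face: Fraction(1, 6) for face in range(1, 7)}

print(variance_by_definition(die))  # 35/12
print(variance_by_moments(die))     # 35/12
```

Using `Fraction` avoids floating-point rounding, so the two routes agree exactly rather than approximately.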


Properties of Variance

Theorem: Properties of Variance

Let $X$ be a random variable with $\text{Var}(X) < \infty$, and let $a, b$ be constants. Then:

(i) $\text{Var}(aX + b) = a^2\,\text{Var}(X)$

(ii) $\text{Var}(X) = 0$ if and only if $X$ is a.s. constant

(iii) If $X$ and $Y$ are independent, $\text{Var}(X + Y) = \text{Var}(X) + \text{Var}(Y)$

Proof Sketch of (i)

$$\text{Var}(aX+b) = E\!\left[((aX+b) - (a\mu+b))^2\right] = E\!\left[a^2(X-\mu)^2\right] = a^2\,\text{Var}(X).$$

The shift $b$ cancels inside the squared deviation: translation does not affect spread.
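
Property (i) can also be checked exactly on a small distribution. The sketch below is only an illustration: the three-point PMF, the constants `a = 3` and `b = 5`, and the helper `pmf_variance` are assumptions chosen for the example.

```python
from fractions import Fraction

def pmf_variance(pmf):
    """Variance of a PMF given as {value: probability}."""
    mu = sum(x * p for x, p in pmf.items())
    return sum((x - mu) ** 2 * p for x, p in pmf.items())

# Hypothetical PMF chosen for illustration
pmf_x = {1: Fraction(1, 5), 2: Fraction(1, 2), 3: Fraction(3, 10)}
a, b = 3, 5

# PMF of aX + b: each value is mapped, the probabilities are unchanged
pmf_ax_b = {a * x + b: p for x, p in pmf_x.items()}
# PMF of X + b alone, to isolate the effect of the shift
pmf_shift = {x + b: p for x, p in pmf_x.items()}

var_x = pmf_variance(pmf_x)
var_ax_b = pmf_variance(pmf_ax_b)
var_shift = pmf_variance(pmf_shift)

print(var_x, var_shift, var_ax_b, a ** 2 * var_x)
```

The shifted variable has the same variance as the original, while scaling by `a` multiplies it by `a ** 2`.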

Proof Sketch of (iii)

$$\text{Var}(X+Y) = E[(X+Y)^2] - (E[X+Y])^2.$$

Expand: $E[X^2 + 2XY + Y^2] = E[X^2] + 2E[X]E[Y] + E[Y^2]$ (using independence, $E[XY]=E[X]E[Y]$). Subtracting $(E[X]+E[Y])^2 = E[X]^2 + 2E[X]E[Y] + E[Y]^2$ yields $\text{Var}(X)+\text{Var}(Y)$.

Non-Independence Case

For dependent $X, Y$:

$$\text{Var}(X+Y) = \text{Var}(X) + \text{Var}(Y) + 2\,\text{Cov}(X,Y).$$

Independence implies $\text{Cov}(X,Y)=0$, but the converse is false: zero covariance does not imply independence.
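
The gap between zero covariance and independence can be made concrete with a standard counterexample: take $X$ uniform on $\{-1, 0, 1\}$ and $Y = X^2$. The sketch below works from the joint PMF with exact fractions. The helper names `expect` and `var_and_cov` are hypothetical, introduced only for this illustration.

```python
from fractions import Fraction

def expect(joint, g):
    """E[g(X, Y)] for a joint PMF given as {(x, y): probability}."""
    return sum(g(x, y) * p for (x, y), p in joint.items())

def var_and_cov(joint):
    """Return Var(X), Var(Y), Cov(X, Y), and Var(X + Y) from a joint PMF."""
    ex = expect(joint, lambda x, y: x)
    ey = expect(joint, lambda x, y: y)
    var_x = expect(joint, lambda x, y: (x - ex) ** 2)
    var_y = expect(joint, lambda x, y: (y - ey) ** 2)
    cov = expect(joint, lambda x, y: (x - ex) * (y - ey))
    var_sum = expect(joint, lambda x, y: (x + y - ex - ey) ** 2)
    return var_x, var_y, cov, var_sum

third = Fraction(1, 3)
# X uniform on {-1, 0, 1} and Y = X^2, so Y is completely determined by X
joint = {(-1, 1): third, (0, 0): third, (1, 1): third}

var_x, var_y, cov, var_sum = var_and_cov(joint)
print(var_x, var_y, cov, var_sum)  # 2/3 2/9 0 8/9

# Dependence check: P(X=0, Y=0) differs from P(X=0) * P(Y=0)
p_x0 = sum(p for (x, _), p in joint.items() if x == 0)
p_y0 = sum(p for (_, y), p in joint.items() if y == 0)
print(joint[(0, 0)], p_x0 * p_y0)  # 1/3 1/9
```

Here the covariance is exactly zero, so the variances add, yet the joint probability does not factor, so the variables are not independent.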


Standard Deviation

The standard deviation $\sigma = \sqrt{\text{Var}(X)}$ restores the original scale of measurement, making it interpretable in the same units as $X$.

Standard Deviation

$$\sigma = \sqrt{\text{Var}(X)}, \quad \sigma^2 = \text{Var}(X)$$

Here,

  • $\sigma$ = Standard deviation of X
  • $\sigma^2$ = Variance (square of standard deviation)

Chebyshev's Inequality

Theorem: Chebyshev's Inequality

For any random variable $X$ with finite mean $\mu$ and finite variance $\sigma^2 > 0$, and for any $k > 0$:

$$P\!\left(|X - \mu| \geq k\sigma\right) \leq \frac{1}{k^2}.$$

Proof Sketch

Let $Y = (X-\mu)^2$. Then $Y \geq 0$ and $E[Y] = \sigma^2$. Apply Markov's inequality, which holds for nonnegative random variables, to $Y$ with threshold $a = (k\sigma)^2$:

$$P(Y \geq a) \leq \frac{E[Y]}{a} = \frac{\sigma^2}{k^2\sigma^2} = \frac{1}{k^2}.$$

But $Y \geq a$ is equivalent to $|X-\mu| \geq k\sigma$.

This inequality is remarkably general: it requires no assumption about the shape of the distribution beyond a finite variance. For $k=2$, it says at most $25\%$ of the probability mass lies 2 or more standard deviations from the mean.

| $k$ | Maximum beyond $k\sigma$ | Practical Meaning |
|---|---|---|
| 1 | 100% | Trivial bound |
| 2 | 25% | At least 75% within 2 SD |
| 3 | 11.1% | At least 88.9% within 3 SD |
| 4 | 6.25% | At least 93.75% within 4 SD |
| 5 | 4% | At least 96% within 5 SD |
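
The rows of this table follow directly from the bound $1/k^2$. A minimal sketch that regenerates them is shown below. The function name `chebyshev_bound` is an assumption for illustration, and the cap at 1 reflects that the bound is uninformative for $k \leq 1$.

```python
def chebyshev_bound(k):
    """Upper bound on P(|X - mu| >= k*sigma); capped at 1 because it bounds a probability."""
    if k <= 0:
        raise ValueError("k must be positive")
    return min(1.0, 1.0 / k ** 2)

for k in [1, 2, 3, 4, 5]:
    bound = chebyshev_bound(k)
    print(f"k={k}: at most {bound:.2%} beyond, at least {1 - bound:.2%} within")
```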

Worked Example: Discrete Random Variable

Example: Finding Variance from a PMF

Let $X$ have PMF: $P(X=1) = 0.2$, $P(X=2) = 0.5$, $P(X=3) = 0.3$.

Step 1: Compute $E[X]$:

$$E[X] = 1(0.2) + 2(0.5) + 3(0.3) = 0.2 + 1.0 + 0.9 = 2.1.$$

Step 2: Compute $E[X^2]$:

$$E[X^2] = 1^2(0.2) + 2^2(0.5) + 3^2(0.3) = 0.2 + 2.0 + 2.7 = 4.9.$$

Step 3: Apply the computational formula:

$$\text{Var}(X) = 4.9 - (2.1)^2 = 4.9 - 4.41 = 0.49.$$

Step 4: Standard deviation: $\sigma = \sqrt{0.49} = 0.7$.

Verification via definition:

$$E[(X-\mu)^2] = (1-2.1)^2(0.2) + (2-2.1)^2(0.5) + (3-2.1)^2(0.3) = 1.21(0.2) + 0.01(0.5) + 0.81(0.3) = 0.242 + 0.005 + 0.243 = 0.49.$$


Worked Example: Continuous Random Variable

Example: Variance of Uniform(a,b)

Let $X \sim \text{Uniform}(a, b)$ with $f(x) = \frac{1}{b-a}$ for $x \in [a,b]$.

Step 1: $E[X] = \frac{a+b}{2}$.

Step 2:

$$E[X^2] = \int_a^b \frac{x^2}{b-a}\,dx = \frac{1}{b-a}\cdot\frac{x^3}{3}\Big|_a^b = \frac{b^3 - a^3}{3(b-a)} = \frac{a^2+ab+b^2}{3}.$$

Step 3:

$$\text{Var}(X) = \frac{a^2+ab+b^2}{3} - \left(\frac{a+b}{2}\right)^2 = \frac{4(a^2+ab+b^2) - 3(a+b)^2}{12} = \frac{(b-a)^2}{12}.$$

This elegant result shows variance depends only on the width $b-a$, not the location, consistent with the translation invariance property.
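
The closed form can be sanity-checked numerically. The sketch below approximates the two moment integrals with a midpoint Riemann sum and compares the result with $(b-a)^2/12$. The function name `uniform_variance_numeric`, the step count, and the test intervals are assumptions chosen for illustration.

```python
def uniform_variance_numeric(a, b, steps=10_000):
    """Approximate Var(X) for X ~ Uniform(a, b) with a midpoint Riemann sum."""
    width = (b - a) / steps
    density = 1.0 / (b - a)
    first = 0.0   # accumulates E[X]
    second = 0.0  # accumulates E[X^2]
    for i in range(steps):
        x = a + (i + 0.5) * width
        first += x * density * width
        second += x * x * density * width
    return second - first ** 2

# Two intervals of width 1 at different locations, and one wider interval
for a, b in [(0.0, 1.0), (5.0, 6.0), (-2.0, 4.0)]:
    approx = uniform_variance_numeric(a, b)
    exact = (b - a) ** 2 / 12
    print(f"Uniform({a}, {b}): numeric = {approx:.6f}, formula = {exact:.6f}")
```

The first two intervals have the same width and therefore the same variance, even though they sit at different locations.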


Worked Example: Real Data — Exam Scores

Example: Variance of Exam Scores

A class of 10 students scored: $\{72, 85, 91, 68, 78, 94, 82, 76, 88, 80\}$.

Step 1: Compute the mean:

$$\bar{x} = \frac{72 + 85 + 91 + 68 + 78 + 94 + 82 + 76 + 88 + 80}{10} = \frac{814}{10} = 81.4$$

Step 2: Compute squared deviations:

| Score $x_i$ | $x_i - \bar{x}$ | $(x_i - \bar{x})^2$ |
|---|---|---|
| 72 | -9.4 | 88.36 |
| 85 | 3.6 | 12.96 |
| 91 | 9.6 | 92.16 |
| 68 | -13.4 | 179.56 |
| 78 | -3.4 | 11.56 |
| 94 | 12.6 | 158.76 |
| 82 | 0.6 | 0.36 |
| 76 | -5.4 | 29.16 |
| 88 | 6.6 | 43.56 |
| 80 | -1.4 | 1.96 |

Step 3: Compute variance (sample, using $n-1$):

$$s^2 = \frac{\sum (x_i - \bar{x})^2}{n-1} = \frac{618.40}{9} \approx 68.71$$

Step 4: Standard deviation: $s = \sqrt{68.71} \approx 8.29$

Interpretation: Scores typically deviate about 8.3 points from the class average of 81.4.


Python Implementation

import numpy as np

np.random.seed(42)

# Demonstrate variance properties with a Bernoulli(p) random variable
p = 0.6
n = 10000
samples = np.random.binomial(1, p, size=n)

# Empirical variance vs theoretical
empirical_var = np.var(samples, ddof=0)
theoretical_var = p * (1 - p)
print(f"Bernoulli(p={p}): empirical Var = {empirical_var:.4f}, theoretical Var = {theoretical_var:.4f}")

# Show Var(aX + b) = a^2 Var(X)
a, b_const = 3, 5
transformed = a * samples + b_const
print(f"Var({a}X + {b_const}) = {np.var(transformed, ddof=0):.4f}")
print(f"{a}^2 * Var(X)      = {a**2 * empirical_var:.4f}")

# Sum of independent RVs: Var(X+Y) = Var(X) + Var(Y)
samples_y = np.random.binomial(1, 0.3, size=n)
sum_var = np.var(samples + samples_y, ddof=0)
print(f"Var(X+Y) = {sum_var:.4f}")
print(f"Var(X) + Var(Y) = {np.var(samples, ddof=0) + np.var(samples_y, ddof=0):.4f}")

Python Implementation: Chebyshev Verification

import numpy as np

np.random.seed(42)

# Use an exponential distribution (skewed, not normal) to test Chebyshev
lam = 1.0
n = 100000
samples = np.random.exponential(1/lam, size=n)

mu = np.mean(samples)
sigma = np.std(samples)

# Empirical P(|X - mu| >= k*sigma) vs Chebyshev bound 1/k^2
for k in [1.5, 2, 3, 4]:
    empirical = np.mean(np.abs(samples - mu) >= k * sigma)
    bound = 1 / k**2
    print(f"k={k}: empirical P = {empirical:.4f}, Chebyshev bound = {bound:.4f}")

Python Implementation: Real Data Example

import numpy as np

# Exam scores from worked example
scores = np.array([72, 85, 91, 68, 78, 94, 82, 76, 88, 80])

# Population variance (divide by n)
pop_var = np.var(scores)
# Sample variance (divide by n-1)
sample_var = np.var(scores, ddof=1)

print(f"Mean: {np.mean(scores):.1f}")
print(f"Population variance: {pop_var:.2f}")
print(f"Sample variance:     {sample_var:.2f}")
print(f"Standard deviation:  {np.std(scores, ddof=1):.2f}")

# Manual computation for verification
mean = np.mean(scores)
manual_var = np.sum((scores - mean)**2) / (len(scores) - 1)
print(f"\nManual computation: {manual_var:.2f}")

Variance of Common Distributions

Reference Table

| Distribution | PMF/PDF | $E[X]$ | $\text{Var}(X)$ |
|---|---|---|---|
| Bernoulli$(p)$ | $p^x(1-p)^{1-x}$ | $p$ | $p(1-p)$ |
| Binomial$(n,p)$ | $\binom{n}{k}p^k(1-p)^{n-k}$ | $np$ | $np(1-p)$ |
| Geometric$(p)$ | $(1-p)^{k-1}p$ | $1/p$ | $(1-p)/p^2$ |
| Poisson$(\lambda)$ | $\frac{\lambda^k e^{-\lambda}}{k!}$ | $\lambda$ | $\lambda$ |
| Uniform$(a,b)$ | $\frac{1}{b-a}$ | $\frac{a+b}{2}$ | $\frac{(b-a)^2}{12}$ |
| Normal$(\mu,\sigma^2)$ | $\frac{1}{\sigma\sqrt{2\pi}}e^{-(x-\mu)^2/(2\sigma^2)}$ | $\mu$ | $\sigma^2$ |
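
Entries in this table can be spot-checked by summing directly over the PMF. The sketch below does so for the Binomial and Poisson rows. The parameter values, the truncation point for the Poisson sum, and the helper name `moments_from_pmf` are assumptions made for this illustration.

```python
import math

def moments_from_pmf(pmf_values):
    """Mean and variance from an iterable of (k, P(X = k)) pairs."""
    pairs = list(pmf_values)
    mean = sum(k * p for k, p in pairs)
    var = sum((k - mean) ** 2 * p for k, p in pairs)
    return mean, var

n, p = 10, 0.3  # illustrative Binomial parameters
binomial = [(k, math.comb(n, k) * p ** k * (1 - p) ** (n - k)) for k in range(n + 1)]

lam = 4.0  # illustrative Poisson rate
# The Poisson support is infinite; truncating at k = 100 leaves negligible mass for lam = 4
poisson = [(k, lam ** k * math.exp(-lam) / math.factorial(k)) for k in range(101)]

binom_mean, binom_var = moments_from_pmf(binomial)
pois_mean, pois_var = moments_from_pmf(poisson)

print(f"Binomial: mean = {binom_mean:.4f} (np = {n * p:.4f}), var = {binom_var:.4f} (np(1-p) = {n * p * (1 - p):.4f})")
print(f"Poisson:  mean = {pois_mean:.4f}, var = {pois_var:.4f} (lambda = {lam:.4f})")
```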

Variance in Machine Learning

| ML Application | Variance Usage | Why It Matters |
|---|---|---|
| Bias-variance tradeoff | Variance of model predictions | High variance signals overfitting |
| Feature selection | Variance threshold | Removes near-constant, low-variance features |
| Ensemble methods | Reduce variance via averaging | Basis of bagging and random forests |
| Regularization | Shrink coefficients to lower estimator variance | Used in Ridge and Lasso regression |
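
The ensemble row rests on a consequence of the scaling and additivity properties: the average of $m$ independent quantities, each with variance $\sigma^2$, has variance $\sigma^2/m$. The simulation below is a rough sketch of that idealized case, assuming independent standard normal "predictions". Real bagged models are correlated, so the reduction in practice is smaller. The names `averaged_prediction` and `results` are hypothetical.

```python
import random
import statistics

random.seed(0)

def averaged_prediction(m):
    """Average of m independent noisy predictions, each with mean 0 and variance 1."""
    return sum(random.gauss(0.0, 1.0) for _ in range(m)) / m

trials = 20_000
results = {}
for m in [1, 5, 25]:
    draws = [averaged_prediction(m) for _ in range(trials)]
    results[m] = statistics.pvariance(draws)
    print(f"m={m}: empirical variance = {results[m]:.4f}, theory = {1 / m:.4f}")
```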

Key Takeaways

  • Variance measures spread: $\text{Var}(X) = E[(X-\mu)^2] = E[X^2] - (E[X])^2$
  • Translation invariant: $\text{Var}(X+b) = \text{Var}(X)$; scale equivariant: $\text{Var}(aX) = a^2\,\text{Var}(X)$
  • Independence additivity: $\text{Var}(X+Y) = \text{Var}(X) + \text{Var}(Y)$ when $X \perp Y$
  • $\text{Var}(X) = 0 \iff X$ is a.s. constant
  • Chebyshev's inequality bounds tail probabilities using only $\mu$ and $\sigma^2$
  • Standard deviation $\sigma$ returns to original units; variance $\sigma^2$ is in squared units

"Variance is the price we pay for uncertainty." — Harry Markowitz
