Variance — Population vs Sample Formula and Interpretation


Why Does Data Spread Out?

Variance is the foundation of statistical dispersion — it tells you how far data points wander from the average.

Understanding variance helps you:

  • Quantify uncertainty — measure how reliable or volatile a dataset truly is
  • Compare datasets — see which group has more consistent behavior
  • Build estimators — understand why dividing by n-1 produces unbiased results
  • Unlock advanced measures — standard deviation, skewness, and kurtosis all build on variance

Master variance and every other measure of spread becomes a natural extension.


What is Variance?

Definition

Variance measures the average squared deviation from the mean. It quantifies the spread or dispersion of a random variable around its expected value.

For a random variable $X$ with mean $\mu = E[X]$, the variance is:

Population Variance (Definition)

$$\sigma^2 = \text{Var}(X) = E\left[(X - \mu)^2\right] = \sum_{i=1}^{N} (x_i - \mu)^2 \cdot P(X = x_i)$$

Here,

  • $\sigma^2$ = Population variance
  • $\mu$ = Population mean (expected value)
  • $N$ = Population size
  • $P(X = x_i)$ = Probability of outcome $x_i$

For a finite population of $N$ equally likely observations:

Finite Population Variance

$$\sigma^2 = \frac{1}{N} \sum_{i=1}^{N} (x_i - \mu)^2$$

Here,

  • $N$ = Population size
  • $x_i$ = The $i$-th observation
  • $\mu$ = Population mean

The Shortcut Formula

Using the identity $E[(X-\mu)^2] = E[X^2] - (E[X])^2$, we obtain the computationally equivalent form:

Theorem: Computational Formula for Variance

$$\sigma^2 = E[X^2] - \left(E[X]\right)^2$$

Proof sketch. Expand $(X - \mu)^2 = X^2 - 2\mu X + \mu^2$ and take expectations. Since $E[X] = \mu$, we get $E[X^2] - 2\mu^2 + \mu^2 = E[X^2] - \mu^2$. $\square$

This form is useful because it requires only one pass through the data (computing $\sum x_i$ and $\sum x_i^2$ simultaneously).
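As a rough sketch of the one-pass idea, the snippet below accumulates $\sum x_i$ and $\sum x_i^2$ in a single loop and compares the result with the two-pass definition. The function names and the data are illustrative assumptions, not part of the lesson.

```python
def variance_two_pass(data):
    """Population variance from the definition: mean first, then squared deviations."""
    n = len(data)
    mu = sum(data) / n
    return sum((x - mu) ** 2 for x in data) / n


def variance_one_pass(data):
    """Population variance via E[X^2] - (E[X])^2, accumulating both sums in one loop."""
    n = 0
    total = 0.0
    total_sq = 0.0
    for x in data:
        n += 1
        total += x
        total_sq += x * x
    mean = total / n
    return total_sq / n - mean ** 2


data = [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]  # hypothetical data
two_pass = variance_two_pass(data)
one_pass = variance_one_pass(data)
print(two_pass, one_pass)
```

One caveat: when the mean is large relative to the spread, subtracting two nearly equal numbers can lose precision in floating point, so the two-pass form is often the safer default.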


Sample Variance and Bessel's Correction

Given a sample $x_1, x_2, \ldots, x_n$ drawn i.i.d. from a population with variance $\sigma^2$, the sample variance is:

Sample Variance (Bessel's Correction)

$$s^2 = \frac{1}{n-1} \sum_{i=1}^{n} (x_i - \bar{x})^2$$

Here,

  • $s^2$ = Sample variance (unbiased estimator of $\sigma^2$)
  • $\bar{x}$ = Sample mean
  • $n$ = Sample size
  • $n-1$ = Degrees of freedom

Why $n - 1$? Degrees of Freedom

The Intuition Behind n−1

The sample mean $\bar{x}$ is computed from the same data. It locks in one linear constraint: $\sum(x_i - \bar{x}) = 0$. This means only $n-1$ of the $n$ deviations $(x_i - \bar{x})$ are free to vary; the last is determined by the others. Hence we lose one degree of freedom.

More precisely, dividing by $n$ systematically underestimates $\sigma^2$ because the sample mean $\bar{x}$ minimizes the sum of squared deviations, so the data sit closer to $\bar{x}$ than to the true $\mu$. Bessel's correction multiplies the divide-by-$n$ estimate by $\frac{n}{n-1}$ to compensate for this bias.
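In NumPy, the choice of divisor is controlled by the `ddof` argument of `np.var`. The sketch below uses made-up data to show that the two estimates differ exactly by the factor $\frac{n}{n-1}$, and that the deviations from the sample mean sum to zero:

```python
import numpy as np

x = np.array([3.0, 5.0, 8.0, 10.0, 14.0])  # hypothetical sample
n = x.size

var_n = np.var(x, ddof=0)          # divides by n (np.var's default)
var_n_minus_1 = np.var(x, ddof=1)  # divides by n - 1 (Bessel's correction)

# The deviations from the sample mean sum to zero (up to rounding),
# which is the single linear constraint described above.
deviation_sum = np.sum(x - x.mean())

print(var_n, var_n_minus_1, var_n * n / (n - 1))
```

Because `ddof=0` is the default, calling `np.var(x)` on a sample gives the biased estimate unless `ddof=1` is passed explicitly.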

Theorem: Unbiasedness of $s^2$

$$E[s^2] = \sigma^2$$

Proof. For i.i.d. observations with $E[X_i] = \mu$ and $\text{Var}(X_i) = \sigma^2$:

$$E\left[\sum_{i=1}^n (X_i - \bar{X})^2\right] = \sum_{i=1}^n E\left[(X_i - \bar{X})^2\right] = (n-1)\sigma^2$$

Therefore $E[s^2] = \frac{n-1}{n-1}\sigma^2 = \sigma^2$. $\square$
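The last equality in the proof skips a step. One way to fill it in, using only $\text{Var}(\bar{X}) = \sigma^2/n$ and $\text{Cov}(X_i, \bar{X}) = \sigma^2/n$ for i.i.d. observations, is sketched below:

```latex
\begin{aligned}
E\left[(X_i - \bar{X})^2\right]
  &= \text{Var}(X_i - \bar{X}) && \text{since } E[X_i - \bar{X}] = 0 \\
  &= \text{Var}(X_i) + \text{Var}(\bar{X}) - 2\,\text{Cov}(X_i, \bar{X}) \\
  &= \sigma^2 + \frac{\sigma^2}{n} - \frac{2\sigma^2}{n}
   = \frac{n-1}{n}\,\sigma^2, \\
\sum_{i=1}^n E\left[(X_i - \bar{X})^2\right]
  &= n \cdot \frac{n-1}{n}\,\sigma^2 = (n-1)\,\sigma^2 .
\end{aligned}
```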

The computational form of the sample variance is:

Computational Formula for Sample Variance

$$s^2 = \frac{\sum_{i=1}^n x_i^2 - \frac{1}{n}\left(\sum_{i=1}^n x_i\right)^2}{n-1}$$

Here,

  • $s^2$ = Sample variance
  • $x_i$ = The $i$-th observation
  • $n$ = Sample size
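A minimal sketch of the computational formula, checked against the standard library's `statistics.variance`. The helper name and the data are assumptions made for illustration:

```python
import statistics


def sample_variance_computational(data):
    """Sample variance from sum(x) and sum(x^2), following the computational formula."""
    n = len(data)
    sum_x = sum(data)
    sum_x_sq = sum(x * x for x in data)
    return (sum_x_sq - sum_x ** 2 / n) / (n - 1)


data = [12.0, 15.0, 11.0, 18.0, 14.0]  # hypothetical sample
s2_formula = sample_variance_computational(data)
s2_reference = statistics.variance(data)  # standard-library sample variance (divides by n - 1)
print(s2_formula, s2_reference)
```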

Algebraic Properties of Variance

Theorem: Properties of Variance

For constants $a, b$ and independent random variables $X, Y$:

  1. Non-negativity: $\text{Var}(X) \geq 0$, with equality iff $X$ is constant a.s.
  2. Translation invariance: $\text{Var}(X + b) = \text{Var}(X)$
  3. Scaling: $\text{Var}(aX) = a^2 \text{Var}(X)$
  4. Additivity for independent variables: $\text{Var}(X + Y) = \text{Var}(X) + \text{Var}(Y)$
  5. General linear combination: $\text{Var}(aX + bY) = a^2\text{Var}(X) + b^2\text{Var}(Y)$ (if $X \perp Y$)
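These properties can be checked numerically. The sketch below draws two independent samples (the distributions, constants, and seed are arbitrary choices for illustration) and compares both sides of properties 2 to 4:

```python
import numpy as np

rng = np.random.default_rng(0)  # seed chosen arbitrarily
n = 200_000
X = rng.normal(loc=1.0, scale=2.0, size=n)  # Var(X) is about 4
Y = rng.exponential(scale=3.0, size=n)      # Var(Y) is about 9, drawn independently of X
a, b = 2.5, 7.0

# Translation invariance and scaling hold for any data set (up to rounding).
shift_ok = np.isclose(np.var(X + b), np.var(X))
scale_ok = np.isclose(np.var(a * X), a**2 * np.var(X))

# Additivity only holds approximately in a finite sample, because the
# sample covariance of independent draws is close to, but not exactly, zero.
var_sum = np.var(X + Y)
sum_var = np.var(X) + np.var(Y)

print(shift_ok, scale_ok, var_sum, sum_var)
```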

Dependence Matters

When $X$ and $Y$ are not independent, the general formula is:

$$\text{Var}(X + Y) = \text{Var}(X) + \text{Var}(Y) + 2\,\text{Cov}(X, Y)$$

where $\text{Cov}(X,Y) = E[(X-\mu_X)(Y-\mu_Y)]$ is the covariance. Independence implies $\text{Cov}(X,Y) = 0$, but the converse is not generally true.
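The following sketch illustrates both points with simulated data. The construction of `Y` and `Z` is an assumption chosen for the example: `Y` is correlated with `X`, while `Z = X**2` depends on `X` completely yet has covariance near zero with it.

```python
import numpy as np

rng = np.random.default_rng(1)  # seed chosen arbitrarily
n = 100_000

# Dependent pair: Y is built from X plus independent noise.
X = rng.normal(size=n)
Y = 0.8 * X + rng.normal(scale=0.5, size=n)

cov_xy = np.cov(X, Y)[0, 1]  # np.cov divides by n - 1 by default
lhs = np.var(X + Y, ddof=1)
rhs = np.var(X, ddof=1) + np.var(Y, ddof=1) + 2 * cov_xy

# Uncorrelated but dependent: Z is a deterministic function of X,
# yet Cov(X, Z) = E[X^3] = 0 for a symmetric X with mean zero.
Z = X**2
cov_xz = np.cov(X, Z)[0, 1]

print(lhs, rhs, cov_xz)
```

The identity holds exactly for sample quantities as long as the variances and the covariance use the same divisor, which is why `ddof=1` is passed to `np.var` here.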


Variance as a Second Central Moment

The variance is the second central moment of a distribution. The general framework of moments provides:

Central Moments

$$\mu_k = E\left[(X - \mu)^k\right]$$

Here,

  • $\mu_k$ = $k$-th central moment
  • $\mu$ = Mean, $E[X]$ (the first raw moment; the first central moment $\mu_1$ is always 0)
  • $\mu_2$ = Variance, $\text{Var}(X)$

The skewness uses the third central moment ($\mu_3$), and kurtosis uses the fourth ($\mu_4$). Variance is the foundational building block.
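A small sketch of central moments for a finite, equally weighted data set. The helper and the data are hypothetical, and the standardized forms $\mu_3 / \mu_2^{3/2}$ and $\mu_4 / \mu_2^{2}$ are the common population definitions of skewness and kurtosis, assumed here because the lesson does not spell them out:

```python
def central_moment(data, k):
    """k-th central moment of a finite, equally weighted data set."""
    n = len(data)
    mu = sum(data) / n
    return sum((x - mu) ** k for x in data) / n


data = [2, 4, 4, 4, 5, 5, 7, 9]  # hypothetical data with mean 5

m1 = central_moment(data, 1)  # zero (up to rounding) for any data set
m2 = central_moment(data, 2)  # population variance
m3 = central_moment(data, 3)  # numerator of skewness
m4 = central_moment(data, 4)  # numerator of kurtosis

skewness = m3 / m2 ** 1.5
kurtosis = m4 / m2 ** 2
print(m1, m2, skewness, kurtosis)
```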


Worked Example

Given the sample: $x = \{4, 7, 13, 2, 1, 8, 11, 6, 9, 5\}$ with $n = 10$:

Step 1: Compute $\bar{x}$:

$$\bar{x} = \frac{4+7+13+2+1+8+11+6+9+5}{10} = \frac{66}{10} = 6.6$$

Step 2: Compute squared deviations:

| $x_i$ | $x_i - \bar{x}$ | $(x_i - \bar{x})^2$ |
|---|---|---|
| 4 | −2.6 | 6.76 |
| 7 | 0.4 | 0.16 |
| 13 | 6.4 | 40.96 |
| 2 | −4.6 | 21.16 |
| 1 | −5.6 | 31.36 |
| 8 | 1.4 | 1.96 |
| 11 | 4.4 | 19.36 |
| 6 | −0.6 | 0.36 |
| 9 | 2.4 | 5.76 |
| 5 | −1.6 | 2.56 |

Step 3: Sum: $\sum(x_i - \bar{x})^2 = 130.4$

Step 4: Population variance (treating the ten values as the entire population): $\sigma^2 = 130.4 / 10 = 13.04$

Step 5: Sample variance (treating them as a sample from a larger population): $s^2 = 130.4 / 9 = 14.4\overline{8} \approx 14.49$
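The hand calculation can be cross-checked with Python's standard `statistics` module, where `pvariance` divides by $n$ and `variance` divides by $n - 1$:

```python
import statistics

x = [4, 7, 13, 2, 1, 8, 11, 6, 9, 5]

mean = statistics.mean(x)            # Step 1: sample mean
pop_var = statistics.pvariance(x)    # Step 4: divides by n
sample_var = statistics.variance(x)  # Step 5: divides by n - 1

print(mean, pop_var, sample_var)
```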


The Bias of the Naive Estimator

Simulation: Bias of ÷n vs ÷(n−1)

The bias can be checked empirically: draw repeated samples of size $n$ from a known population, compute both estimators on each sample, and compare their averages with $\sigma^2$:

| Estimator | Formula | Expectation |
|---|---|---|
| $\hat{\sigma}^2_n$ (biased) | $\frac{1}{n}\sum(x_i - \bar{x})^2$ | $\frac{n-1}{n}\sigma^2 < \sigma^2$ |
| $s^2$ (unbiased) | $\frac{1}{n-1}\sum(x_i - \bar{x})^2$ | $\sigma^2$ |

The ratio $\frac{n}{n-1}$ is the bias correction factor. For $n = 10$, the biased estimator has expectation $\frac{9}{10}\sigma^2$, so on average it recovers only 90% of the true variance and underestimates $\sigma^2$ by 10%.
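A minimal simulation sketch of this comparison, assuming a normal population with $\sigma^2 = 4$ and $n = 10$ (the distribution, seed, and number of trials are arbitrary choices, not taken from the lesson):

```python
import numpy as np

rng = np.random.default_rng(42)  # seed chosen arbitrarily
sigma2 = 4.0                     # true population variance (assumed)
n = 10                           # sample size
trials = 100_000                 # number of repeated samples

# Each row is one sample of size n from a normal population with variance sigma2.
samples = rng.normal(loc=0.0, scale=np.sqrt(sigma2), size=(trials, n))

biased = samples.var(axis=1, ddof=0)    # divide by n
unbiased = samples.var(axis=1, ddof=1)  # divide by n - 1

mean_biased = biased.mean()
mean_unbiased = unbiased.mean()
print(f"average of /n estimator:     {mean_biased:.3f} (theory: {(n - 1) / n * sigma2:.3f})")
print(f"average of /(n-1) estimator: {mean_unbiased:.3f} (theory: {sigma2:.3f})")
```

Under these assumptions the first average should settle near $3.6$ and the second near $4.0$, with small run-to-run differences if the seed is changed.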


Relationship to Standard Deviation

The standard deviation $\sigma = \sqrt{\sigma^2}$ returns the spread to the original units of the data. While variance is mathematically convenient (additive for independent variables), standard deviation is more interpretable because it shares the units of the mean.
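To make the point about units concrete, here is a short sketch with hypothetical heights in centimetres. The variance comes out in cm², while the standard deviation is back in cm:

```python
import math
import statistics

heights_cm = [162.0, 170.0, 175.0, 168.0, 180.0]  # hypothetical heights in centimetres

var_cm2 = statistics.pvariance(heights_cm)  # units: cm^2
sd_cm = statistics.pstdev(heights_cm)       # units: cm, same as the data

print(var_cm2, sd_cm, math.sqrt(var_cm2))
```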


Variance in Machine Learning

| ML Application | Variance Usage | Why |
|---|---|---|
| Bias-variance tradeoff | High model variance signals overfitting | Overly complex models tend to have high variance |
| Feature selection | Remove near-constant (low-variance) features | Such features carry little signal |
| Regularization | Penalize large weights | Ridge/Lasso reduce model variance |
| Ensemble methods | Bagging reduces variance | It averages many high-variance models |
| Information gain (regression trees) | Variance reduction measures split quality | Trees pick the split that most lowers target variance |

The ensemble row can be illustrated with scikit-learn by comparing how much the cross-validated MSE varies from fold to fold:

```python
import numpy as np
from sklearn.ensemble import BaggingRegressor
from sklearn.model_selection import cross_val_score
from sklearn.tree import DecisionTreeRegressor

np.random.seed(42)
n = 200
X = np.random.randn(n, 1) * 10
y = 3 * X[:, 0] + np.random.randn(n) * 3  # linear signal + Gaussian noise

# Single deep tree: a high-variance model
tree = DecisionTreeRegressor(max_depth=10, random_state=42)
# cross_val_score returns negated MSE per fold, so negate before taking the variance
tree_mse = -cross_val_score(tree, X, y, cv=10, scoring='neg_mean_squared_error')
tree_var = tree_mse.var()
print(f"Single tree MSE variance across folds: {tree_var:.2f}")

# Bagging: averages 10 trees (the default base estimator), each fit on a bootstrap resample
bagging = BaggingRegressor(n_estimators=10, random_state=42)
bag_mse = -cross_val_score(bagging, X, y, cv=10, scoring='neg_mean_squared_error')
bag_var = bag_mse.var()
print(f"Bagging MSE variance across folds: {bag_var:.2f}")
print(f"Variance reduction: {(1 - bag_var / tree_var) * 100:.1f}%")
```
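The claim that averaging reduces variance follows from the scaling and covariance rules stated earlier. As an idealized sketch, assume $B$ model predictions $\hat{f}_1, \ldots, \hat{f}_B$ that each have variance $\sigma^2$ and a common pairwise correlation $\rho$ (this notation is introduced here for illustration and is not taken from the lesson):

```latex
\begin{aligned}
\text{Var}\left(\frac{1}{B}\sum_{b=1}^{B} \hat{f}_b\right)
  &= \frac{1}{B^2}\left[\sum_{b=1}^{B} \text{Var}(\hat{f}_b)
     + \sum_{b \neq c} \text{Cov}(\hat{f}_b, \hat{f}_c)\right] \\
  &= \frac{1}{B^2}\left[B\sigma^2 + B(B-1)\rho\sigma^2\right] \\
  &= \rho\,\sigma^2 + \frac{1-\rho}{B}\,\sigma^2 .
\end{aligned}
```

With independent predictions ($\rho = 0$) the variance falls to $\sigma^2 / B$. Trees trained on bootstrap resamples of the same data are positively correlated in practice, so the reduction is smaller than this ideal.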

Key Takeaways

  • Variance is the expected squared deviation from the mean; it quantifies spread.
  • Bessel's correction (dividing by n−1) makes the sample variance unbiased because the sample mean absorbs one degree of freedom.
  • Variance is additive for independent variables: Var(X+Y) = Var(X) + Var(Y).
  • Scaling: Var(aX) = a²Var(X), so variance scales quadratically with constants.

"Variance is the price you pay for randomness." — Every model that ignores spread will be surprised by reality.
