
Properties of Estimators — Unbiasedness, Efficiency, Consistency

What Makes an Estimator Good?

The properties of an estimator determine whether a statistical procedure produces reliable results. Unbiasedness, efficiency, and consistency are the three standard criteria for judging estimator quality.

  • Method Selection — Choosing between competing estimators based on theoretical properties
  • Study Design — Ensuring data collection methods support estimation quality
  • Policy Analysis — Justifying estimator choices for consequential decisions

Understanding estimator properties separates rigorous statistics from mere calculation.


What Are Properties of Estimators?

Definition: Properties of Estimators

Good estimators share key properties: unbiasedness (the expected value equals the true parameter), efficiency (minimum variance among unbiased estimators), and consistency (convergence in probability to the true value as $n$ grows). These properties provide a framework for comparing competing estimators.


1. Unbiasedness

Unbiasedness

$$E[\hat{\theta}] = \theta \quad \text{for all } \theta \in \Theta$$

Here,

  • $E[\hat{\theta}]$ = expected value of the estimator
  • $\theta$ = true population parameter

Bias Decomposition

For any estimator $\hat{\theta}$, the mean squared error decomposes as:

$$\text{MSE}(\hat{\theta}) = \text{Var}(\hat{\theta}) + [\text{Bias}(\hat{\theta})]^2$$

where $\text{Bias}(\hat{\theta}) = E[\hat{\theta}] - \theta$. An unbiased estimator has $\text{Bias} = 0$, so $\text{MSE} = \text{Var}$.
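The decomposition can be checked numerically. The sketch below is illustrative only: the normal distribution, the sample size, the replication count, and the seed are all assumptions chosen for the demonstration. It estimates the MSE, variance, and bias of the divide-by-$n$ variance estimator by Monte Carlo and compares the two sides of the identity.

```python
import numpy as np

rng = np.random.default_rng(0)  # arbitrary seed, only for reproducibility

# Assumed setup for illustration: N(5, 3^2) data, n = 10, 20,000 replications.
true_sigma2 = 9.0
n, reps = 10, 20_000
samples = rng.normal(loc=5.0, scale=3.0, size=(reps, n))

# Estimator under study: the divide-by-n variance estimator (biased).
estimates = samples.var(axis=1, ddof=0)

mse = np.mean((estimates - true_sigma2) ** 2)  # Monte Carlo MSE
variance = estimates.var(ddof=0)               # Monte Carlo variance of the estimator
bias = estimates.mean() - true_sigma2          # Monte Carlo bias

print(f"MSE          = {mse:.4f}")
print(f"Var + Bias^2 = {variance + bias**2:.4f}")
```

With `ddof=0` the two printed numbers agree up to floating-point error, because the decomposition is an algebraic identity that also holds for empirical moments.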

Worked Example: Bias of $\hat{\sigma}^2_{\text{MLE}}$

For $X_i \sim \mathcal{N}(\mu, \sigma^2)$, the MLE is $\hat{\sigma}^2_{\text{MLE}} = \frac{1}{n}\sum(X_i - \bar{X})^2$.

Compute the expectation:

$$E[\hat{\sigma}^2_{\text{MLE}}] = \frac{1}{n}\sum E[(X_i - \bar{X})^2] = \frac{1}{n}(n-1)\sigma^2 = \frac{n-1}{n}\sigma^2$$

Bias: $\text{Bias} = \frac{n-1}{n}\sigma^2 - \sigma^2 = -\frac{\sigma^2}{n}$.

Unbiased version: $S^2 = \frac{1}{n-1}\sum(X_i - \bar{X})^2$ has $E[S^2] = \sigma^2$.
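The step $E\left[\sum (X_i - \bar{X})^2\right] = (n-1)\sigma^2$ used above can be filled in as follows. This is a standard derivation, sketched here assuming only that the $X_i$ are i.i.d. with mean $\mu$ and variance $\sigma^2$, so that $\text{Var}(\bar{X}) = \sigma^2/n$:

```latex
\begin{aligned}
\sum_{i=1}^{n}(X_i-\bar{X})^2
  &= \sum_{i=1}^{n}(X_i-\mu)^2 - n(\bar{X}-\mu)^2, \\
E\left[\sum_{i=1}^{n}(X_i-\bar{X})^2\right]
  &= n\sigma^2 - n\cdot\frac{\sigma^2}{n} = (n-1)\sigma^2 .
\end{aligned}
```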

Unbiasedness is Not Everything

The MLE $\hat{\sigma}^2_{\text{MLE}}$ is biased, yet for normal data it has lower MSE than the unbiased $S^2$ for every $n \geq 2$:

$$\text{MSE}(\hat{\sigma}^2_{\text{MLE}}) = \frac{(2n-1)\sigma^4}{n^2} < \frac{2\sigma^4}{n-1} = \text{MSE}(S^2)$$

Here $\text{Var}(\hat{\sigma}^2_{\text{MLE}}) = 2(n-1)\sigma^4/n^2$ and $\text{Bias}^2 = \sigma^4/n^2$, which sum to the left-hand side. Bias alone does not determine quality: the variance reduction from dividing by $n$ instead of $n-1$ outweighs the small squared bias.
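A short simulation can make the MSE comparison concrete. The sketch below assumes normal data and uses the standard normal-theory closed forms $\text{MSE}(\hat{\sigma}^2_{\text{MLE}}) = (2n-1)\sigma^4/n^2$ and $\text{MSE}(S^2) = 2\sigma^4/(n-1)$. The helper names `mse_mle` and `mse_unbiased`, the parameter values, and the seed are illustrative choices, not part of the lesson.

```python
import numpy as np

def mse_mle(n, sigma2):
    """Closed-form MSE of the divide-by-n estimator (normal data assumed)."""
    return (2 * n - 1) * sigma2**2 / n**2

def mse_unbiased(n, sigma2):
    """Closed-form MSE (equal to the variance) of S^2 (normal data assumed)."""
    return 2 * sigma2**2 / (n - 1)

rng = np.random.default_rng(1)  # arbitrary seed
sigma2, n, reps = 9.0, 10, 50_000
samples = rng.normal(5.0, 3.0, size=(reps, n))

# Monte Carlo MSE of each estimator, computed on the same simulated samples.
sim_mle = np.mean((samples.var(axis=1, ddof=0) - sigma2) ** 2)
sim_unb = np.mean((samples.var(axis=1, ddof=1) - sigma2) ** 2)

print(f"MLE (÷n):      theory {mse_mle(n, sigma2):.3f}, simulated {sim_mle:.3f}")
print(f"S² (÷(n-1)):   theory {mse_unbiased(n, sigma2):.3f}, simulated {sim_unb:.3f}")
```

For these settings the closed forms give 15.39 for the MLE and 18.00 for $S^2$; the simulated values should land close to them, up to Monte Carlo noise.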


2. Efficiency and the Cramér-Rao Lower Bound

Theorem: Cramér-Rao Lower Bound (CRLB)

Let $\hat{\theta}$ be any unbiased estimator of $\theta$ based on $n$ i.i.d. observations. Under regularity conditions:

$$\text{Var}(\hat{\theta}) \geq \frac{1}{n \cdot I_1(\theta)}$$

where $I_1(\theta) = -E\left[\frac{\partial^2}{\partial\theta^2}\log f(X;\theta)\right]$ is the Fisher information per observation.

Proof of the Cramér-Rao bound:

Let $g(\theta) = E[\hat{\theta}]$. For an unbiased estimator, $g(\theta) = \theta$, so $g'(\theta) = 1$. Define the score function $S(\theta) = \partial\ell/\partial\theta = \sum \partial\log f(X_i;\theta)/\partial\theta$, which has $E[S(\theta)] = 0$ under the regularity conditions.

By the Cauchy-Schwarz inequality, applied to the two mean-zero variables $\hat{\theta} - \theta$ and $S(\theta)$:

$$\left(E[(\hat{\theta} - \theta) \cdot S(\theta)]\right)^2 \leq \text{Var}(\hat{\theta}) \cdot \text{Var}(S(\theta))$$

The left side equals $[g'(\theta)]^2 = 1$ (by the identity $E[(\hat{\theta}-\theta)S] = g'(\theta)$). On the right side, $\text{Var}(S) = nI_1(\theta)$. Therefore $\text{Var}(\hat{\theta}) \geq 1/(nI_1(\theta))$.

Fisher Information

$$I_1(\theta) = -E\left[\frac{\partial^2}{\partial\theta^2}\log f(X;\theta)\right]$$

Here,

  • $I_1(\theta)$ = Fisher information per observation
  • $f(x;\theta)$ = probability density (or mass) function
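As a worked instance of this formula, consider a single Bernoulli observation with success probability $p$. The Bernoulli model is an illustrative choice, not one used elsewhere in the lesson. With $f(x;p) = p^x(1-p)^{1-x}$ and $E[X] = p$:

```latex
\begin{aligned}
\log f(x;p) &= x\log p + (1-x)\log(1-p), \\
\frac{\partial^2}{\partial p^2}\log f(x;p) &= -\frac{x}{p^2} - \frac{1-x}{(1-p)^2}, \\
I_1(p) &= \frac{E[X]}{p^2} + \frac{1-E[X]}{(1-p)^2}
        = \frac{1}{p} + \frac{1}{1-p} = \frac{1}{p(1-p)} .
\end{aligned}
```

The information is smallest at $p = 1/2$ and grows as $p$ approaches 0 or 1, where each observation is more informative about $p$.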

Efficiency

Definition: Efficiency

An estimator $\hat{\theta}$ is efficient if it achieves the Cramér-Rao lower bound: $\text{Var}(\hat{\theta}) = 1/(nI_1(\theta))$. An efficient estimator has the smallest possible variance among all unbiased estimators.
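To see an estimator attain the bound, the sketch below uses a Bernoulli($p$) model as an assumed example. Its per-observation Fisher information is $I_1(p) = 1/(p(1-p))$, so the CRLB is $p(1-p)/n$, which is exactly the variance of the sample proportion. The function names `fisher_info_bernoulli` and `crlb` are hypothetical helpers written for this illustration.

```python
import numpy as np

def fisher_info_bernoulli(p):
    """Per-observation Fisher information of a Bernoulli(p) model."""
    return 1.0 / (p * (1.0 - p))

def crlb(n, info_per_obs):
    """Cramér-Rao lower bound for an unbiased estimator from n i.i.d. observations."""
    return 1.0 / (n * info_per_obs)

rng = np.random.default_rng(2)  # arbitrary seed
p, n, reps = 0.3, 50, 40_000

# Each binomial draw is the number of successes in n Bernoulli trials,
# so dividing by n gives one realisation of the sample proportion.
p_hat = rng.binomial(n, p, size=reps) / n

bound = crlb(n, fisher_info_bernoulli(p))
sim_var = p_hat.var(ddof=1)

print(f"CRLB               = {bound:.6f}")
print(f"Simulated Var(p̂)   = {sim_var:.6f}")
```

The simulated variance should sit close to the bound of 0.0042, differing only by Monte Carlo noise, which is what efficiency means for this estimator.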

Relative Efficiency

When comparing two unbiased estimators $\hat{\theta}_1$ and $\hat{\theta}_2$:

$$\text{eff}(\hat{\theta}_1, \hat{\theta}_2) = \frac{\text{Var}(\hat{\theta}_2)}{\text{Var}(\hat{\theta}_1)}$$

If $\text{eff} > 1$, then $\hat{\theta}_1$ is more efficient. The Rao-Blackwell theorem guarantees that conditioning a crude unbiased estimator on a sufficient statistic always improves (or maintains) efficiency.
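A classic illustration of relative efficiency, offered here as an assumed example, compares the sample mean and the sample median as estimators of the centre of a normal distribution. Both are unbiased by symmetry, and the known large-sample value of $\text{eff}(\text{mean}, \text{median})$ is $\pi/2 \approx 1.57$. The sample size, replication count, and seed below are arbitrary.

```python
import numpy as np

rng = np.random.default_rng(3)  # arbitrary seed
n, reps = 101, 20_000           # odd n so the median is a single order statistic

samples = rng.normal(0.0, 1.0, size=(reps, n))
means = samples.mean(axis=1)
medians = np.median(samples, axis=1)

# eff(mean, median) = Var(median) / Var(mean); values above 1 favour the mean.
eff = medians.var(ddof=1) / means.var(ddof=1)

print(f"Var(mean)   = {means.var(ddof=1):.5f}")
print(f"Var(median) = {medians.var(ddof=1):.5f}")
print(f"eff(mean, median) = {eff:.3f}  (large-sample value: {np.pi / 2:.3f})")
```

Under these assumptions the mean needs roughly two thirds as many observations as the median to reach the same variance. For heavy-tailed distributions the ranking can reverse.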


3. Consistency

Definition: Consistency

An estimator $\hat{\theta}_n$ is consistent if $\hat{\theta}_n \xrightarrow{p} \theta$ as $n \to \infty$, i.e., for every $\epsilon > 0$:

$$\lim_{n \to \infty} P(|\hat{\theta}_n - \theta| > \epsilon) = 0$$

Consistency via MSE

A sufficient condition for consistency is $\text{MSE}(\hat{\theta}_n) \to 0$ as $n \to \infty$. Since $\text{MSE} = \text{Var} + \text{Bias}^2$, this holds if both $\text{Var}(\hat{\theta}_n) \to 0$ and $\text{Bias}(\hat{\theta}_n) \to 0$.

Consistency of the Sample Mean (Weak Law of Large Numbers)

Theorem: Weak Law of Large Numbers (WLLN)

If $X_1, X_2, \ldots$ are i.i.d. with $E[|X|] < \infty$, then $\bar{X}_n \xrightarrow{p} \mu = E[X]$.

Proof (Chebyshev's inequality, under the additional assumption $\sigma^2 = \text{Var}(X) < \infty$): Since $E[\bar{X}_n] = \mu$ and $\text{Var}(\bar{X}_n) = \sigma^2/n \to 0$, Chebyshev gives $P(|\bar{X}_n - \mu| > \epsilon) \leq \sigma^2/(n\epsilon^2) \to 0$.
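The Chebyshev argument can be illustrated by estimating $P(|\bar{X}_n - \mu| > \epsilon)$ by simulation and comparing it with the bound $\sigma^2/(n\epsilon^2)$. The sketch below assumes normal data with $\mu = 5$, $\sigma = 3$ and tolerance $\epsilon = 0.5$; these values, the replication count, and the seed are illustrative choices.

```python
import numpy as np

rng = np.random.default_rng(4)  # arbitrary seed
mu, sigma, eps, reps = 5.0, 3.0, 0.5, 2_000

results = []  # (n, empirical exceedance probability, Chebyshev bound)
for n in (10, 100, 1000):
    xbar = rng.normal(mu, sigma, size=(reps, n)).mean(axis=1)
    empirical = np.mean(np.abs(xbar - mu) > eps)
    chebyshev = min(1.0, sigma**2 / (n * eps**2))  # a probability bound above 1 is vacuous
    results.append((n, empirical, chebyshev))
    print(f"n={n:5d}: P(|X̄ - μ| > {eps}) ≈ {empirical:.4f}, Chebyshev bound = {chebyshev:.4f}")
```

The empirical probabilities fall toward zero as $n$ grows and stay below the bound. Chebyshev is typically loose, which is fine here: the proof only needs the bound to vanish.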


4. Sufficiency and the Rao-Blackwell Theorem

Definition: Sufficient Statistic

A statistic $T(X_1, \ldots, X_n)$ is sufficient for $\theta$ if the conditional distribution of the data given $T$ does not depend on $\theta$. Equivalently, by the Fisher-Neyman factorization theorem: $f(x_1, \ldots, x_n; \theta) = g(T(x_1, \ldots, x_n); \theta) \cdot h(x_1, \ldots, x_n)$.
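As a worked instance of the factorization, take $X_1, \ldots, X_n$ i.i.d. Poisson($\lambda$). The Poisson model is an assumed example chosen for illustration. The joint mass function splits into a factor that depends on the data only through $T = \sum x_i$ and a factor free of $\lambda$:

```latex
f(x_1,\ldots,x_n;\lambda)
  = \prod_{i=1}^{n}\frac{e^{-\lambda}\lambda^{x_i}}{x_i!}
  = \underbrace{e^{-n\lambda}\,\lambda^{T}}_{g(T;\,\lambda)}
    \cdot
    \underbrace{\prod_{i=1}^{n}\frac{1}{x_i!}}_{h(x_1,\ldots,x_n)},
  \qquad T = \sum_{i=1}^{n} x_i .
```

So $T = \sum X_i$ is sufficient for $\lambda$ in this model.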

Theorem: Rao-Blackwell Theorem

Let $\hat{\theta}$ be any unbiased estimator of $\theta$ and $T$ a sufficient statistic. Then $\hat{\theta}_{\text{RB}} = E[\hat{\theta} \mid T]$ is unbiased and:

$$\text{Var}(\hat{\theta}_{\text{RB}}) \leq \text{Var}(\hat{\theta})$$

with equality only if $\hat{\theta}$ is already a function of $T$.

Proof sketch: By the law of total variance: $\text{Var}(\hat{\theta}) = E[\text{Var}(\hat{\theta} \mid T)] + \text{Var}(E[\hat{\theta} \mid T]) \geq \text{Var}(\hat{\theta}_{\text{RB}})$. The non-negative term $E[\text{Var}(\hat{\theta} \mid T)]$ vanishes if and only if $\hat{\theta}$ is a function of $T$ alone.
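A standard textbook illustration of Rao-Blackwellization, used here as an assumed example, estimates $\theta = P(X = 0) = e^{-\lambda}$ from i.i.d. Poisson($\lambda$) data. The crude unbiased estimator is the indicator $\mathbf{1}\{X_1 = 0\}$. Conditioning on the sufficient statistic $T = \sum X_i$ gives $E[\mathbf{1}\{X_1 = 0\} \mid T] = ((n-1)/n)^T$. The parameter values and seed below are arbitrary.

```python
import numpy as np

rng = np.random.default_rng(5)  # arbitrary seed
lam, n, reps = 2.0, 20, 50_000
target = np.exp(-lam)           # true value of P(X = 0)

x = rng.poisson(lam, size=(reps, n))

# Crude unbiased estimator: uses only the first observation in each sample.
crude = (x[:, 0] == 0).astype(float)

# Rao-Blackwellized estimator: E[crude | T] with T the sample total.
t = x.sum(axis=1)
rao_blackwell = ((n - 1) / n) ** t

print(f"target              = {target:.4f}")
print(f"crude:         mean = {crude.mean():.4f}, var = {crude.var(ddof=1):.5f}")
print(f"Rao-Blackwell: mean = {rao_blackwell.mean():.4f}, var = {rao_blackwell.var(ddof=1):.5f}")
```

Both estimators average out near $e^{-2} \approx 0.1353$, but the conditioned version has a far smaller variance because it uses all $n$ observations through $T$.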


5. Lehmann-Scheffé Theorem

Theorem: Lehmann-Scheffé Theorem

If $T$ is a complete sufficient statistic and $\hat{\theta}$ is an unbiased estimator that is a function of $T$, then $\hat{\theta}$ is the unique MVUE (minimum variance unbiased estimator).

Example: For $X_i \sim \text{Exp}(\lambda)$ with rate $\lambda$, $T = \sum X_i$ is complete sufficient. $\bar{X} = T/n$ is unbiased for $1/\lambda$, so $\bar{X}$ is the MVUE of $1/\lambda$. (Note: $1/\bar{X}$ is the MLE of $\lambda$ but is not unbiased for $\lambda$.)
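The exponential example can be checked by simulation. The sketch below assumes the rate parametrization (mean $1/\lambda$), and relies on the standard result that $T = \sum X_i$ is Gamma-distributed, which gives $E[1/\bar{X}] = n\lambda/(n-1)$ for $n > 1$. The values of $\lambda$, $n$, the replication count, and the seed are illustrative.

```python
import numpy as np

rng = np.random.default_rng(6)  # arbitrary seed
lam, n, reps = 2.0, 5, 100_000

# NumPy parametrizes the exponential by its scale (mean), i.e. 1/rate.
x = rng.exponential(scale=1.0 / lam, size=(reps, n))
xbar = x.mean(axis=1)

mean_of_xbar = xbar.mean()              # estimates E[X̄], theory: 1/λ
mean_of_inv_xbar = (1.0 / xbar).mean()  # estimates E[1/X̄], theory: nλ/(n-1)

print(f"E[X̄]   ≈ {mean_of_xbar:.4f}   (1/λ       = {1 / lam:.4f})")
print(f"E[1/X̄] ≈ {mean_of_inv_xbar:.4f}   (nλ/(n-1)  = {n * lam / (n - 1):.4f}, λ = {lam})")
```

With $n = 5$ the MLE $1/\bar{X}$ overshoots $\lambda = 2$ by about 25% on average, while $\bar{X}$ is centred on $1/\lambda = 0.5$.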


Python Simulation: Comparing Estimators

import numpy as np

np.random.seed(42)
n_values = [5, 10, 25, 50, 100, 500]
reps = 10000
true_mu = 5.0
true_sigma = 3.0

print("Comparing estimators for σ² (true = 9.0):")
print(f"{'n':>6} {'MLE (÷n)':>12} {'Unbiased (÷(n-1))':>18} {'MLE Bias':>10} {'Unbiased Bias':>15}")

for n in n_values:
    mle_vars = []
    unbiased_vars = []
    for _ in range(reps):
        data = np.random.normal(true_mu, true_sigma, n)
        mu_hat = np.mean(data)
        mle_vars.append(np.mean((data - mu_hat)**2))
        unbiased_vars.append(np.var(data, ddof=1))

    mle_mean = np.mean(mle_vars)
    unbiased_mean = np.mean(unbiased_vars)
    mle_bias = mle_mean - true_sigma**2
    unbiased_bias = unbiased_mean - true_sigma**2
    print(f"{n:6d} {mle_mean:12.4f} {unbiased_mean:18.4f} {mle_bias:10.4f} {unbiased_bias:15.4f}")

# Demonstrate WLLN
print("\nWLLN demonstration (convergence of X̄ to μ):")
for n in [10, 100, 1000, 10000]:
    data = np.random.normal(true_mu, true_sigma, n)
    xbar = np.mean(data)
    print(f"  n={n:5d}: X̄ = {xbar:.4f} (error = {abs(xbar - true_mu):.4f})")

Key Takeaways

Summary: Properties of Estimators

  • Unbiasedness: $E[\hat{\theta}] = \theta$. Desirable, but neither necessary nor sufficient for a good estimator; low variance matters too
  • Efficiency: achieving the CRLB. Under regularity conditions, the MLE is asymptotically efficient
  • Consistency: $\hat{\theta}_n \xrightarrow{p} \theta$. Guaranteed when $\text{MSE} \to 0$
  • Sufficiency: a statistic that captures all information about $\theta$ in the data
  • Rao-Blackwell: conditioning an unbiased estimator on a sufficient statistic never increases its variance
  • Lehmann-Scheffé: an unbiased estimator that is a function of a complete sufficient statistic is the unique MVUE
  • The bias-variance tradeoff means unbiasedness is not always optimal; MSE is often a better criterion
