Properties of Estimators — Unbiasedness, Efficiency, Consistency

Foundations of Statistics

What Makes an Estimator Good?

Estimator properties determine whether statistical procedures produce reliable, accurate, and trustworthy results. Unbiasedness, efficiency, and consistency are the three standard criteria for judging estimator quality.

Method Selection — Choosing between competing estimators based on theoretical properties
Study Design — Ensuring data collection methods support estimation quality
Policy Analysis — Justifying estimator choices for consequential decisions

Understanding estimator properties separates rigorous statistics from mere calculation.

What Are Properties of Estimators?

Definition: Properties of Estimators

Good estimators share key properties: unbiasedness (the expected value equals the true parameter), efficiency (minimum variance among unbiased estimators), and consistency (converges to the true value as $n$ grows). These properties provide a framework for comparing competing estimators.

1. Unbiasedness

Unbiasedness

$$E[\hat{\theta}] = \theta \quad \text{for all } \theta \in \Theta$$

Here,

$E[\hat{\theta}]$ = expected value of the estimator
$\theta$ = true population parameter

Bias Decomposition

For any estimator $\hat{\theta}$, the mean squared error decomposes as:

$$\text{MSE}(\hat{\theta}) = \text{Var}(\hat{\theta}) + [\text{Bias}(\hat{\theta})]^2$$

where $\text{Bias}(\hat{\theta}) = E[\hat{\theta}] - \theta$. An unbiased estimator has $\text{Bias} = 0$, so $\text{MSE} = \text{Var}$.
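The decomposition follows by adding and subtracting $E[\hat{\theta}]$ inside the square. A short derivation, included here as a sketch for completeness (the cross term vanishes because $E[\hat{\theta} - E[\hat{\theta}]] = 0$):

```latex
\begin{aligned}
\text{MSE}(\hat{\theta}) &= E\big[(\hat{\theta} - \theta)^2\big] \\
&= E\big[(\hat{\theta} - E[\hat{\theta}] + E[\hat{\theta}] - \theta)^2\big] \\
&= E\big[(\hat{\theta} - E[\hat{\theta}])^2\big]
   + 2\,(E[\hat{\theta}] - \theta)\,E\big[\hat{\theta} - E[\hat{\theta}]\big]
   + (E[\hat{\theta}] - \theta)^2 \\
&= \text{Var}(\hat{\theta}) + [\text{Bias}(\hat{\theta})]^2
\end{aligned}
```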

Worked Example: Bias of $\hat{\sigma}^2_{\text{MLE}}$

For $X_i \sim \mathcal{N}(\mu, \sigma^2)$, the MLE is $\hat{\sigma}^2_{\text{MLE}} = \frac{1}{n}\sum(X_i - \bar{X})^2$.

Compute the expectation:

$$E[\hat{\sigma}^2_{\text{MLE}}] = \frac{1}{n}\sum E[(X_i - \bar{X})^2] = \frac{1}{n}(n-1)\sigma^2 = \frac{n-1}{n}\sigma^2$$

Bias: $\text{Bias} = \frac{n-1}{n}\sigma^2 - \sigma^2 = -\frac{\sigma^2}{n}$.

Unbiased version: $S^2 = \frac{1}{n-1}\sum(X_i - \bar{X})^2$ has $E[S^2] = \sigma^2$.
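The expectation above relies on $E[(X_i - \bar{X})^2] = \frac{n-1}{n}\sigma^2$ for each $i$, so that the sum equals $(n-1)\sigma^2$. One way to see this, sketched here for i.i.d. observations (only a finite variance is needed for this step, not normality), is to note that $X_i - \bar{X}$ has mean zero and that $\text{Cov}(X_i, \bar{X}) = \sigma^2/n$:

```latex
\begin{aligned}
E[(X_i - \bar{X})^2] &= \text{Var}(X_i - \bar{X}) \\
&= \text{Var}(X_i) + \text{Var}(\bar{X}) - 2\,\text{Cov}(X_i, \bar{X}) \\
&= \sigma^2 + \frac{\sigma^2}{n} - \frac{2\sigma^2}{n}
 = \frac{n-1}{n}\,\sigma^2
\end{aligned}
```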

Unbiasedness is Not Everything

The MLE $\hat{\sigma}^2_{\text{MLE}}$ is biased but, for normal data, has lower MSE than the unbiased $S^2$ for every $n \geq 2$:

$$\text{MSE}(\hat{\sigma}^2_{\text{MLE}}) = \frac{(2n-1)\sigma^4}{n^2} < \frac{2\sigma^4}{n-1} = \text{MSE}(S^2)$$

Bias alone does not determine quality: the variance reduction from dividing by $n$ instead of $n-1$ outweighs the small squared bias $\sigma^4/n^2$.
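For normal data, $(n-1)S^2/\sigma^2 \sim \chi^2_{n-1}$, which gives $\text{Var}(S^2) = 2\sigma^4/(n-1)$. The sketch below evaluates the MSE of both variance estimators from that fact, using MSE = variance + squared bias. The function names `mse_unbiased` and `mse_mle` are illustrative, not from any library.

```python
def mse_unbiased(n: int, sigma2: float) -> float:
    """MSE of S^2 (divisor n-1) for normal data: pure variance, zero bias."""
    return 2.0 * sigma2**2 / (n - 1)


def mse_mle(n: int, sigma2: float) -> float:
    """MSE of the MLE (divisor n) for normal data: variance plus squared bias."""
    shrink = (n - 1) / n                        # MLE = shrink * S^2
    variance = shrink**2 * mse_unbiased(n, sigma2)
    bias = -sigma2 / n
    return variance + bias**2


# Compare the two estimators for sigma^2 = 9 at a few sample sizes
for n in (5, 10, 50):
    print(n, round(mse_mle(n, 9.0), 4), round(mse_unbiased(n, 9.0), 4))
```

The gap between the two columns shrinks as $n$ grows, since both MSEs behave like $2\sigma^4/n$ for large samples.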

2. Efficiency and the Cramér-Rao Lower Bound

Theorem: Cramér-Rao Lower Bound (CRLB)

Let $\hat{\theta}$ be any unbiased estimator of $\theta$ based on an i.i.d. sample $X_1, \ldots, X_n$ from $f(x;\theta)$. Under regularity conditions:

$$\text{Var}(\hat{\theta}) \geq \frac{1}{n \cdot I_1(\theta)}$$

where $I_1(\theta) = -E\left[\frac{\partial^2}{\partial\theta^2}\log f(X;\theta)\right]$ is the Fisher information per observation.

Proof of the Cramér-Rao bound:

Let $g(\theta) = E[\hat{\theta}]$. For an unbiased estimator, $g(\theta) = \theta$, so $g'(\theta) = 1$. Define the score function $S(\theta) = \partial\ell/\partial\theta = \sum \partial\log f(X_i;\theta)/\partial\theta$, where $\ell$ is the log-likelihood of the sample.

By the Cauchy-Schwarz inequality:

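$$\left(E[(\hat{\theta} - \theta) \cdot S(\theta)]\right)^2 \leq \text{Var}(\hat{\theta}) \cdot \text{Var}(S(\theta))$$

This form uses $E[\hat{\theta} - \theta] = 0$ and $E[S(\theta)] = 0$, so the second moments on the right are variances. The left side equals $[g'(\theta)]^2 = 1$, by the identity $E[(\hat{\theta}-\theta)S] = g'(\theta)$, which comes from differentiating $g(\theta) = E[\hat{\theta}]$ under the integral sign. The right side has $\text{Var}(S) = nI_1(\theta)$. Therefore $\text{Var}(\hat{\theta}) \geq 1/(nI_1(\theta))$.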

Fisher Information

$$I_1(\theta) = -E\left[\frac{\partial^2}{\partial\theta^2}\log f(X;\theta)\right]$$

Here,

$I_1(\theta)$ = Fisher information per observation
$f(x;\theta)$ = probability density (or mass) function
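As a concrete instance, the sketch below evaluates the Fisher information for a Bernoulli($p$) observation by summing over its two outcomes. The Bernoulli model is an illustrative choice, not one used elsewhere in this article, and the helper names are hypothetical. The result matches the closed form $1/(p(1-p))$, so the bound $1/(nI_1(p))$ equals $p(1-p)/n$, the variance of the sample proportion.

```python
def fisher_info_bernoulli(p: float) -> float:
    """I_1(p) = -E[d^2/dp^2 log f(X; p)] for X ~ Bernoulli(p), by direct summation."""

    def neg_second_derivative(x: int) -> float:
        # log f(x; p) = x*log(p) + (1 - x)*log(1 - p)
        # so -d^2/dp^2 log f = x/p^2 + (1 - x)/(1 - p)^2
        return x / p**2 + (1 - x) / (1 - p) ** 2

    # Expectation over the outcomes x = 1 (prob p) and x = 0 (prob 1 - p)
    return p * neg_second_derivative(1) + (1 - p) * neg_second_derivative(0)


def crlb(n: int, p: float) -> float:
    """Cramér-Rao lower bound for unbiased estimators of p from n observations."""
    return 1.0 / (n * fisher_info_bernoulli(p))


p, n = 0.3, 50
print(fisher_info_bernoulli(p), 1 / (p * (1 - p)))  # numeric vs. closed form
print(crlb(n, p), p * (1 - p) / n)                  # bound vs. Var of sample proportion
```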

Efficiency

Definition: Efficiency

An estimator $\hat{\theta}$ is efficient if it achieves the Cramér-Rao lower bound: $\text{Var}(\hat{\theta}) = 1/(nI_1(\theta))$. An efficient estimator has the smallest possible variance among all unbiased estimators.

Relative Efficiency

When comparing two unbiased estimators $\hat{\theta}_1$ and $\hat{\theta}_2$:

$$\text{eff}(\hat{\theta}_1, \hat{\theta}_2) = \frac{\text{Var}(\hat{\theta}_2)}{\text{Var}(\hat{\theta}_1)}$$

If $\text{eff} > 1$, then $\hat{\theta}_1$ is more efficient. The Rao-Blackwell theorem guarantees that conditioning a crude unbiased estimator on a sufficient statistic always improves (or maintains) efficiency.
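To make relative efficiency concrete, the following sketch compares the sample mean and the sample median as estimators of $\mu$ for normal data. The median is an illustrative choice not discussed elsewhere in this article; it is unbiased for $\mu$ here because the normal distribution is symmetric. The sample size, seed, and parameter values are arbitrary assumptions. For large $n$ the ratio is known to approach $\pi/2 \approx 1.57$.

```python
import numpy as np

rng = np.random.default_rng(0)
n, reps = 101, 20000  # odd n so the median is a single order statistic
samples = rng.normal(loc=5.0, scale=3.0, size=(reps, n))

# Monte Carlo variance of each estimator across replications
var_mean = np.var(samples.mean(axis=1))
var_median = np.var(np.median(samples, axis=1))

# eff(mean, median) = Var(median) / Var(mean); values above 1 favor the mean
rel_eff = var_median / var_mean
print(f"Var(mean)={var_mean:.4f}  Var(median)={var_median:.4f}  eff={rel_eff:.3f}")
```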

3. Consistency

Definition: Consistency

An estimator $\hat{\theta}_n$ is consistent if $\hat{\theta}_n \xrightarrow{p} \theta$ as $n \to \infty$, i.e., for every $\epsilon > 0$:

$$\lim_{n \to \infty} P(|\hat{\theta}_n - \theta| > \epsilon) = 0$$

Consistency via MSE

A sufficient condition for consistency is $\text{MSE}(\hat{\theta}_n) \to 0$ as $n \to \infty$. Since $\text{MSE} = \text{Var} + \text{Bias}^2$, this holds if both $\text{Var}(\hat{\theta}_n) \to 0$ and $\text{Bias}(\hat{\theta}_n) \to 0$.
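The reason vanishing MSE suffices is Markov's inequality applied to the squared error. A one-line sketch of the argument, for any fixed $\epsilon > 0$:

```latex
P(|\hat{\theta}_n - \theta| > \epsilon)
  = P\big((\hat{\theta}_n - \theta)^2 > \epsilon^2\big)
  \leq \frac{E\big[(\hat{\theta}_n - \theta)^2\big]}{\epsilon^2}
  = \frac{\text{MSE}(\hat{\theta}_n)}{\epsilon^2}
  \longrightarrow 0
```

For example, $\hat{\sigma}^2_{\text{MLE}}$ for normal data has bias $-\sigma^2/n$ and variance $2(n-1)\sigma^4/n^2$, both of which tend to zero, so it is consistent even though it is biased for every finite $n$.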

Consistency of the Sample Mean (Weak Law of Large Numbers)

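Theorem: Weak Law of Large Numbers (WLLN)

If $X_1, X_2, \ldots$ are i.i.d. with $E[|X|] < \infty$, then $\bar{X}_n \xrightarrow{p} \mu = E[X]$.

Proof (Chebyshev's inequality, under the additional assumption $\sigma^2 = \text{Var}(X) < \infty$): $\text{Var}(\bar{X}_n) = \sigma^2/n \to 0$. By Chebyshev: $P(|\bar{X}_n - \mu| > \epsilon) \leq \sigma^2/(n\epsilon^2) \to 0$. The theorem still holds with only a finite mean, but that case needs a different argument (for example, truncation).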
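The Chebyshev bound can be checked numerically. The sketch below estimates $P(|\bar{X}_n - \mu| > \epsilon)$ by Monte Carlo for normal data and compares it with $\sigma^2/(n\epsilon^2)$, capped at 1. The parameter values, seed, and choice of $\epsilon$ are arbitrary assumptions for illustration.

```python
import numpy as np

rng = np.random.default_rng(1)
mu, sigma, eps, reps = 5.0, 3.0, 0.5, 5000

results = []
for n in (25, 100, 400):
    # One sample mean per replication
    xbar = rng.normal(mu, sigma, size=(reps, n)).mean(axis=1)
    empirical = float(np.mean(np.abs(xbar - mu) > eps))  # Monte Carlo tail probability
    bound = min(1.0, sigma**2 / (n * eps**2))            # Chebyshev upper bound
    results.append((n, empirical, bound))
    print(f"n={n:4d}  P(|X̄-μ|>ε) ≈ {empirical:.4f}  Chebyshev bound = {bound:.4f}")
```

The bound is loose for normal data, but it requires nothing beyond a finite variance, which is why it is enough for the proof.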

4. Sufficiency and the Rao-Blackwell Theorem

Definition: Sufficient Statistic

A statistic $T(X_1, \ldots, X_n)$ is sufficient for $\theta$ if the conditional distribution of the data given $T$ does not depend on $\theta$. Equivalently, by the Fisher-Neyman factorization theorem: $f(x_1, \ldots, x_n; \theta) = g(T(x_1, \ldots, x_n); \theta) \cdot h(x_1, \ldots, x_n)$.
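As a worked instance of the factorization, consider i.i.d. $X_i \sim \text{Exp}(\lambda)$ with density $\lambda e^{-\lambda x}$ for $x \geq 0$ (the rate parametrization is assumed here). The joint density depends on the data only through the sum, so $T = \sum X_i$ is sufficient:

```latex
f(x_1, \ldots, x_n; \lambda)
  = \prod_{i=1}^{n} \lambda e^{-\lambda x_i}
  = \underbrace{\lambda^{n} \exp\!\Big(-\lambda \sum_{i=1}^{n} x_i\Big)}_{g(T(x);\,\lambda)}
    \cdot \underbrace{1}_{h(x)},
\qquad T(x) = \sum_{i=1}^{n} x_i, \quad x_i \geq 0
```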

Theorem: Rao-Blackwell Theorem

Let $\hat{\theta}$ be any unbiased estimator of $\theta$ and $T$ a sufficient statistic. Then $\hat{\theta}_{\text{RB}} = E[\hat{\theta} \mid T]$ is unbiased and:

$$\text{Var}(\hat{\theta}_{\text{RB}}) \leq \text{Var}(\hat{\theta})$$

with equality only if $\hat{\theta}$ is already a function of $T$ (almost surely).

Proof sketch: By the law of total variance: $\text{Var}(\hat{\theta}) = E[\text{Var}(\hat{\theta}\mid T)] + \text{Var}(E[\hat{\theta}\mid T]) \geq \text{Var}(\hat{\theta}_{\text{RB}})$. The non-negative term $E[\text{Var}(\hat{\theta}\mid T)]$ vanishes if and only if $\hat{\theta}$ is a function of $T$ alone.
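A small simulation can illustrate the theorem. The sketch below assumes i.i.d. Bernoulli($p$) data, an illustrative model not used elsewhere in this article. The crude unbiased estimator is the first observation $X_1$, the sufficient statistic is $T = \sum X_i$, and by symmetry $E[X_1 \mid T] = T/n$, the sample proportion.

```python
import numpy as np

rng = np.random.default_rng(2)
p, n, reps = 0.3, 20, 50000
x = rng.binomial(1, p, size=(reps, n))  # each row is one sample of size n

crude = x[:, 0]                 # X_1: unbiased, but ignores the other n-1 observations
rao_blackwell = x.mean(axis=1)  # E[X_1 | T] = T/n, where T is the row sum

print(f"crude:         mean={crude.mean():.4f}  var={crude.var():.5f}")
print(f"Rao-Blackwell: mean={rao_blackwell.mean():.4f}  var={rao_blackwell.var():.5f}")
```

Both estimators average close to $p$, while the variance drops from roughly $p(1-p)$ to roughly $p(1-p)/n$.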

5. Lehmann-Scheffé Theorem

Theorem: Lehmann-Scheffé Theorem

If $T$ is a complete sufficient statistic and $\hat{\theta}$ is an unbiased estimator that is a function of $T$, then $\hat{\theta}$ is the unique MVUE (minimum variance unbiased estimator).

Example: For $X_i \sim \text{Exp}(\lambda)$ with rate $\lambda$, $T = \sum X_i$ is complete sufficient. $\bar{X} = T/n$ is unbiased for $1/\lambda$, so $\bar{X}$ is the MVUE of $1/\lambda$. (Note: $1/\bar{X}$ is the MLE of $\lambda$ but is not unbiased.)
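The two claims in this example can be checked by simulation. Since $T = \sum X_i$ follows a Gamma distribution with shape $n$ and rate $\lambda$, $E[1/\bar{X}] = n\lambda/(n-1)$, which overshoots $\lambda$. The sketch below uses arbitrary assumed values $\lambda = 2$ and $n = 5$; note that NumPy parametrizes the exponential by its scale $1/\lambda$.

```python
import numpy as np

rng = np.random.default_rng(3)
lam, n, reps = 2.0, 5, 200000

# NumPy's exponential takes scale = 1 / rate
x = rng.exponential(scale=1.0 / lam, size=(reps, n))
xbar = x.mean(axis=1)

mean_xbar = xbar.mean()              # estimates E[X̄]; theory: 1/lam
mean_inv_xbar = (1.0 / xbar).mean()  # estimates E[1/X̄]; theory: n*lam/(n-1)

print(f"E[X̄]   ≈ {mean_xbar:.4f}   (1/λ = {1 / lam:.4f})")
print(f"E[1/X̄] ≈ {mean_inv_xbar:.4f}   (λ = {lam:.4f}, nλ/(n-1) = {n * lam / (n - 1):.4f})")
```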

Python Simulation: Comparing Estimators

import numpy as np

np.random.seed(42)  # fixed seed so the tables are reproducible
n_values = [5, 10, 25, 50, 100, 500]
reps = 10000  # Monte Carlo replications per sample size
true_mu = 5.0
true_sigma = 3.0
true_var = true_sigma**2

print(f"Comparing estimators for σ² (true = {true_var}):")
print(f"{'n':>6} {'MLE (÷n)':>12} {'Unbiased (÷(n-1))':>18} {'MLE Bias':>10} {'Unbiased Bias':>15}")

for n in n_values:
    # Each row is one simulated sample of size n
    data = np.random.normal(true_mu, true_sigma, size=(reps, n))
    mle_vars = np.var(data, axis=1, ddof=0)       # divide by n (MLE)
    unbiased_vars = np.var(data, axis=1, ddof=1)  # divide by n-1 (S²)

    # Averaging over replications approximates each estimator's expectation
    mle_mean = np.mean(mle_vars)
    unbiased_mean = np.mean(unbiased_vars)
    mle_bias = mle_mean - true_var
    unbiased_bias = unbiased_mean - true_var
    print(f"{n:6d} {mle_mean:12.4f} {unbiased_mean:18.4f} {mle_bias:10.4f} {unbiased_bias:15.4f}")

# Demonstrate WLLN: one sample per n, so the error shrinks on average, not monotonically
print("\nWLLN demonstration (convergence of X̄ to μ):")
for n in [10, 100, 1000, 10000]:
    data = np.random.normal(true_mu, true_sigma, n)
    xbar = np.mean(data)
    print(f"  n={n:5d}: X̄ = {xbar:.4f} (error = {abs(xbar - true_mu):.4f})")

Key Takeaways

Summary: Properties of Estimators

Unbiasedness: $E[\hat{\theta}] = \theta$. Neither necessary nor sufficient for a good estimator; low variance matters too
Efficiency: achieve the CRLB. Under regularity conditions, the MLE is asymptotically efficient
Consistency: $\hat{\theta}_n \xrightarrow{p} \theta$. Guaranteed when $\text{MSE} \to 0$
Sufficiency: a statistic that captures all information about $\theta$ in the data
Rao-Blackwell: conditioning an unbiased estimator on a sufficient statistic never increases its variance
Lehmann-Scheffé: an unbiased estimator that is a function of a complete sufficient statistic is the unique MVUE
The bias-variance tradeoff means unbiasedness is not always optimal; MSE is often a better criterion
