Point Estimation — Estimating Population Parameters

Foundations of Statistics

The Art of Single-Number Guesswork

Point estimation produces a single best guess for an unknown population parameter and underlies the rest of statistical inference. Knowing an estimator's properties (bias, variance, consistency) tells you how far to trust the estimates it produces.

Survey Research — Producing point estimates of population characteristics from samples
Finance — Estimating expected returns and volatility from historical data
Manufacturing — Calculating process parameters for quality control

Good estimation is the foundation of good statistical practice.

What Is Point Estimation?

Definition: Point Estimation

A point estimator is a function of the data $\hat{\theta} = T(X_1, \ldots, X_n)$ that produces a single value as a guess for an unknown population parameter $\theta$. The goal is to find estimators with desirable properties: they should be close to the true value on average (low bias), have low variability (low variance), and converge to the truth as data accumulate (consistency).
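The properties named in the definition can be checked by simulation. The sketch below is illustrative only: the helper names `sample_mean` and `simulate`, the normal population, and the parameter values are all assumptions chosen for the example. It treats the sample mean as the estimator $T$ and approximates its bias and variance by repeated sampling.

```python
import random
import statistics

def sample_mean(xs):
    """A point estimator: maps the observed sample to a single number."""
    return sum(xs) / len(xs)

def simulate(estimator, n, true_mu=5.0, sigma=3.0, reps=5000, seed=0):
    """Monte Carlo approximation of an estimator's bias and variance
    when the data are n draws from N(true_mu, sigma^2)."""
    rng = random.Random(seed)
    estimates = [
        estimator([rng.gauss(true_mu, sigma) for _ in range(n)])
        for _ in range(reps)
    ]
    bias = statistics.fmean(estimates) - true_mu
    variance = statistics.pvariance(estimates)
    return bias, variance

bias_10, var_10 = simulate(sample_mean, n=10)
bias_100, var_100 = simulate(sample_mean, n=100)
print(f"n=10:  bias = {bias_10:+.4f}, variance = {var_10:.4f}")
print(f"n=100: bias = {bias_100:+.4f}, variance = {var_100:.4f}")
```

The bias should stay near zero at both sample sizes, while the variance should shrink roughly in proportion to $1/n$, which is the behavior a consistent estimator is expected to show.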

The Method of Moments

Method of Moments (MoM)

Set the first $k$ sample moments equal to the first $k$ population moments and solve for the $k$ unknown parameters. That is, solve:

\frac{1}{n}\sum_{i=1}^n X_i^j = E[X^j] \quad \text{for } j = 1, 2, \ldots, k

for $\theta_1, \ldots, \theta_k$.
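To see the recipe with $k = 2$, here is a minimal sketch for a Gamma distribution with shape $\alpha$ and scale $\beta$. The Gamma model is an assumption introduced for illustration and does not appear elsewhere in this article; `gamma_mom` is a hypothetical helper name. The population moments are $E[X] = \alpha\beta$ and $E[X^2] = \alpha(\alpha+1)\beta^2$, and solving the two moment equations gives closed-form estimates.

```python
import random

def gamma_mom(xs):
    """Method-of-moments estimates (shape, scale) for a Gamma sample.

    Solves m1 = shape * scale and m2 = shape * (shape + 1) * scale**2,
    where m1 and m2 are the first two sample moments.
    """
    n = len(xs)
    m1 = sum(xs) / n
    m2 = sum(x * x for x in xs) / n
    var = m2 - m1 ** 2       # equals shape * scale**2 at the solution
    shape = m1 ** 2 / var
    scale = var / m1
    return shape, scale

rng = random.Random(1)
true_shape, true_scale = 3.0, 2.0
# random.gammavariate(alpha, beta) uses alpha = shape, beta = scale
sample = [rng.gammavariate(true_shape, true_scale) for _ in range(20000)]
shape_hat, scale_hat = gamma_mom(sample)
print(f"shape: true = {true_shape}, MoM = {shape_hat:.3f}")
print(f"scale: true = {true_scale}, MoM = {scale_hat:.3f}")
```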

Worked example (Exponential distribution): Let $X_1, \ldots, X_n \sim \text{Exp}(\lambda)$. We have $E[X] = 1/\lambda$ and $E[X^2] = 2/\lambda^2$, but with a single unknown parameter only the first moment is needed. Setting $\bar{X} = 1/\lambda$ gives $\hat{\lambda}_{\text{MoM}} = 1/\bar{X}$.

Worked example (Normal distribution): For $X_i \sim \mathcal{N}(\mu, \sigma^2)$, match $E[X] = \mu$ and $E[X^2] = \mu^2 + \sigma^2$:

\hat{\mu}_{\text{MoM}} = \bar{X}, \quad \hat{\sigma}^2_{\text{MoM}} = \frac{1}{n}\sum X_i^2 - \bar{X}^2 = \frac{1}{n}\sum(X_i - \bar{X})^2

MoM vs MLE

MoM is simpler (it only requires solving the moment equations) but is generally less efficient than MLE. It does not require specifying the full likelihood, only the first $k$ moments. MoM estimates are also useful as starting values for iterative MLE algorithms.
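The efficiency gap can be large in some models. As a hedged illustration, the sketch below uses a Uniform$(0, \theta)$ population, which is an assumption introduced here and is not a regular model, so the asymptotic theory later in this article does not apply to it. The MoM estimator solves $E[X] = \theta/2$ to give $2\bar{X}$, while the MLE is the largest observation. The helper name `mse` and all parameter values are hypothetical.

```python
import random

def mse(estimator, theta=1.0, n=20, reps=20000, seed=2):
    """Monte Carlo mean squared error of an estimator of theta
    for samples of size n from Uniform(0, theta)."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(reps):
        xs = [rng.uniform(0.0, theta) for _ in range(n)]
        total += (estimator(xs) - theta) ** 2
    return total / reps

mom_mse = mse(lambda xs: 2 * sum(xs) / len(xs))  # MoM: twice the sample mean
mle_mse = mse(max)                               # MLE: sample maximum
print(f"MoM MSE = {mom_mse:.5f}")
print(f"MLE MSE = {mle_mse:.5f}")
```

Both calls use the same seed, so the two estimators are compared on identical simulated samples.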

Maximum Likelihood Estimation

Maximum Likelihood Estimator

\hat{\theta}_{\text{MLE}} = \arg\max_{\theta} \; L(\theta \mid x_1, \ldots, x_n) = \arg\max_{\theta} \; \prod_{i=1}^n f(x_i; \theta)

Here,

$L(\theta \mid x)$ = Likelihood function
$f(x_i; \theta)$ = Probability density (or mass) function

Equivalently, maximize the log-likelihood:

Log-Likelihood

\ell(\theta) = \sum_{i=1}^n \log f(x_i; \theta)

Here,

$\ell(\theta)$ = Log-likelihood function
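One practical reason to prefer the log-likelihood is numerical: a product of many densities underflows in floating-point arithmetic, while the sum of their logarithms does not. The sketch below shows this for 2000 standard normal draws; the helper `normal_pdf` and the sample size are assumptions chosen for the demonstration.

```python
import math
import random

def normal_pdf(x, mu=0.0, sigma=1.0):
    """Density of N(mu, sigma^2) at x."""
    z = (x - mu) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2 * math.pi))

rng = random.Random(3)
xs = [rng.gauss(0.0, 1.0) for _ in range(2000)]

# Raw likelihood: a product of 2000 factors, each below 0.4
likelihood = 1.0
for x in xs:
    likelihood *= normal_pdf(x)

# Log-likelihood: the same information as a sum of moderate-sized terms
log_likelihood = sum(math.log(normal_pdf(x)) for x in xs)

print(f"likelihood     = {likelihood}")
print(f"log-likelihood = {log_likelihood:.2f}")
```

The product collapses to zero, so it cannot be maximized numerically, whereas the log-likelihood stays finite.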

Theorem: MLE for the Normal Distribution

For $X_i \sim \mathcal{N}(\mu, \sigma^2)$, the log-likelihood is:

\ell(\mu, \sigma^2) = -\frac{n}{2}\log(2\pi) - \frac{n}{2}\log\sigma^2 - \frac{1}{2\sigma^2}\sum_{i=1}^n (x_i - \mu)^2

Setting $\partial\ell/\partial\mu = 0$: $\hat{\mu}_{\text{MLE}} = \bar{X}$.

Setting $\partial\ell/\partial\sigma^2 = 0$: $\hat{\sigma}^2_{\text{MLE}} = \frac{1}{n}\sum(x_i - \bar{X})^2$.

Note: $\hat{\sigma}^2_{\text{MLE}}$ is biased because it divides by $n$ rather than $n-1$; its expectation is $E[\hat{\sigma}^2_{\text{MLE}}] = \frac{n-1}{n}\sigma^2$.
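The bias is easiest to see at a small sample size. The following sketch averages the divide-by-$n$ variance estimate over many simulated samples and compares the result with $\frac{n-1}{n}\sigma^2$. The values $n = 5$ and $\sigma^2 = 4$ and the helper name `mle_variance` are assumptions for illustration.

```python
import random

def mle_variance(xs):
    """Variance estimate that divides by n (the normal-model MLE)."""
    m = sum(xs) / len(xs)
    return sum((x - m) ** 2 for x in xs) / len(xs)

rng = random.Random(4)
n, sigma2, reps = 5, 4.0, 20000

# Average the estimate over many independent samples of size n
avg_mle = sum(
    mle_variance([rng.gauss(0.0, sigma2 ** 0.5) for _ in range(n)])
    for _ in range(reps)
) / reps

expected = (n - 1) / n * sigma2   # theoretical mean of the MLE
print(f"average MLE estimate  = {avg_mle:.3f}")
print(f"(n-1)/n * sigma^2     = {expected:.3f}")
print(f"true sigma^2          = {sigma2:.3f}")
```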

Worked Example: MLE for the Poisson Distribution

Let $X_1, \ldots, X_n \sim \text{Poisson}(\lambda)$. The PMF is $P(X = k) = \frac{\lambda^k e^{-\lambda}}{k!}$.

Step 1: Write the log-likelihood:

\ell(\lambda) = \sum_{i=1}^n \left(x_i \log\lambda - \lambda - \log(x_i!)\right) = \left(\sum_{i=1}^n x_i\right)\log\lambda - n\lambda + \text{const}

Step 2: Differentiate and set to zero:

\frac{d\ell}{d\lambda} = \frac{\sum x_i}{\lambda} - n = 0 \implies \hat{\lambda}_{\text{MLE}} = \bar{X}

Step 3: Verify it's a maximum: $d^2\ell/d\lambda^2 = -\sum x_i / \lambda^2 < 0$ whenever at least one $x_i > 0$.

The MLE for $\lambda$ is the sample mean, which is the same as the MoM estimator for the Poisson.
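The closed-form result can be sanity-checked numerically. The sketch below evaluates the Poisson log-likelihood (with the constant term dropped) on a grid of candidate $\lambda$ values and confirms that the maximizer coincides with the sample mean. The count data are made up for the example, and a grid search is used only for transparency; it is not how an MLE would normally be computed.

```python
import math

counts = [2, 3, 1, 4, 0, 2, 5, 3, 2, 3]   # hypothetical observed counts
n = len(counts)
total = sum(counts)

def log_lik(lam):
    """Poisson log-likelihood up to an additive constant."""
    return total * math.log(lam) - n * lam

# Candidate values 0.01, 0.02, ..., 10.00
grid = [i / 100 for i in range(1, 1001)]
best_lambda = max(grid, key=log_lik)
sample_mean = total / n

print(f"grid maximizer = {best_lambda}")
print(f"sample mean    = {sample_mean}")
```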

Asymptotic Properties of MLEs

Theorem: Consistency of MLE

Under regularity conditions, the MLE is consistent: $\hat{\theta}_{\text{MLE}} \xrightarrow{p} \theta$ as $n \to \infty$.

Theorem: Asymptotic Normality of MLE

Under regularity conditions, the MLE is asymptotically normal:

\sqrt{n}(\hat{\theta}_{\text{MLE}} - \theta) \xrightarrow{d} \mathcal{N}\left(0, \frac{1}{I_1(\theta)}\right)

where $I_1(\theta) = -E\left[\frac{\partial^2}{\partial\theta^2}\log f(X;\theta)\right]$ is the Fisher information per observation.
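A simulation can make the limiting variance concrete. The sketch below uses the exponential model from earlier, where $\hat{\lambda}_{\text{MLE}} = 1/\bar{X}$ and $\log f(x;\lambda) = \log\lambda - \lambda x$, so $I_1(\lambda) = 1/\lambda^2$ and the asymptotic variance is $\lambda^2$. The sample size, replication count, and rate are assumptions chosen for the example, and at finite $n$ the empirical variance is only expected to be close to the limit, not equal to it.

```python
import math
import random
import statistics

rng = random.Random(5)
true_rate, n, reps = 2.0, 200, 4000

scaled_errors = []
for _ in range(reps):
    xs = [rng.expovariate(true_rate) for _ in range(n)]
    rate_mle = n / sum(xs)                      # 1 / sample mean
    scaled_errors.append(math.sqrt(n) * (rate_mle - true_rate))

empirical_var = statistics.pvariance(scaled_errors)
asymptotic_var = true_rate ** 2                 # 1 / I_1, with I_1 = 1 / rate**2

print(f"empirical variance of sqrt(n) * error = {empirical_var:.3f}")
print(f"asymptotic variance 1 / I_1           = {asymptotic_var:.3f}")
```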

Theorem: Asymptotic Efficiency of MLE

The MLE achieves the Cramér-Rao lower bound asymptotically: among all regular estimators, the MLE has the smallest possible asymptotic variance $1/(nI_1(\theta))$ .

Proof sketch (Cramér-Rao bound): For any unbiased estimator $\hat{\theta}$, the Cauchy-Schwarz inequality applied to $\text{Cov}(\hat{\theta}, \partial\ell/\partial\theta)$ gives $\text{Var}(\hat{\theta}) \geq 1/(nI_1(\theta))$. Equality in Cauchy-Schwarz requires $\hat{\theta}$ to be a linear function of the score $\partial\ell/\partial\theta$. The MLE achieves this asymptotically: a first-order Taylor expansion of the score equation $\partial\ell/\partial\theta = 0$ around the true $\theta$ shows that $\hat{\theta}_{\text{MLE}} - \theta$ is approximately the score divided by $nI_1(\theta)$.
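Written out, the Cauchy-Schwarz step runs as follows. This is a sketch under the usual regularity assumption that differentiation and integration can be interchanged; the symbol $S(\theta) = \partial\ell/\partial\theta$ for the score is shorthand introduced here, not notation used elsewhere in the article.

```latex
E[S(\theta)] = 0, \qquad \text{Var}(S(\theta)) = n I_1(\theta)

\text{Cov}(\hat{\theta}, S(\theta)) = E[\hat{\theta}\, S(\theta)]
  = \frac{\partial}{\partial\theta} E[\hat{\theta}]
  = \frac{\partial}{\partial\theta}\, \theta = 1

1 = \text{Cov}(\hat{\theta}, S(\theta))^2
  \leq \text{Var}(\hat{\theta}) \cdot \text{Var}(S(\theta))
  = \text{Var}(\hat{\theta}) \cdot n I_1(\theta)

\implies \text{Var}(\hat{\theta}) \geq \frac{1}{n I_1(\theta)}
```

The second line is where unbiasedness enters: $E[\hat{\theta}] = \theta$, so its derivative with respect to $\theta$ is 1.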

Fisher Information

I(\theta) = n \cdot I_1(\theta) = -n \cdot E\left[\frac{\partial^2}{\partial\theta^2}\log f(X;\theta)\right]

Here,

$I(\theta)$ = Total Fisher information for a sample of size $n$
$I_1(\theta)$ = Fisher information per observation

Example: For $X \sim \mathcal{N}(\mu, \sigma^2)$ with $\sigma^2$ known:

\log f(x;\mu) = -\frac{(x-\mu)^2}{2\sigma^2} + \text{const} \implies \frac{\partial^2}{\partial\mu^2}\log f = -\frac{1}{\sigma^2}

So $I_1(\mu) = 1/\sigma^2$ and the Cramér-Rao bound gives $\text{Var}(\hat{\mu}) \geq \sigma^2/n$. Since $\text{Var}(\bar{X}) = \sigma^2/n$, the sample mean achieves the bound, so it is the MVUE for $\mu$.
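The value $I_1(\mu) = 1/\sigma^2$ can also be approximated by simulation. The sketch below relies on a standard identity that holds under regularity conditions but is not derived in this article: the Fisher information equals the variance of the score. For the normal model with known $\sigma^2$, the score for one observation is $(x - \mu)/\sigma^2$. The parameter values are assumptions for the example.

```python
import random
import statistics

rng = random.Random(6)
mu, sigma = 5.0, 2.0

draws = [rng.gauss(mu, sigma) for _ in range(50000)]

# Score of one observation: derivative of log f(x; mu) with respect to mu
scores = [(x - mu) / sigma ** 2 for x in draws]

fisher_mc = statistics.pvariance(scores)   # Monte Carlo estimate of I_1(mu)
fisher_exact = 1 / sigma ** 2              # closed form from the example above

print(f"Monte Carlo I_1 = {fisher_mc:.4f}")
print(f"exact I_1       = {fisher_exact:.4f}")
```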

Python Implementation: Comparing MoM and MLE

import numpy as np

np.random.seed(42)
n = 50

# --- Exponential distribution ---
true_lambda = 2.5
data = np.random.exponential(1/true_lambda, size=n)

# MoM estimator
mom_lambda = 1 / np.mean(data)

# MLE (same form for exponential)
mle_lambda = 1 / np.mean(data)  # MoM = MLE for exponential

print(f"Exponential: true λ = {true_lambda}")
print(f"  MoM = MLE = {mle_lambda:.4f}")

# --- Normal distribution ---
true_mu, true_sigma = 5.0, 3.0
data_normal = np.random.normal(true_mu, true_sigma, size=n)

# MoM
mom_mu = np.mean(data_normal)
mom_sigma2 = np.mean(data_normal**2) - mom_mu**2

# MLE
mle_mu = np.mean(data_normal)
mle_sigma2 = np.mean((data_normal - mle_mu)**2)  # biased (divides by n)

# Unbiased
unbiased_sigma2 = np.var(data_normal, ddof=1)  # divides by n-1

print(f"\nNormal: true μ = {true_mu}, σ² = {true_sigma**2}")
print(f"  MoM μ̂ = {mom_mu:.4f}, MLE μ̂ = {mle_mu:.4f}")
print(f"  MoM σ̂² = {mom_sigma2:.4f}, MLE σ̂² = {mle_sigma2:.4f}, Unbiased = {unbiased_sigma2:.4f}")

# --- Demonstrate MLE consistency ---
print(f"\nConsistency demo (MLE σ̂² vs true σ² = {true_sigma**2}):")
for n_obs in [10, 50, 200, 1000, 5000]:
    samples = np.random.normal(true_mu, true_sigma, size=n_obs)
    mle_var = np.mean((samples - np.mean(samples))**2)
    print(f"  n={n_obs:5d}: MLE σ̂² = {mle_var:.4f} (error = {abs(mle_var - true_sigma**2):.4f})")

Key Takeaways

Summary: Point Estimation

Method of Moments: match sample moments to population moments; simple but generally less efficient
MLE: maximize the likelihood function; consistent and asymptotically efficient under regularity conditions
MLE for $\sigma^2$ divides by $n$ (biased); use $n-1$ for an unbiased estimator
Fisher information $I(\theta)$ quantifies how much data tells us about $\theta$
The Cramér-Rao bound sets the minimum variance for unbiased estimators: $\text{Var}(\hat{\theta}) \geq 1/(nI_1(\theta))$
MLE achieves this bound asymptotically under regularity conditions, so no regular estimator has smaller large-sample variance
