
Point Estimation — Estimating Population Parameters


The Art of Single-Number Guesswork

Point estimation provides a single best guess for each unknown population parameter and is the starting point for most statistical inference. Understanding the properties of an estimator (bias, variance, consistency) tells you when its estimates can be trusted.

  • Survey Research — Producing point estimates of population characteristics from samples
  • Finance — Estimating expected returns and volatility from historical data
  • Manufacturing — Calculating process parameters for quality control

Good estimation is the foundation of good statistical practice.


What Is Point Estimation?

Definition: Point Estimation

A point estimator is a function of the data, $\hat{\theta} = T(X_1, \ldots, X_n)$, that produces a single value as a guess for an unknown population parameter $\theta$. The goal is to find estimators with desirable properties: they should be close to the true value on average, have low variability, and converge to the truth as data accumulates.
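The properties listed above can be checked by simulation. The sketch below is only an illustration under assumed settings (normal data, arbitrary parameter values, and the sample median as a second estimator chosen purely for comparison): it applies two estimators of the same mean to many simulated samples and reports their empirical bias and variance.

```python
import numpy as np

# Assumed setup for illustration: normal data with arbitrary parameters.
rng = np.random.default_rng(0)
true_mu, true_sigma = 5.0, 3.0
n, reps = 50, 20_000

# Each row is one simulated sample of size n.
samples = rng.normal(true_mu, true_sigma, size=(reps, n))

# Two point estimators T(X_1, ..., X_n) of the same parameter mu.
mean_estimates = samples.mean(axis=1)
median_estimates = np.median(samples, axis=1)

for name, est in [("mean", mean_estimates), ("median", median_estimates)]:
    bias = est.mean() - true_mu
    print(f"{name:>6}: bias = {bias:+.4f}, variance = {est.var():.4f}")
```

Both estimators are centred on the true value here, but the sample mean should show the smaller variance for normal data, which is the sense in which one estimator can be preferred over another.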


The Method of Moments

Method of Moments (MoM)

Set the first $k$ sample moments equal to the first $k$ population moments and solve for the $k$ unknown parameters. That is, solve:

$$\frac{1}{n}\sum_{i=1}^n X_i^j = E[X^j] \quad \text{for } j = 1, 2, \ldots, k$$

for $\theta_1, \ldots, \theta_k$.

Worked example (Exponential distribution): Let $X_1, \ldots, X_n \sim \text{Exp}(\lambda)$. We have $E[X] = 1/\lambda$ and $E[X^2] = 2/\lambda^2$. There is a single unknown parameter, so only the first moment is needed: setting $\bar{X} = 1/\lambda$ gives $\hat{\lambda}_{\text{MoM}} = 1/\bar{X}$.

Worked example (Normal distribution): For $X_i \sim \mathcal{N}(\mu, \sigma^2)$, match $E[X] = \mu$ and $E[X^2] = \mu^2 + \sigma^2$:

$$\hat{\mu}_{\text{MoM}} = \bar{X}, \quad \hat{\sigma}^2_{\text{MoM}} = \frac{1}{n}\sum_{i=1}^n X_i^2 - \bar{X}^2 = \frac{1}{n}\sum_{i=1}^n (X_i - \bar{X})^2$$
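The last equality in the variance formula may not be obvious at first sight. One way to see it is to expand the square and use $\frac{1}{n}\sum_{i=1}^n X_i = \bar{X}$:

```latex
\begin{aligned}
\frac{1}{n}\sum_{i=1}^n (X_i - \bar{X})^2
  &= \frac{1}{n}\sum_{i=1}^n X_i^2 - 2\bar{X}\cdot\frac{1}{n}\sum_{i=1}^n X_i + \bar{X}^2 \\
  &= \frac{1}{n}\sum_{i=1}^n X_i^2 - 2\bar{X}^2 + \bar{X}^2 \\
  &= \frac{1}{n}\sum_{i=1}^n X_i^2 - \bar{X}^2
\end{aligned}
```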

MoM vs MLE

MoM is simpler (just solving equations) but generally less efficient than MLE. MoM does not require specifying the full likelihood, only the first $k$ moments. It is useful as a starting value for MLE algorithms.
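To illustrate the starting-value idea, here is a minimal sketch. It assumes a Gamma model, which is not one of the lesson's examples and is chosen only because its MLE has no closed form; the parameter values, the log-scale parameterisation, and the choice of the Nelder-Mead optimiser are likewise assumptions of this sketch.

```python
import numpy as np
from scipy import optimize, stats

# Assumption for illustration: Gamma(shape, scale) data; not a model from the lesson.
rng = np.random.default_rng(1)
data = rng.gamma(shape=3.0, scale=2.0, size=500)

# MoM: match mean = shape * scale and variance = shape * scale**2.
m, v = data.mean(), data.var()
mom_shape, mom_scale = m**2 / v, v / m

def neg_log_lik(log_params):
    """Negative Gamma log-likelihood, parameterised on the log scale
    so the optimiser cannot step to non-positive values."""
    shape, scale = np.exp(log_params)
    return -np.sum(stats.gamma.logpdf(data, a=shape, scale=scale))

# Start the numerical search from the MoM estimates.
x0 = np.log([mom_shape, mom_scale])
result = optimize.minimize(neg_log_lik, x0, method="Nelder-Mead")
mle_shape, mle_scale = np.exp(result.x)

print(f"MoM: shape = {mom_shape:.3f}, scale = {mom_scale:.3f}")
print(f"MLE: shape = {mle_shape:.3f}, scale = {mle_scale:.3f}")
```

Because the MoM estimates are usually already close to the maximiser, the search starts in a sensible region instead of at an arbitrary guess.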


Maximum Likelihood Estimation

Maximum Likelihood Estimator

$$\hat{\theta}_{\text{MLE}} = \arg\max_{\theta} \; L(\theta \mid x_1, \ldots, x_n) = \arg\max_{\theta} \; \prod_{i=1}^n f(x_i; \theta)$$

Here,

  • $L(\theta \mid x)$ = Likelihood function
  • $f(x_i; \theta)$ = Probability density (or mass) function

Equivalently, maximize the log-likelihood:

Log-Likelihood

$$\ell(\theta) = \sum_{i=1}^n \log f(x_i; \theta)$$

Here,

  • $\ell(\theta)$ = Log-likelihood function
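Besides being easier to differentiate, the log-likelihood is also the numerically safer quantity to compute. The sketch below, which assumes standard normal data and a sample size of 2000 chosen for illustration, shows the raw product of densities underflowing in floating point while the sum of logs stays finite.

```python
import numpy as np

rng = np.random.default_rng(2)
x = rng.normal(0.0, 1.0, size=2000)

# Standard normal density at each observation (parameters assumed known here).
density = np.exp(-0.5 * x**2) / np.sqrt(2 * np.pi)

likelihood = np.prod(density)             # product of 2000 numbers below 0.4
log_likelihood = np.sum(np.log(density))  # same information, on the log scale

print(f"likelihood     = {likelihood}")
print(f"log-likelihood = {log_likelihood:.2f}")
```

Since the logarithm is strictly increasing, both functions are maximised at the same parameter value, so nothing is lost by working on the log scale.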

Theorem: MLE for the Normal Distribution

For $X_i \sim \mathcal{N}(\mu, \sigma^2)$, the log-likelihood is:

$$\ell(\mu, \sigma^2) = -\frac{n}{2}\log(2\pi) - \frac{n}{2}\log\sigma^2 - \frac{1}{2\sigma^2}\sum_{i=1}^n (x_i - \mu)^2$$

Setting $\partial\ell/\partial\mu = 0$: $\hat{\mu}_{\text{MLE}} = \bar{X}$.

Setting $\partial\ell/\partial\sigma^2 = 0$: $\hat{\sigma}^2_{\text{MLE}} = \frac{1}{n}\sum_{i=1}^n (x_i - \bar{X})^2$.
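For readers who want the intermediate steps, the two score equations can be written out as follows, treating $\sigma^2$ (not $\sigma$) as the variable of differentiation:

```latex
\begin{aligned}
\frac{\partial\ell}{\partial\mu}
  &= \frac{1}{\sigma^2}\sum_{i=1}^n (x_i - \mu) = 0
  &&\implies \hat{\mu} = \frac{1}{n}\sum_{i=1}^n x_i = \bar{X}, \\
\frac{\partial\ell}{\partial\sigma^2}
  &= -\frac{n}{2\sigma^2} + \frac{1}{2\sigma^4}\sum_{i=1}^n (x_i - \mu)^2 = 0
  &&\implies \hat{\sigma}^2 = \frac{1}{n}\sum_{i=1}^n (x_i - \hat{\mu})^2 .
\end{aligned}
```

Substituting $\hat{\mu} = \bar{X}$ into the second line gives the variance estimator stated above.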

Note: $\hat{\sigma}^2_{\text{MLE}}$ is biased. It divides by $n$, not $n-1$, so $E[\hat{\sigma}^2_{\text{MLE}}] = \frac{n-1}{n}\sigma^2$.
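The size of this bias is easy to see in a simulation. The sketch below assumes normal data with a true variance of 9 and a deliberately small sample size (both chosen for illustration); for such data the divide-by-$n$ estimator has expectation $\frac{n-1}{n}\sigma^2$, and the code compares that value with the simulated average.

```python
import numpy as np

rng = np.random.default_rng(3)
true_sigma2 = 9.0          # assumed true variance for the simulation
n, reps = 5, 200_000       # small n makes the bias easy to see

samples = rng.normal(0.0, np.sqrt(true_sigma2), size=(reps, n))

mle_var = samples.var(axis=1, ddof=0)       # divides by n
unbiased_var = samples.var(axis=1, ddof=1)  # divides by n - 1

print(f"average MLE estimate      = {mle_var.mean():.3f}")
print(f"(n-1)/n * sigma^2         = {(n - 1) / n * true_sigma2:.3f}")
print(f"average unbiased estimate = {unbiased_var.mean():.3f}")
```

The gap shrinks as $n$ grows, since the factor $(n-1)/n$ tends to 1.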


Worked Example: MLE for the Poisson Distribution

Let $X_1, \ldots, X_n \sim \text{Poisson}(\lambda)$. The PMF is $P(X = k) = \frac{\lambda^k e^{-\lambda}}{k!}$.

Step 1: Write the log-likelihood:

$$\ell(\lambda) = \sum_{i=1}^n (x_i \log\lambda - \lambda - \log x_i!) = \left(\sum_{i=1}^n x_i\right)\log\lambda - n\lambda + \text{const}$$

Step 2: Differentiate and set to zero:

$$\frac{d\ell}{d\lambda} = \frac{\sum_{i=1}^n x_i}{\lambda} - n = 0 \implies \hat{\lambda}_{\text{MLE}} = \bar{X}$$

Step 3: Verify it's a maximum: $d^2\ell/d\lambda^2 = -\sum_{i=1}^n x_i / \lambda^2 < 0$ whenever at least one $x_i > 0$.

The MLE for $\lambda$ is the sample mean, the same as the MoM estimator for the Poisson.
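The closed-form answer can be sanity-checked numerically. The following sketch assumes a true rate of 3.0 and a grid of candidate values chosen for illustration; it evaluates the Poisson log-likelihood (dropping the constant term) on the grid and compares the maximiser with the sample mean.

```python
import numpy as np

rng = np.random.default_rng(4)
data = rng.poisson(lam=3.0, size=200)   # assumed true rate of 3.0

def log_lik(lam):
    """Poisson log-likelihood up to the constant -sum(log x_i!)."""
    return data.sum() * np.log(lam) - data.size * lam

grid = np.linspace(0.5, 8.0, 7501)      # candidate rates, step of 0.001
lam_grid = grid[np.argmax(log_lik(grid))]

print(f"grid maximiser = {lam_grid:.3f}")
print(f"sample mean    = {data.mean():.3f}")
```

The two values should agree up to the grid spacing, because the log-likelihood is concave in $\lambda$ and peaks at $\bar{X}$.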


Asymptotic Properties of MLEs

Theorem: Consistency of MLE

Under regularity conditions, the MLE is consistent: $\hat{\theta}_{\text{MLE}} \xrightarrow{p} \theta$ as $n \to \infty$.

Theorem: Asymptotic Normality of MLE

Under regularity conditions, the MLE is asymptotically normal:

$$\sqrt{n}(\hat{\theta}_{\text{MLE}} - \theta) \xrightarrow{d} \mathcal{N}\left(0, \frac{1}{I_1(\theta)}\right)$$

where $I_1(\theta) = -E\left[\frac{\partial^2}{\partial\theta^2}\log f(X;\theta)\right]$ is the Fisher information per observation.
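A simulation can make the limiting distribution concrete. The sketch below assumes exponential data with rate 2.5 (sample size and replication count are arbitrary choices). For $\text{Exp}(\lambda)$, $\log f(x;\lambda) = \log\lambda - \lambda x$, so $I_1(\lambda) = 1/\lambda^2$ and the limiting standard deviation of $\sqrt{n}(\hat{\lambda}_{\text{MLE}} - \lambda)$ is $\lambda$.

```python
import numpy as np

rng = np.random.default_rng(5)
true_lambda = 2.5          # assumed true rate
n, reps = 500, 5_000

# Each row is a sample of size n; the MLE for each row is 1 / (sample mean).
samples = rng.exponential(scale=1 / true_lambda, size=(reps, n))
mle = 1 / samples.mean(axis=1)

# Centre and scale as in the theorem.
z = np.sqrt(n) * (mle - true_lambda)

# Limiting standard deviation is sqrt(1 / I_1(lambda)) = lambda.
print(f"empirical mean of z = {z.mean():.3f} (limit: 0)")
print(f"empirical sd of z   = {z.std():.3f} (limit: {true_lambda})")
```

At finite $n$ the empirical mean is slightly positive because $1/\bar{X}$ is a biased estimator of $\lambda$; that bias vanishes as $n$ grows.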

Theorem: Asymptotic Efficiency of MLE

The MLE achieves the Cramér-Rao lower bound asymptotically: among all regular estimators, the MLE has the smallest possible asymptotic variance $1/(nI_1(\theta))$.

Proof sketch (Cramér-Rao): For any unbiased estimator $\hat{\theta}$, the Cauchy-Schwarz inequality applied to $\text{Cov}(\hat{\theta}, \partial\ell/\partial\theta)$ gives $\text{Var}(\hat{\theta}) \geq 1/(nI_1(\theta))$. The MLE achieves equality asymptotically because a first-order expansion of the score equation $\partial\ell/\partial\theta = 0$ around the true $\theta$ shows that $\hat{\theta}_{\text{MLE}} - \theta$ is asymptotically a linear function of the score.
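The Cauchy-Schwarz step can be spelled out as follows. This is a sketch that assumes the usual regularity conditions, in particular that differentiation and expectation can be interchanged; the symbol $S(\theta)$ for the score is introduced here for convenience.

```latex
\begin{aligned}
S(\theta) &= \frac{\partial\ell}{\partial\theta}, \qquad
E[S(\theta)] = 0, \qquad
\text{Var}(S(\theta)) = n I_1(\theta), \\
\text{Cov}(\hat{\theta}, S(\theta)) &= E[\hat{\theta}\, S(\theta)]
  = \frac{\partial}{\partial\theta} E[\hat{\theta}]
  = \frac{\partial}{\partial\theta}\,\theta = 1, \\
1 = \text{Cov}(\hat{\theta}, S(\theta))^2
  &\leq \text{Var}(\hat{\theta}) \cdot \text{Var}(S(\theta))
  = \text{Var}(\hat{\theta}) \cdot n I_1(\theta)
  \;\implies\; \text{Var}(\hat{\theta}) \geq \frac{1}{n I_1(\theta)} .
\end{aligned}
```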


Fisher Information

Fisher Information

$$I(\theta) = n \cdot I_1(\theta) = -n \cdot E\left[\frac{\partial^2}{\partial\theta^2}\log f(X;\theta)\right]$$

Here,

  • $I(\theta)$ = Total Fisher information for a sample of size $n$
  • $I_1(\theta)$ = Fisher information per observation

Example: For $X \sim \mathcal{N}(\mu, \sigma^2)$ with $\sigma^2$ known:

$$\log f(x;\mu) = -\frac{(x-\mu)^2}{2\sigma^2} + \text{const} \implies \frac{\partial^2}{\partial\mu^2}\log f = -\frac{1}{\sigma^2}$$

So $I_1(\mu) = 1/\sigma^2$ and the Cramér-Rao bound gives $\text{Var}(\hat{\mu}) \geq \sigma^2/n$ for any unbiased $\hat{\mu}$. Since $\text{Var}(\bar{X}) = \sigma^2/n$, the sample mean achieves the bound, so it is the minimum-variance unbiased estimator (MVUE) for $\mu$.
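Both numbers in this example can be checked by Monte Carlo. The sketch below assumes $\mu = 5$ and $\sigma = 3$ (arbitrary values) and uses the equivalent form $I_1(\mu) = E[(\partial \log f/\partial\mu)^2]$, which agrees with the second-derivative definition under regularity conditions.

```python
import numpy as np

rng = np.random.default_rng(6)
mu, sigma = 5.0, 3.0       # assumed values; sigma is treated as known
n, reps = 50, 20_000

# Score of one observation: d/dmu log f(x; mu) = (x - mu) / sigma**2.
x = rng.normal(mu, sigma, size=200_000)
score = (x - mu) / sigma**2
fisher_info_mc = np.mean(score**2)      # Monte Carlo estimate of I_1(mu)

# Variance of the sample mean across many samples of size n.
xbar = rng.normal(mu, sigma, size=(reps, n)).mean(axis=1)

print(f"I_1 estimate = {fisher_info_mc:.4f}, 1/sigma^2 = {1 / sigma**2:.4f}")
print(f"Var(X-bar)   = {xbar.var():.4f}, sigma^2/n = {sigma**2 / n:.4f}")
```

The simulated variance of $\bar{X}$ should sit at the bound $\sigma^2/n = 1/(nI_1(\mu))$, up to Monte Carlo error.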


Python Implementation: Comparing MoM and MLE

import numpy as np

np.random.seed(42)
n = 50

# --- Exponential distribution ---
true_lambda = 2.5
data = np.random.exponential(1/true_lambda, size=n)

# MoM estimator
mom_lambda = 1 / np.mean(data)

# MLE (same form for exponential)
mle_lambda = 1 / np.mean(data)  # MoM = MLE for exponential

print(f"Exponential: true λ = {true_lambda}")
print(f"  MoM = MLE = {mle_lambda:.4f}")

# --- Normal distribution ---
true_mu, true_sigma = 5.0, 3.0
data_normal = np.random.normal(true_mu, true_sigma, size=n)

# MoM
mom_mu = np.mean(data_normal)
mom_sigma2 = np.mean(data_normal**2) - mom_mu**2

# MLE
mle_mu = np.mean(data_normal)
mle_sigma2 = np.mean((data_normal - mle_mu)**2)  # biased (divides by n)

# Unbiased
unbiased_sigma2 = np.var(data_normal, ddof=1)  # divides by n-1

print(f"\nNormal: true μ = {true_mu}, σ² = {true_sigma**2}")
print(f"  MoM μ̂ = {mom_mu:.4f}, MLE μ̂ = {mle_mu:.4f}")
print(f"  MoM σ̂² = {mom_sigma2:.4f}, MLE σ̂² = {mle_sigma2:.4f}, Unbiased = {unbiased_sigma2:.4f}")

# --- Demonstrate MLE consistency ---
print(f"\nConsistency demo (MLE σ̂² vs true σ² = {true_sigma**2}):")
for n_small in [10, 50, 200, 1000, 5000]:
    samples = np.random.normal(true_mu, true_sigma, size=n_small)
    mle_var = np.mean((samples - np.mean(samples))**2)
    print(f"  n={n_small:5d}: MLE σ̂² = {mle_var:.4f} (error = {abs(mle_var - true_sigma**2):.4f})")

Key Takeaways

Summary: Point Estimation

  • Method of Moments: match sample moments to population moments; simple but generally less efficient
  • MLE: maximize the likelihood function; asymptotically efficient and consistent
  • MLE for $\sigma^2$ divides by $n$ (biased); use $n-1$ for an unbiased estimator
  • Fisher information $I(\theta)$ quantifies how much the data tell us about $\theta$
  • The Cramér-Rao bound sets the minimum variance for unbiased estimators: $\text{Var}(\hat{\theta}) \geq 1/(nI_1(\theta))$
  • MLE achieves this bound asymptotically, making it asymptotically efficient among regular estimators
