
OLS Estimation — Deriving Regression Coefficients from Scratch


OLS Estimation: From First Principles


The Math Behind Regression Coefficients

Ordinary Least Squares finds the coefficient vector that minimizes the sum of squared residuals. Understanding OLS from first principles reveals why regression works and when it breaks down.

  • Data Science — Build foundation for understanding regularization and advanced estimators

  • Econometrics — Derive the Gauss-Markov theorem and BLUE properties

  • Actuarial Science — Implement premium models with transparent coefficient derivation

The normal equations transform data into the best linear unbiased estimates.


Ordinary Least Squares (OLS) is the foundation of linear regression. It finds the coefficient vector $\hat{\boldsymbol{\beta}}$ that minimizes the sum of squared residuals.


Matrix Formulation

Definition (The Linear Model in Matrix Notation)

$$\mathbf{y} = \mathbf{X}\boldsymbol{\beta} + \boldsymbol{\varepsilon}$$

where:

  • $\mathbf{y}$ is the $n \times 1$ response vector

  • $\mathbf{X}$ is the $n \times p$ design matrix (first column is 1s for the intercept)

  • $\boldsymbol{\beta}$ is the $p \times 1$ coefficient vector

  • $\boldsymbol{\varepsilon} \sim \mathcal{N}(\mathbf{0}, \sigma^2 \mathbf{I}_n)$ is the error vector
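To make the notation concrete, here is a minimal NumPy sketch that simulates data from this model. The sample size, coefficient values, error standard deviation, and seed are hypothetical choices for illustration only.

```python
import numpy as np

rng = np.random.default_rng(0)  # fixed seed so the sketch is reproducible

n, p = 200, 3                            # hypothetical sample size and coefficient count
beta_true = np.array([1.0, 2.0, -0.5])   # hypothetical "true" coefficients
sigma = 0.3                              # hypothetical error standard deviation

# Design matrix: a column of ones for the intercept, then p - 1 random predictors.
X = np.column_stack([np.ones(n), rng.normal(size=(n, p - 1))])

# Errors drawn i.i.d. from N(0, sigma^2), matching the model assumption.
eps = rng.normal(scale=sigma, size=n)

# Response generated by the linear model y = X beta + eps.
y = X @ beta_true + eps

print(X.shape, beta_true.shape, y.shape)  # (200, 3) (3,) (200,)
```

The shapes match the definition: $\mathbf{X}$ is $n \times p$, $\boldsymbol{\beta}$ has $p$ entries, and $\mathbf{y}$ has $n$ entries.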


Derivation of the Normal Equations

Theorem (OLS Normal Equations)

The OLS estimator minimizes:

$$\text{SSR} = \boldsymbol{\varepsilon}^T \boldsymbol{\varepsilon} = (\mathbf{y} - \mathbf{X}\boldsymbol{\beta})^T(\mathbf{y} - \mathbf{X}\boldsymbol{\beta})$$

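Expanding (the two cross terms $\boldsymbol{\beta}^T\mathbf{X}^T\mathbf{y}$ and $\mathbf{y}^T\mathbf{X}\boldsymbol{\beta}$ are scalars and transposes of each other, so they combine):

$$\text{SSR} = \mathbf{y}^T\mathbf{y} - 2\boldsymbol{\beta}^T\mathbf{X}^T\mathbf{y} + \boldsymbol{\beta}^T\mathbf{X}^T\mathbf{X}\boldsymbol{\beta}$$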

Taking the derivative with respect to $\boldsymbol{\beta}$ and setting it to zero:

$$\frac{\partial \text{SSR}}{\partial \boldsymbol{\beta}} = -2\mathbf{X}^T\mathbf{y} + 2\mathbf{X}^T\mathbf{X}\boldsymbol{\beta} = \mathbf{0}$$

This gives the normal equations:

$$\mathbf{X}^T\mathbf{X}\hat{\boldsymbol{\beta}} = \mathbf{X}^T\mathbf{y}$$

If $\mathbf{X}^T\mathbf{X}$ is invertible (which requires $\text{rank}(\mathbf{X}) = p$):

$$\hat{\boldsymbol{\beta}} = (\mathbf{X}^T\mathbf{X})^{-1}\mathbf{X}^T\mathbf{y}$$

Proof that this is a minimum: The Hessian is $\frac{\partial^2 \text{SSR}}{\partial \boldsymbol{\beta} \partial \boldsymbol{\beta}^T} = 2\mathbf{X}^T\mathbf{X}$, which is positive semi-definite (positive definite if $\mathbf{X}$ has full column rank). $\square$
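The derivation above can be checked numerically. The sketch below solves the normal equations on simulated data, cross-checks the result against NumPy's least-squares routine, and evaluates the gradient at the solution. The function name `ols_normal_equations` and all data values are hypothetical, chosen only for this illustration.

```python
import numpy as np

def ols_normal_equations(X, y):
    """Solve X^T X beta = X^T y for beta without forming an explicit inverse."""
    XtX = X.T @ X
    Xty = X.T @ y
    return np.linalg.solve(XtX, Xty)

rng = np.random.default_rng(1)
n = 100
X = np.column_stack([np.ones(n), rng.normal(size=(n, 2))])
y = X @ np.array([0.5, 1.5, -2.0]) + rng.normal(scale=0.1, size=n)

beta_hat = ols_normal_equations(X, y)

# Cross-check against NumPy's least-squares solver.
beta_ref, *_ = np.linalg.lstsq(X, y, rcond=None)

# Gradient of SSR evaluated at beta_hat; it should vanish up to rounding error.
gradient = -2 * X.T @ y + 2 * X.T @ X @ beta_hat

print(np.allclose(beta_hat, beta_ref))
print(np.allclose(gradient, 0.0, atol=1e-8))
```

Both checks are expected to print `True` when $\mathbf{X}$ has full column rank, as it does for this simulated design.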


The Hat Matrix

Hat Matrix (Projection Matrix)

$$\hat{\mathbf{y}} = \mathbf{X}\hat{\boldsymbol{\beta}} = \mathbf{X}(\mathbf{X}^T\mathbf{X})^{-1}\mathbf{X}^T\mathbf{y} = \mathbf{H}\mathbf{y}$$

Here,

  • $\mathbf{H} = \mathbf{X}(\mathbf{X}^T\mathbf{X})^{-1}\mathbf{X}^T$: the hat matrix
  • $\hat{\mathbf{y}}$: fitted (predicted) values

$\mathbf{H}$ is an orthogonal projection matrix: $\mathbf{H}^2 = \mathbf{H}$ and $\mathbf{H}^T = \mathbf{H}$. It projects $\mathbf{y}$ onto the column space of $\mathbf{X}$. The residuals $\mathbf{e} = (\mathbf{I} - \mathbf{H})\mathbf{y}$ lie in the orthogonal complement.
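These projection properties are easy to verify on a small example. The following sketch builds $\mathbf{H}$ for a simulated design matrix (sizes and seed are hypothetical) and checks idempotence, symmetry, and orthogonality of the residuals to the columns of $\mathbf{X}$. It also checks the standard fact that the trace of $\mathbf{H}$ equals $p$.

```python
import numpy as np

rng = np.random.default_rng(2)
n, p = 30, 3
X = np.column_stack([np.ones(n), rng.normal(size=(n, p - 1))])
y = rng.normal(size=n)

# H = X (X^T X)^{-1} X^T, built with solve() rather than an explicit inverse.
H = X @ np.linalg.solve(X.T @ X, X.T)

y_hat = H @ y      # fitted values: projection of y onto col(X)
e = y - y_hat      # residuals: (I - H) y

print(np.allclose(H @ H, H))          # idempotent
print(np.allclose(H, H.T))            # symmetric
print(np.allclose(X.T @ e, 0.0))      # residuals orthogonal to every column of X
print(np.isclose(np.trace(H), p))     # trace equals the number of coefficients
```

Forming the full $n \times n$ matrix is fine for a demonstration this small, but it is rarely needed in practice since fitted values can be computed as $\mathbf{X}\hat{\boldsymbol{\beta}}$.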


Properties of OLS Estimators

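Theorem (Gauss–Markov)

Under the classical assumptions ($E[\boldsymbol{\varepsilon}] = \mathbf{0}$, $\text{Var}(\boldsymbol{\varepsilon}) = \sigma^2\mathbf{I}$, $\mathbf{X}$ fixed with full column rank), the OLS estimator $\hat{\boldsymbol{\beta}}$ is BLUE (Best Linear Unbiased Estimator), with covariance matrix:

$$\text{Var}(\hat{\boldsymbol{\beta}}) = \sigma^2(\mathbf{X}^T\mathbf{X})^{-1}$$

No other linear unbiased estimator of $\boldsymbol{\beta}$ has smaller variance: for any such estimator $\tilde{\boldsymbol{\beta}}$, the difference $\text{Var}(\tilde{\boldsymbol{\beta}}) - \text{Var}(\hat{\boldsymbol{\beta}})$ is positive semi-definite. Normality of the errors is not required for this result.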

Theorem (Unbiasedness of $\hat{\boldsymbol{\beta}}$)

$$E[\hat{\boldsymbol{\beta}}] = E[(\mathbf{X}^T\mathbf{X})^{-1}\mathbf{X}^T\mathbf{y}] = (\mathbf{X}^T\mathbf{X})^{-1}\mathbf{X}^T E[\mathbf{y}] = (\mathbf{X}^T\mathbf{X})^{-1}\mathbf{X}^T\mathbf{X}\boldsymbol{\beta} = \boldsymbol{\beta}$$
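A small Monte Carlo experiment can illustrate both the unbiasedness result and the variance formula $\sigma^2(\mathbf{X}^T\mathbf{X})^{-1}$. This is a sketch under assumed settings: the design, true coefficients, error scale, and number of replications are all hypothetical, and $\mathbf{X}$ is held fixed across replications to match the theorem's assumptions.

```python
import numpy as np

rng = np.random.default_rng(3)
n, sigma = 50, 1.0
beta_true = np.array([1.0, -2.0])                        # hypothetical true coefficients
X = np.column_stack([np.ones(n), rng.normal(size=n)])    # fixed design across replications

n_reps = 5000
estimates = np.empty((n_reps, 2))
for r in range(n_reps):
    # Fresh errors each replication; X and beta_true stay the same.
    y = X @ beta_true + rng.normal(scale=sigma, size=n)
    estimates[r] = np.linalg.solve(X.T @ X, X.T @ y)

mean_estimate = estimates.mean(axis=0)            # should be close to beta_true
empirical_cov = np.cov(estimates, rowvar=False)   # sample covariance of the estimates

# The inverse itself is the quantity of interest here, so it is formed explicitly.
theoretical_cov = sigma**2 * np.linalg.inv(X.T @ X)

print(mean_estimate)
print(empirical_cov)
print(theoretical_cov)
```

With enough replications, the average of the estimates should sit close to `beta_true` and the empirical covariance close to the theoretical one, up to simulation noise.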

Estimation of Error Variance

Unbiased Estimator of Error Variance

$$\hat{\sigma}^2 = s^2 = \frac{\mathbf{e}^T\mathbf{e}}{n - p} = \frac{\text{SSR}}{n - p}$$

Here,

  • $s^2$: estimated error variance
  • $n - p$: degrees of freedom
  • $\mathbf{e}$: residual vector
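As a rough illustration, the sketch below computes $s^2$ from the residuals of a fit to simulated data. All data values are hypothetical. It also plugs $s^2$ into $\sigma^2(\mathbf{X}^T\mathbf{X})^{-1}$ to obtain estimated standard errors, which is a common use of this quantity but goes slightly beyond the formula above.

```python
import numpy as np

rng = np.random.default_rng(4)
n, p, sigma = 500, 3, 2.0                     # hypothetical settings; true sigma^2 = 4
X = np.column_stack([np.ones(n), rng.normal(size=(n, p - 1))])
y = X @ np.array([1.0, 0.5, -1.0]) + rng.normal(scale=sigma, size=n)

beta_hat, *_ = np.linalg.lstsq(X, y, rcond=None)

e = y - X @ beta_hat        # residual vector
ssr = e @ e                 # e^T e, the sum of squared residuals
s2 = ssr / (n - p)          # estimate of sigma^2 with n - p degrees of freedom

# Plug s2 in for sigma^2 to estimate Var(beta_hat), then take square roots
# of the diagonal to get standard errors.
cov_beta_hat = s2 * np.linalg.inv(X.T @ X)
std_errors = np.sqrt(np.diag(cov_beta_hat))

print(s2)
print(std_errors)
```

For a sample of this size, `s2` should land near the true value of 4, though any single draw will differ from it by sampling noise.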

$E[s^2] = \sigma^2$, so $s^2$ is unbiased. The denominator $n - p$ accounts for the $p$ parameters estimated.
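The role of $n - p$ can be seen from a short trace argument. The sketch below assumes $E[\boldsymbol{\varepsilon}] = \mathbf{0}$, $\text{Var}(\boldsymbol{\varepsilon}) = \sigma^2\mathbf{I}$, and full column rank of $\mathbf{X}$, and uses the hat matrix $\mathbf{H} = \mathbf{X}(\mathbf{X}^T\mathbf{X})^{-1}\mathbf{X}^T$ together with the facts $\mathbf{H}\mathbf{X} = \mathbf{X}$ and $(\mathbf{I} - \mathbf{H})^2 = \mathbf{I} - \mathbf{H}$.

```latex
\begin{aligned}
\mathbf{e} &= (\mathbf{I} - \mathbf{H})\mathbf{y}
            = (\mathbf{I} - \mathbf{H})(\mathbf{X}\boldsymbol{\beta} + \boldsymbol{\varepsilon})
            = (\mathbf{I} - \mathbf{H})\boldsymbol{\varepsilon} \\
\mathbf{e}^T\mathbf{e} &= \boldsymbol{\varepsilon}^T(\mathbf{I} - \mathbf{H})^T(\mathbf{I} - \mathbf{H})\boldsymbol{\varepsilon}
            = \boldsymbol{\varepsilon}^T(\mathbf{I} - \mathbf{H})\boldsymbol{\varepsilon} \\
E[\mathbf{e}^T\mathbf{e}] &= \operatorname{tr}\!\left((\mathbf{I} - \mathbf{H})\,E[\boldsymbol{\varepsilon}\boldsymbol{\varepsilon}^T]\right)
            = \sigma^2 \operatorname{tr}(\mathbf{I} - \mathbf{H})
            = \sigma^2\,(n - \operatorname{tr}(\mathbf{H})) \\
\operatorname{tr}(\mathbf{H}) &= \operatorname{tr}\!\left((\mathbf{X}^T\mathbf{X})^{-1}\mathbf{X}^T\mathbf{X}\right)
            = \operatorname{tr}(\mathbf{I}_p) = p \\
E[s^2] &= \frac{E[\mathbf{e}^T\mathbf{e}]}{n - p} = \frac{\sigma^2 (n - p)}{n - p} = \sigma^2
\end{aligned}
```

Dividing by $n$ instead would give an estimator whose expectation is $\sigma^2 (n - p)/n$, which underestimates the error variance.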


Numerical Considerations

Avoid Matrix Inversion

In practice, never compute $(\mathbf{X}^T\mathbf{X})^{-1}$ directly. Instead, use:

  • QR decomposition: $\mathbf{X} = \mathbf{Q}\mathbf{R}$, then solve $\mathbf{R}\hat{\boldsymbol{\beta}} = \mathbf{Q}^T\mathbf{y}$ by back substitution

  • Cholesky decomposition: $\mathbf{X}^T\mathbf{X} = \mathbf{L}\mathbf{L}^T$, then solve via forward/back substitution (fast, but forming $\mathbf{X}^T\mathbf{X}$ squares the condition number of $\mathbf{X}$)

  • SVD: Most numerically stable, handles rank-deficient cases

These methods are $O(np^2)$ for $n \ge p$ and avoid the numerical instability of explicit matrix inversion.
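The three approaches can be sketched with NumPy as follows. The data are simulated with hypothetical values. For brevity, the triangular systems are solved with the general-purpose `np.linalg.solve`, which does not exploit triangular structure. A dedicated triangular solver such as `scipy.linalg.solve_triangular` would be the more efficient choice if SciPy is available.

```python
import numpy as np

rng = np.random.default_rng(5)
n, p = 200, 4
X = np.column_stack([np.ones(n), rng.normal(size=(n, p - 1))])
y = X @ np.array([2.0, -1.0, 0.5, 3.0]) + rng.normal(scale=0.5, size=n)

# QR: X = QR with Q (n x p, orthonormal columns) and R (p x p, upper triangular).
Q, R = np.linalg.qr(X)                    # default mode is "reduced"
beta_qr = np.linalg.solve(R, Q.T @ y)     # solve R beta = Q^T y

# Cholesky: X^T X = L L^T with L lower triangular, then two triangular systems.
L = np.linalg.cholesky(X.T @ X)
z = np.linalg.solve(L, X.T @ y)           # forward step: L z = X^T y
beta_chol = np.linalg.solve(L.T, z)       # back step: L^T beta = z

# SVD: X = U diag(s) V^T, so beta = V diag(1/s) U^T y for full-rank X.
U, s, Vt = np.linalg.svd(X, full_matrices=False)
beta_svd = Vt.T @ ((U.T @ y) / s)

# Reference solution from NumPy's least-squares routine.
beta_ref, *_ = np.linalg.lstsq(X, y, rcond=None)

print(np.allclose(beta_qr, beta_ref))
print(np.allclose(beta_chol, beta_ref))
print(np.allclose(beta_svd, beta_ref))
```

On a well-conditioned design like this one, all three agree to rounding error. Differences between the methods show up mainly when the columns of $\mathbf{X}$ are nearly collinear. In the rank-deficient case the SVD formula above would also need small singular values to be truncated rather than inverted.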


Key Takeaways

Summary: OLS Estimation

  • Normal equations: $\mathbf{X}^T\mathbf{X}\hat{\boldsymbol{\beta}} = \mathbf{X}^T\mathbf{y}$, giving $\hat{\boldsymbol{\beta}} = (\mathbf{X}^T\mathbf{X})^{-1}\mathbf{X}^T\mathbf{y}$

  • Gauss–Markov: OLS is BLUE under the classical assumptions

  • Variance of $\hat{\boldsymbol{\beta}}$: $\sigma^2(\mathbf{X}^T\mathbf{X})^{-1}$, which depends on the design matrix

  • Error variance: $\hat{\sigma}^2 = \text{SSR}/(n-p)$, unbiased with $n-p$ degrees of freedom

  • The hat matrix $\mathbf{H} = \mathbf{X}(\mathbf{X}^T\mathbf{X})^{-1}\mathbf{X}^T$ projects $\mathbf{y}$ onto the column space of $\mathbf{X}$

  • Use QR or SVD, not explicit matrix inversion, for numerical stability
