OLS Estimation: From First Principles
Regression Analysis
The Math Behind Regression Coefficients
Ordinary Least Squares finds the coefficient vector that minimizes the sum of squared residuals. Understanding OLS from first principles reveals why regression works and when it breaks down.
- Data Science: Build a foundation for understanding regularization and advanced estimators
- Econometrics: Derive the Gauss-Markov theorem and BLUE properties
- Actuarial Science: Implement premium models with transparent coefficient derivation
The normal equations transform data into the best linear unbiased estimates.
Ordinary Least Squares (OLS) is the foundation of linear regression. The sections that follow state the model in matrix form, derive the estimator, establish its statistical properties, and cover how to compute it stably.
Matrix Formulation
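Definition: The Linear Model in Matrix Notation
$$\mathbf{y} = \mathbf{X}\boldsymbol{\beta} + \boldsymbol{\varepsilon}$$
where:
- $\mathbf{y} \in \mathbb{R}^{n}$ is the response vector
- $\mathbf{X} \in \mathbb{R}^{n \times p}$ is the design matrix (first column is 1s for the intercept, so $p$ counts the intercept)
- $\boldsymbol{\beta} \in \mathbb{R}^{p}$ is the coefficient vector
- $\boldsymbol{\varepsilon} \in \mathbb{R}^{n}$ is the error vector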
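To make the matrix notation concrete, here is a minimal sketch that builds a design matrix with a leading column of ones and simulates a response from the linear model. It assumes NumPy is available; the helper name `make_design_matrix`, the coefficient values, the noise scale, and the seed are all hypothetical choices made for illustration.

```python
import numpy as np

def make_design_matrix(features):
    """Prepend a column of ones (the intercept) to an (n, k) feature array."""
    features = np.asarray(features, dtype=float)
    if features.ndim == 1:
        features = features[:, None]        # treat a 1-D input as a single predictor
    ones = np.ones((features.shape[0], 1))
    return np.hstack([ones, features])

rng = np.random.default_rng(0)
n = 50
x = rng.normal(size=(n, 2))                 # two made-up predictors
beta_true = np.array([1.0, 2.0, -0.5])      # intercept and two slopes (illustrative)
eps = rng.normal(scale=0.3, size=n)         # error vector

X = make_design_matrix(x)                   # shape (n, 3): ones column plus predictors
y = X @ beta_true + eps                     # response generated by the linear model
```

Each row of `X` is one observation and each column after the first is one predictor, so `X @ beta_true` evaluates the intercept plus the weighted predictors for all observations at once.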
Derivation of the Normal Equations
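Theorem: OLS Normal Equations
The OLS estimator minimizes:
$$S(\boldsymbol{\beta}) = \|\mathbf{y} - \mathbf{X}\boldsymbol{\beta}\|^2 = (\mathbf{y} - \mathbf{X}\boldsymbol{\beta})^\top (\mathbf{y} - \mathbf{X}\boldsymbol{\beta})$$
Expanding:
$$S(\boldsymbol{\beta}) = \mathbf{y}^\top \mathbf{y} - 2\boldsymbol{\beta}^\top \mathbf{X}^\top \mathbf{y} + \boldsymbol{\beta}^\top \mathbf{X}^\top \mathbf{X} \boldsymbol{\beta}$$
Taking the derivative with respect to $\boldsymbol{\beta}$ and setting it to zero:
$$\frac{\partial S}{\partial \boldsymbol{\beta}} = -2\mathbf{X}^\top \mathbf{y} + 2\mathbf{X}^\top \mathbf{X} \boldsymbol{\beta} = \mathbf{0}$$
This gives the normal equations:
$$\mathbf{X}^\top \mathbf{X} \hat{\boldsymbol{\beta}} = \mathbf{X}^\top \mathbf{y}$$
If $\mathbf{X}^\top \mathbf{X}$ is invertible (which requires $\operatorname{rank}(\mathbf{X})$ to equal the number of columns of $\mathbf{X}$):
$$\hat{\boldsymbol{\beta}} = (\mathbf{X}^\top \mathbf{X})^{-1} \mathbf{X}^\top \mathbf{y}$$
Proof that this is a minimum: The Hessian is $2\mathbf{X}^\top \mathbf{X}$, which is positive semi-definite (positive definite if $\mathbf{X}$ has full column rank).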
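The derivation can be checked numerically. The sketch below, assuming NumPy and simulated data with made-up coefficients, solves the normal equations as a linear system, then confirms that the gradient of the sum of squared residuals vanishes at the solution and that the Hessian has strictly positive eigenvalues. The function name `ols_normal_equations` is a hypothetical helper, not a library routine.

```python
import numpy as np

def ols_normal_equations(X, y):
    """Solve (X^T X) beta = X^T y as a linear system, without forming an inverse."""
    XtX = X.T @ X
    Xty = X.T @ y
    return np.linalg.solve(XtX, Xty)

rng = np.random.default_rng(1)
n = 200
X = np.column_stack([np.ones(n), rng.normal(size=(n, 2))])   # intercept + 2 predictors
beta_true = np.array([0.5, 1.5, -2.0])                       # illustrative values
y = X @ beta_true + rng.normal(scale=0.1, size=n)

beta_hat = ols_normal_equations(X, y)

# Gradient of the sum of squared residuals, evaluated at beta_hat (should be ~0)
gradient = -2 * X.T @ y + 2 * X.T @ X @ beta_hat

# Eigenvalues of the Hessian 2 X^T X (all positive when X has full column rank)
hessian_eigs = np.linalg.eigvalsh(2 * X.T @ X)

# Cross-check against NumPy's least-squares routine
beta_lstsq, *_ = np.linalg.lstsq(X, y, rcond=None)
```

A near-zero `gradient` shows the first-order condition holds, and positive `hessian_eigs` confirm the stationary point is the unique minimum for this full-rank design.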
The Hat Matrix
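Hat Matrix (Projection Matrix)
$$\hat{\mathbf{y}} = \mathbf{X}\hat{\boldsymbol{\beta}} = \mathbf{X}(\mathbf{X}^\top \mathbf{X})^{-1}\mathbf{X}^\top \mathbf{y} = \mathbf{H}\mathbf{y}$$
Here,
- $\mathbf{H} = \mathbf{X}(\mathbf{X}^\top \mathbf{X})^{-1}\mathbf{X}^\top$ = The hat matrix
- $\hat{\mathbf{y}}$ = Fitted (predicted) values
$\mathbf{H}$ is an orthogonal projection matrix: $\mathbf{H}^\top = \mathbf{H}$ and $\mathbf{H}^2 = \mathbf{H}$. It projects $\mathbf{y}$ onto the column space of $\mathbf{X}$. The residuals $\mathbf{e} = \mathbf{y} - \hat{\mathbf{y}} = (\mathbf{I} - \mathbf{H})\mathbf{y}$ lie in the orthogonal complement.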
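The projection properties are easy to verify on a small example. This sketch, assuming NumPy and random illustrative data, builds the hat matrix for a small design and checks symmetry, idempotency, and orthogonality of the residuals to every column of the design matrix. Forming the full n-by-n matrix is done here only for demonstration; it is rarely needed for large n.

```python
import numpy as np

rng = np.random.default_rng(2)
n, p = 30, 3
X = np.column_stack([np.ones(n), rng.normal(size=(n, p - 1))])
y = rng.normal(size=n)                       # any response vector works for this check

# H = X (X^T X)^{-1} X^T, built with a linear solve instead of an explicit inverse
H = X @ np.linalg.solve(X.T @ X, X.T)

y_hat = H @ y                                # fitted values: projection of y onto col(X)
residuals = y - y_hat                        # component of y orthogonal to col(X)

is_symmetric = np.allclose(H, H.T)
is_idempotent = np.allclose(H @ H, H)
orthogonal_to_columns = np.allclose(X.T @ residuals, 0.0)
trace_H = np.trace(H)                        # equals the rank of the projection, here p
```

The trace of a projection matrix equals the dimension of the space it projects onto, which is why `trace_H` matches the number of columns of a full-rank design.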
Properties of OLS Estimators
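Theorem: Gauss–Markov
Under the classical assumptions ($E[\boldsymbol{\varepsilon}] = \mathbf{0}$, $\operatorname{Var}(\boldsymbol{\varepsilon}) = \sigma^2 \mathbf{I}$, $\mathbf{X}$ fixed), the OLS estimator $\hat{\boldsymbol{\beta}}$ is BLUE (Best Linear Unbiased Estimator):
$$E[\hat{\boldsymbol{\beta}}] = \boldsymbol{\beta}, \qquad \operatorname{Var}(\hat{\boldsymbol{\beta}}) = \sigma^2 (\mathbf{X}^\top \mathbf{X})^{-1}$$
No other linear unbiased estimator of $\boldsymbol{\beta}$ has smaller variance: for any such estimator $\tilde{\boldsymbol{\beta}}$, the difference $\operatorname{Var}(\tilde{\boldsymbol{\beta}}) - \operatorname{Var}(\hat{\boldsymbol{\beta}})$ is positive semi-definite.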
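Theorem: Unbiasedness of $\hat{\boldsymbol{\beta}}$
Substituting $\mathbf{y} = \mathbf{X}\boldsymbol{\beta} + \boldsymbol{\varepsilon}$ into the estimator and using $E[\boldsymbol{\varepsilon}] = \mathbf{0}$ with $\mathbf{X}$ fixed:
$$E[\hat{\boldsymbol{\beta}}] = E\left[(\mathbf{X}^\top \mathbf{X})^{-1}\mathbf{X}^\top (\mathbf{X}\boldsymbol{\beta} + \boldsymbol{\varepsilon})\right] = \boldsymbol{\beta} + (\mathbf{X}^\top \mathbf{X})^{-1}\mathbf{X}^\top E[\boldsymbol{\varepsilon}] = \boldsymbol{\beta}$$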
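A small Monte Carlo experiment can illustrate unbiasedness and the variance of the OLS estimator under the classical assumptions. The sketch below assumes NumPy; the design, true coefficients, noise level, and number of replications are arbitrary illustrative choices. The design matrix is held fixed while fresh homoskedastic normal errors are drawn in each replication.

```python
import numpy as np

rng = np.random.default_rng(3)
n, sigma = 40, 0.5
X = np.column_stack([np.ones(n), rng.normal(size=(n, 2))])   # fixed across replications
beta_true = np.array([1.0, -1.0, 2.0])                       # illustrative values

n_reps = 5000
# Each column of E is one independent draw of the error vector
E = rng.normal(scale=sigma, size=(n, n_reps))
Y = (X @ beta_true)[:, None] + E                             # one response per column

# Solve the normal equations for all replications at once
B = np.linalg.solve(X.T @ X, X.T @ Y)                        # shape (3, n_reps)

mean_beta_hat = B.mean(axis=0 + 1)                           # average estimate per coefficient
empirical_cov = np.cov(B)                                    # rows of B are the variables

# The covariance formula involves the inverse itself, so it is formed explicitly here
theoretical_cov = sigma**2 * np.linalg.inv(X.T @ X)
```

With enough replications, `mean_beta_hat` should sit close to `beta_true` and `empirical_cov` close to `theoretical_cov`. A simulation like this illustrates the result but does not prove it.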
Estimation of Error Variance
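Unbiased Estimator of Error Variance
$$\hat{\sigma}^2 = \frac{\mathbf{e}^\top \mathbf{e}}{n - p} = \frac{1}{n - p}\sum_{i=1}^{n} e_i^2$$
Here,
- $\hat{\sigma}^2$ = Estimated error variance
- $n - p$ = Degrees of freedom, where $n$ is the number of observations and $p$ is the number of columns of $\mathbf{X}$ (intercept included)
- $\mathbf{e} = \mathbf{y} - \hat{\mathbf{y}}$ = Residual vector
$E[\hat{\sigma}^2] = \sigma^2$, so this estimator is unbiased. The denominator accounts for the $p$ parameters estimated.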
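The degrees-of-freedom correction can be seen directly in code. This sketch, assuming NumPy and simulated data with made-up parameters, computes the residuals, the unbiased variance estimate, and the coefficient standard errors that follow from it. It also checks that the residual-maker matrix I - H has trace n - p, which is where the denominator comes from, since the expected residual sum of squares is the error variance times that trace.

```python
import numpy as np

rng = np.random.default_rng(4)
n, sigma = 100, 2.0
X = np.column_stack([np.ones(n), rng.normal(size=(n, 3))])
p = X.shape[1]                                # number of estimated coefficients
beta_true = np.array([1.0, 0.5, -0.5, 2.0])   # illustrative values
y = X @ beta_true + rng.normal(scale=sigma, size=n)

beta_hat, *_ = np.linalg.lstsq(X, y, rcond=None)
e = y - X @ beta_hat                          # residual vector

df = n - p                                    # degrees of freedom
sigma2_hat = (e @ e) / df                     # unbiased estimator
sigma2_naive = (e @ e) / n                    # divides by n: biased downward

# Residual-maker matrix M = I - H; its trace equals n - p
M = np.eye(n) - X @ np.linalg.solve(X.T @ X, X.T)
trace_M = np.trace(M)

# Standard errors: square roots of the diagonal of sigma2_hat * (X^T X)^{-1}
std_errors = np.sqrt(sigma2_hat * np.diag(np.linalg.inv(X.T @ X)))
```

Dividing by `n` instead of `df` always gives a smaller number, and the gap matters most when `p` is large relative to `n`.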
Numerical Considerations
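Avoid Matrix Inversion
In practice, never compute $(\mathbf{X}^\top \mathbf{X})^{-1}$ directly. Instead, use:
- QR decomposition: $\mathbf{X} = \mathbf{Q}\mathbf{R}$, then solve $\mathbf{R}\hat{\boldsymbol{\beta}} = \mathbf{Q}^\top \mathbf{y}$
- Cholesky decomposition: $\mathbf{X}^\top \mathbf{X} = \mathbf{L}\mathbf{L}^\top$, then solve via forward/back substitution
- SVD: Most numerically stable, handles rank-deficient cases
These methods are $O(np^2)$ for an $n \times p$ design matrix and avoid the numerical instability of explicit matrix inversion.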
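The three factorization routes can be sketched side by side. The example below assumes NumPy only and uses simulated, well-conditioned data with made-up coefficients. For brevity it uses the general solver `np.linalg.solve` on the triangular systems, which does not exploit the triangular structure. A dedicated triangular solver (for example `scipy.linalg.solve_triangular`, if SciPy is available) would be the more efficient choice.

```python
import numpy as np

rng = np.random.default_rng(5)
n = 500
X = np.column_stack([np.ones(n), rng.normal(size=(n, 3))])
y = X @ np.array([1.0, 2.0, 3.0, 4.0]) + rng.normal(size=n)   # illustrative coefficients

# QR: X = Q R (reduced form), then solve R beta = Q^T y
Q, R = np.linalg.qr(X)
beta_qr = np.linalg.solve(R, Q.T @ y)

# Cholesky: X^T X = L L^T, then two triangular systems
L = np.linalg.cholesky(X.T @ X)
z = np.linalg.solve(L, X.T @ y)              # forward step: L z = X^T y
beta_chol = np.linalg.solve(L.T, z)          # back step: L^T beta = z

# SVD: X = U diag(s) V^T, so beta = V diag(1/s) U^T y (valid when all s > 0)
U, s, Vt = np.linalg.svd(X, full_matrices=False)
beta_svd = Vt.T @ ((U.T @ y) / s)

# Reference solution from NumPy's SVD-based least-squares routine
beta_ref, *_ = np.linalg.lstsq(X, y, rcond=None)
```

On a well-conditioned design like this one, all four estimates agree to floating-point precision. The routes differ mainly on ill-conditioned or rank-deficient designs, where the Cholesky route is the most fragile because it works with the cross-product matrix, whose condition number is the square of that of the design matrix.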
Key Takeaways
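Summary: OLS Estimation
- Normal equations: $\mathbf{X}^\top \mathbf{X} \hat{\boldsymbol{\beta}} = \mathbf{X}^\top \mathbf{y}$, giving $\hat{\boldsymbol{\beta}} = (\mathbf{X}^\top \mathbf{X})^{-1}\mathbf{X}^\top \mathbf{y}$
- Gauss–Markov: OLS is BLUE under the classical assumptions
- Variance of $\hat{\boldsymbol{\beta}}$: $\operatorname{Var}(\hat{\boldsymbol{\beta}}) = \sigma^2 (\mathbf{X}^\top \mathbf{X})^{-1}$, which depends on the design matrix
- Error variance: $\hat{\sigma}^2 = \mathbf{e}^\top \mathbf{e} / (n - p)$, unbiased with $n - p$ degrees of freedom
- The hat matrix $\mathbf{H} = \mathbf{X}(\mathbf{X}^\top \mathbf{X})^{-1}\mathbf{X}^\top$ projects $\mathbf{y}$ onto the column space of $\mathbf{X}$
- Use QR or SVD, not explicit matrix inversion, for numerical stability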