From Scatter Plots to Predictions — The Simplest ML Algorithm
Linear regression finds the best straight line through your data. It is fast, interpretable, and a powerful baseline for any regression problem.
- Ordinary Least Squares — The closed-form solution for optimal parameters
- Gradient Descent — The iterative optimization approach that scales
- Evaluation Metrics — R², MSE, and MAE for measuring performance
"All models are wrong, but some are useful." — George Box
Linear Regression — Complete Guide
Linear regression is one of the simplest and most widely used ML algorithms. It models the relationship between input features and a continuous target as a straight line (or, with several features, a hyperplane).
Simple Linear Regression
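Definition: Linear Regression
Given training data $\{(x_i, y_i)\}_{i=1}^{n}$ where $x_i \in \mathbb{R}$ and $y_i \in \mathbb{R}$, linear regression seeks parameters $w$ (slope) and $b$ (intercept) that minimize the sum of squared residuals:
$$\min_{w,\, b} \sum_{i=1}^{n} \bigl(y_i - (w x_i + b)\bigr)^2$$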
Simple Linear Regression
$$\hat{y} = w x + b$$
Here,
- $\hat{y}$ = Predicted value
- $x$ = Input feature
- $w$ = Slope (weight)
- $b$ = Y-intercept (bias)
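For a single feature, the least-squares slope and intercept have a well-known closed form: the slope is the covariance of x and y divided by the variance of x, and the line passes through the point of means. The sketch below assumes plain Python lists as input; the function name `fit_simple_ols` and the toy data are illustrative, not from the original.

```python
def fit_simple_ols(xs, ys):
    """Return (w, b) minimizing sum((y - (w*x + b))**2) for 1-D data."""
    n = len(xs)
    x_mean = sum(xs) / n
    y_mean = sum(ys) / n
    # Slope: covariance of x and y divided by variance of x.
    sxy = sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, ys))
    sxx = sum((x - x_mean) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("all x values are identical; slope is undefined")
    w = sxy / sxx
    # Intercept: the fitted line passes through (x_mean, y_mean).
    b = y_mean - w * x_mean
    return w, b


# Toy data lying exactly on y = 2x + 1 (made up for illustration).
xs = [0.0, 1.0, 2.0, 3.0]
ys = [1.0, 3.0, 5.0, 7.0]
w, b = fit_simple_ols(xs, ys)
print(w, b)
```

Because the toy points lie exactly on a line, the fit recovers that line's slope and intercept; with noisy data the same formulas return the line with the smallest sum of squared residuals.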
Example: House Prices
For predicting house prices: $\hat{y}$ = price, $x$ = square footage, $w$ = price/sqft, $b$ = base price. For a house with 2000 sq ft, given fitted values of $w$ and $b$, the predicted price is $\hat{y} = 2000\,w + b$.
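To make the arithmetic concrete, here is a small sketch of the prediction step. The parameter values (150 dollars per square foot and a 50,000 dollar base price) are hypothetical numbers chosen only for illustration, and `predict_price` is an illustrative helper name.

```python
def predict_price(sqft, price_per_sqft, base_price):
    """Apply the fitted line: price = price_per_sqft * sqft + base_price."""
    return price_per_sqft * sqft + base_price


# Hypothetical parameters, assumed for illustration only.
w = 150.0      # slope: dollars per square foot
b = 50_000.0   # intercept: base price in dollars

price = predict_price(2000, w, b)
print(f"Predicted price: ${price:,.0f}")  # Predicted price: $350,000
```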
Finding the Best Line
Ordinary Least Squares (OLS)
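Definition: Normal Equation
The OLS closed-form solution for multiple linear regression with design matrix $X$ and target $\mathbf{y}$:
$$\hat{\mathbf{w}} = (X^\top X)^{-1} X^\top \mathbf{y}$$
This minimizes $\|X\mathbf{w} - \mathbf{y}\|^2$.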
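The normal equation can be sketched in a few lines of NumPy. This version assumes the design matrix already contains a column of 1s for the bias, and it solves the linear system rather than forming the inverse explicitly, which is the usual numerically safer choice. The helper name `fit_normal_equation` and the toy data are illustrative.

```python
import numpy as np


def fit_normal_equation(X, y):
    """Solve (X^T X) w = X^T y for w; X must already include a column of 1s."""
    # Solving the system avoids computing the matrix inverse explicitly.
    return np.linalg.solve(X.T @ X, X.T @ y)


# Toy data generated from y = 1 + 2*x1 + 3*x2 (made up for illustration).
features = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [2.0, 1.0]])
y = 1.0 + 2.0 * features[:, 0] + 3.0 * features[:, 1]
X = np.column_stack([np.ones(len(features)), features])  # bias column first

w_hat = fit_normal_equation(X, y)

# Cross-check against NumPy's least-squares solver.
w_lstsq, *_ = np.linalg.lstsq(X, y, rcond=None)
print(w_hat, w_lstsq)
```

When the columns of the design matrix are nearly collinear, `np.linalg.lstsq` is generally the more robust of the two routes.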
Computational Cost
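The normal equation requires computing $(X^\top X)^{-1}$, which is $O(d^3)$ in the number of features $d$. For high-dimensional data (large $d$), gradient descent is preferred at $O(nd)$ per iteration, where $n$ is the number of samples.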
Gradient Descent
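Definition: Gradient Descent for Linear Regression
Taking the cost to be the mean squared error $J(w, b) = \frac{1}{n}\sum_{i=1}^{n} (\hat{y}_i - y_i)^2$ and a learning rate $\alpha$, initialize $w$ and $b$ (for example, to zero), then iterate:
$$w \leftarrow w - \alpha \frac{\partial J}{\partial w} = w - \frac{2\alpha}{n}\sum_{i=1}^{n} (\hat{y}_i - y_i)\,x_i, \qquad b \leftarrow b - \alpha \frac{\partial J}{\partial b} = b - \frac{2\alpha}{n}\sum_{i=1}^{n} (\hat{y}_i - y_i)$$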
[Figure: Cost Function Surface and Gradient Descent Path]
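A minimal sketch of batch gradient descent for one feature follows. It assumes the mean squared error as the cost, zero initialization, and a fixed learning rate; the default values for `lr` and `n_iters` are illustrative choices that happen to converge on this toy data, not recommended settings.

```python
def gradient_descent(xs, ys, lr=0.05, n_iters=5000):
    """Fit y ~ w*x + b by batch gradient descent on the mean squared error."""
    n = len(xs)
    w, b = 0.0, 0.0  # zero initialization (an assumption of this sketch)
    for _ in range(n_iters):
        # Prediction errors under the current parameters.
        errors = [(w * x + b) - y for x, y in zip(xs, ys)]
        # Gradients of (1/n) * sum(errors**2) with respect to w and b.
        grad_w = (2.0 / n) * sum(e * x for e, x in zip(errors, xs))
        grad_b = (2.0 / n) * sum(errors)
        # Step against the gradient.
        w -= lr * grad_w
        b -= lr * grad_b
    return w, b


# Toy data lying exactly on y = 2x + 1 (made up for illustration).
xs = [0.0, 1.0, 2.0, 3.0]
ys = [1.0, 3.0, 5.0, 7.0]
w, b = gradient_descent(xs, ys)
print(w, b)
```

If the learning rate is too large the updates diverge, and if it is too small convergence is slow; scaling the features first usually makes a single learning rate easier to choose.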
Multiple Linear Regression
Multiple Linear Regression
$$\hat{y} = \mathbf{w}^\top \mathbf{x} + b$$
Here,
- $\mathbf{x}$ = Input feature vector
- $\mathbf{w}$ = Weight vector
- $b$ = Bias term
Matrix Form
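In matrix notation: $\hat{\mathbf{y}} = X\mathbf{w}$, where $X \in \mathbb{R}^{n \times (d+1)}$ (with a bias column of 1s) and the bias is absorbed into $\mathbf{w}$ as one extra weight. This is the building block of other linear models and of neural network layers.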
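The bias-column trick can be sketched in NumPy as follows. The helper name `add_bias_column` and the weight values are made up for illustration; the point is only that one matrix product yields all predictions at once.

```python
import numpy as np


def add_bias_column(features):
    """Prepend a column of 1s so the bias becomes the first weight."""
    features = np.asarray(features, dtype=float)
    return np.column_stack([np.ones(features.shape[0]), features])


features = np.array([[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]])  # n = 3 samples, d = 2 features
w = np.array([0.5, 2.0, -1.0])  # [bias, w_1, w_2], illustrative values

X = add_bias_column(features)   # shape (n, d + 1)
y_hat = X @ w                   # one prediction per row of X
print(X.shape, y_hat)
```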
Evaluation Metrics
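R-squared
$$R^2 = 1 - \frac{SS_{\text{res}}}{SS_{\text{tot}}}$$
Here,
- $R^2$ = Coefficient of determination (0 to 1 for an OLS fit with an intercept on its training data; it can be negative on held-out data)
- $SS_{\text{res}} = \sum_i (y_i - \hat{y}_i)^2$ = Sum of squared residuals
- $SS_{\text{tot}} = \sum_i (y_i - \bar{y})^2$ = Total sum of squares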
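The definition translates directly into code. The sketch below uses plain Python lists and an illustrative function name, `r_squared`; the three toy predictions show a perfect fit, a model that always predicts the mean, and a model that does worse than the mean.

```python
def r_squared(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    y_mean = sum(y_true) / len(y_true)
    ss_res = sum((y - p) ** 2 for y, p in zip(y_true, y_pred))
    ss_tot = sum((y - y_mean) ** 2 for y in y_true)
    if ss_tot == 0:
        raise ValueError("R^2 is undefined when all targets are identical")
    return 1.0 - ss_res / ss_tot


y_true = [1.0, 2.0, 3.0, 4.0]
perfect = r_squared(y_true, [1.0, 2.0, 3.0, 4.0])    # exact predictions
mean_only = r_squared(y_true, [2.5, 2.5, 2.5, 2.5])  # always predict the mean
worse = r_squared(y_true, [4.0, 3.0, 2.0, 1.0])      # reversed predictions
print(perfect, mean_only, worse)  # 1.0 0.0 -3.0
```

The negative value in the last case is why the 0 to 1 range holds only for a least-squares fit evaluated on the data it was trained on.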
Adjusted R²
For multiple regression with $d$ features and $n$ samples: $R^2_{\text{adj}} = 1 - (1 - R^2)\,\frac{n - 1}{n - d - 1}$. This penalizes adding features that don't improve the model.
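A short sketch of the adjustment, using the standard formula with n samples and d features. The function name and the example numbers are illustrative; the comparison shows how the same raw R² is discounted more heavily when it took more features to reach it.

```python
def adjusted_r_squared(r2, n_samples, n_features):
    """Adjusted R^2 = 1 - (1 - R^2) * (n - 1) / (n - d - 1)."""
    dof = n_samples - n_features - 1
    if dof <= 0:
        raise ValueError("need n_samples > n_features + 1")
    return 1.0 - (1.0 - r2) * (n_samples - 1) / dof


# Same raw R^2 of 0.90 on 50 samples, reached with 2 versus 20 features.
few = adjusted_r_squared(0.90, n_samples=50, n_features=2)
many = adjusted_r_squared(0.90, n_samples=50, n_features=20)
print(round(few, 4), round(many, 4))
```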
Assumptions
Critical Assumptions (Gauss-Markov Theorem)
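Under the Gauss-Markov assumptions, the OLS estimator is BLUE (Best Linear Unbiased Estimator): (1) Linearity, (2) Independence of errors, (3) Homoscedasticity (constant variance), (4) No perfect multicollinearity. (5) Normality of residuals is not required for BLUE; it is an additional assumption used for exact confidence intervals and hypothesis tests.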
Polynomial Regression
Definition: Polynomial Regression
Extends linear regression by adding polynomial terms: $\hat{y} = b + w_1 x + w_2 x^2 + \dots + w_k x^k$. Despite the nonlinearity in $x$, it is still linear in the parameters $b, w_1, \dots, w_k$, so OLS applies after feature transformation.
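from sklearn.linear_model import LinearRegression
from sklearn.preprocessing import PolynomialFeatures

# X: (n_samples, n_features) array and y: (n_samples,) targets, assumed already loaded
poly = PolynomialFeatures(degree=2, include_bias=False)  # LinearRegression fits its own intercept
X_poly = poly.fit_transform(X)  # adds squared and interaction terms
model = LinearRegression().fit(X_poly, y)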
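To see why OLS still applies, the following NumPy-only sketch builds the polynomial columns by hand and solves an ordinary least-squares problem on them. It swaps scikit-learn for `np.vander` and `np.linalg.lstsq` to keep the example self-contained; the helper name `polynomial_features` and the noiseless quadratic data are made up for illustration.

```python
import numpy as np


def polynomial_features(x, degree):
    """Return columns [1, x, x^2, ..., x^degree] for a 1-D input array."""
    x = np.asarray(x, dtype=float)
    return np.vander(x, N=degree + 1, increasing=True)


x = np.array([-2.0, -1.0, 0.0, 1.0, 2.0])
y = 1.0 + 2.0 * x + 3.0 * x**2  # noiseless quadratic, made up for illustration

X_poly = polynomial_features(x, degree=2)                # shape (5, 3)
coeffs, *_ = np.linalg.lstsq(X_poly, y, rcond=None)      # ordinary least squares
print(coeffs)  # approximately [1, 2, 3]: intercept, linear, quadratic
```

The model is a curve in x but a plain linear regression in the transformed columns, which is the sense in which it stays linear in the parameters.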
Key Takeaways
Summary: Linear Regression
- Linear regression finds the best $\mathbf{w}, b$ by minimizing $\sum_i (y_i - \hat{y}_i)^2$
- OLS gives closed-form $\hat{\mathbf{w}} = (X^\top X)^{-1} X^\top \mathbf{y}$; gradient descent is iterative
- The loss surface is convex — gradient descent finds the global minimum
- R² measures proportion of variance explained: $R^2 = 1 - SS_{\text{res}} / SS_{\text{tot}}$
- Check assumptions (linearity, normality, homoscedasticity, independence)
- Polynomial regression extends to nonlinear relationships
- Regularization (Ridge L2, Lasso L1) prevents overfitting in high dimensions
- Linear regression is fast, interpretable, and a great baseline
What to Learn Next
-> Logistic Regression: Classification with probability, from linear to sigmoid.
-> Regularization: Prevent overfitting with Ridge, Lasso, and Elastic Net.
-> Model Evaluation: How to know if your model actually works, beyond accuracy.