Linear Regression: Math, Code and Assumptions

The Foundation of Machine Learning

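Linear regression is one of the most fundamental algorithms in ML. Despite its simplicity, understanding it deeply provides insight into many other supervised learning methods, which reuse the same ideas of a parametric model, a cost function, and an optimizer.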

ML Algorithm Landscape

Linear Regression is the foundation — understand this first!

1. Simple Linear Regression

Mathematical Formulation

Model:

y = \beta_0 + \beta_1 x + \epsilon

Prediction (the fitted line, without the error term):

\hat{y} = \beta_0 + \beta_1 x

Where:

$\beta_0$ = intercept (bias) — value of $y$ when $x = 0$
$\beta_1$ = slope (weight) — change in $y$ for unit change in $x$
$\epsilon$ = error term — $\epsilon \sim N(0, \sigma^2)$

How this diagram works: This diagram shows the core concept of simple linear regression — fitting a straight line through scattered data points to model the relationship between a feature (x) and a target (y). The blue data points represent actual observations, while the purple regression line represents the model's predictions. The red dashed lines (residuals) show the vertical distance between each data point and the line, representing prediction errors. The goal of linear regression is to minimize these residuals by finding the optimal intercept (β₀) and slope (β₁) that produce the smallest total squared error.

2. Cost Function (Ordinary Least Squares)

Mean Squared Error (MSE):

J(\beta_0, \beta_1) = \frac{1}{n} \sum_{i=1}^{n} (y_i - \hat{y}_i)^2 = \frac{1}{n} \sum_{i=1}^{n} (y_i - (\beta_0 + \beta_1 x_i))^2

Goal: Find $\beta_0, \beta_1$ that minimize $J$
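To make the cost concrete, here is a minimal sketch that evaluates $J$ for two candidate parameter pairs. The function name `mse_cost` and the toy data are assumptions chosen for illustration; they are not part of the original article.

```python
def mse_cost(beta0, beta1, xs, ys):
    """Mean squared error J(beta0, beta1) for the line y_hat = beta0 + beta1 * x."""
    n = len(xs)
    return sum((y - (beta0 + beta1 * x)) ** 2 for x, y in zip(xs, ys)) / n


# Hypothetical toy data lying exactly on the line y = 1 + 2x
xs = [0.0, 1.0, 2.0, 3.0]
ys = [1.0, 3.0, 5.0, 7.0]

cost_true = mse_cost(1.0, 2.0, xs, ys)  # parameters that generated the data
cost_off = mse_cost(0.0, 2.0, xs, ys)   # intercept off by 1, so every residual is 1

print(cost_true, cost_off)  # prints: 0.0 1.0
```

The parameters that generated the data give zero cost, and any other pair gives a strictly larger value, which is what "minimize $J$" means in practice.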

Closed-Form Solution (Normal Equation):

\beta_1 = \frac{\sum_{i=1}^{n}(x_i - \bar{x})(y_i - \bar{y})}{\sum_{i=1}^{n}(x_i - \bar{x})^2} = \frac{\text{Cov}(X,Y)}{\text{Var}(X)}

\beta_0 = \bar{y} - \beta_1 \bar{x}
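The two closed-form formulas above translate almost line for line into code. The following is a sketch in plain Python; `fit_simple_ols` and the five data points are hypothetical and used only for illustration.

```python
def fit_simple_ols(xs, ys):
    """Closed-form simple linear regression: returns (beta0, beta1)."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    # Numerator: sum of (x_i - x_bar)(y_i - y_bar); denominator: sum of (x_i - x_bar)^2
    s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    s_xx = sum((x - x_bar) ** 2 for x in xs)
    if s_xx == 0:
        raise ValueError("all x values are identical; the slope is undefined")
    beta1 = s_xy / s_xx
    beta0 = y_bar - beta1 * x_bar
    return beta0, beta1


# Hypothetical toy data
xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [2.0, 4.0, 5.0, 4.0, 5.0]

beta0, beta1 = fit_simple_ols(xs, ys)
print(f"{beta0:.4f} {beta1:.4f}")  # prints: 2.2000 0.6000
```

Here $\bar{x} = 3$, $\bar{y} = 4$, the numerator is 6 and the denominator is 10, so $\beta_1 = 0.6$ and $\beta_0 = 4 - 0.6 \cdot 3 = 2.2$.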

The cost function is convex, so gradient descent with a sufficiently small learning rate converges to the global minimum.

3. Gradient Descent

Update Rule:

\beta_j := \beta_j - \alpha \frac{\partial J}{\partial \beta_j}

Partial Derivatives:

\frac{\partial J}{\partial \beta_0} = -\frac{2}{n} \sum_{i=1}^{n} (y_i - \hat{y}_i)

\frac{\partial J}{\partial \beta_1} = -\frac{2}{n} \sum_{i=1}^{n} (y_i - \hat{y}_i) \cdot x_i

Where $\alpha$ = learning rate (step size)
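The update rule and the two partial derivatives can be sketched as a short loop. This is an illustrative version under assumed settings: the function name `gradient_descent`, the learning rate `alpha=0.05`, the iteration count, and the toy data are all choices made for this example, not values from the original article.

```python
def gradient_descent(xs, ys, alpha=0.05, n_iters=2000):
    """Batch gradient descent on the MSE cost for y_hat = beta0 + beta1 * x."""
    n = len(xs)
    beta0, beta1 = 0.0, 0.0  # start from zero (an arbitrary choice)
    for _ in range(n_iters):
        # Residuals y_i - y_hat_i under the current parameters
        residuals = [y - (beta0 + beta1 * x) for x, y in zip(xs, ys)]
        # Partial derivatives of J with respect to beta0 and beta1
        grad0 = -2.0 / n * sum(residuals)
        grad1 = -2.0 / n * sum(r * x for r, x in zip(residuals, xs))
        # Simultaneous update: both gradients use the old parameters
        beta0 -= alpha * grad0
        beta1 -= alpha * grad1
    return beta0, beta1


# Hypothetical toy data; the closed-form solution for it is beta0 = 2.2, beta1 = 0.6
xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [2.0, 4.0, 5.0, 4.0, 5.0]

gd_beta0, gd_beta1 = gradient_descent(xs, ys)
print(f"{gd_beta0:.4f} {gd_beta1:.4f}")
```

If `alpha` is too large for the scale of the data, the updates overshoot and the parameters diverge; on this data a value well below 0.08 is needed, which is why 0.05 was picked here.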

4. Multiple Linear Regression

Model:

\hat{y} = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \cdots + \beta_p x_p = \beta_0 + \sum_{j=1}^{p} \beta_j x_j

Matrix Form:

\hat{\mathbf{y}} = \mathbf{X}\boldsymbol{\beta}

Where $\mathbf{X} \in \mathbb{R}^{n \times (p+1)}$ (design matrix with intercept column)

Normal Equation (Matrix):

\boldsymbol{\hat{\beta}} = (\mathbf{X}^T\mathbf{X})^{-1}\mathbf{X}^T\mathbf{y}
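A minimal NumPy sketch of the matrix normal equation follows. The helper name `fit_normal_equation` and the synthetic, noise-free data are assumptions for illustration. The sketch solves the linear system $\mathbf{X}^T\mathbf{X}\boldsymbol{\beta} = \mathbf{X}^T\mathbf{y}$ with `np.linalg.solve` rather than forming the inverse explicitly, which gives the same result when $\mathbf{X}^T\mathbf{X}$ is invertible and is generally better behaved numerically.

```python
import numpy as np


def fit_normal_equation(X, y):
    """Solve the normal equation for beta.

    X: array of shape (n, p) WITHOUT the intercept column.
    Returns beta of shape (p + 1,), intercept first.
    """
    n = X.shape[0]
    # Design matrix: prepend a column of ones for the intercept beta_0
    X_design = np.column_stack([np.ones(n), X])
    # Solve (X^T X) beta = X^T y
    return np.linalg.solve(X_design.T @ X_design, X_design.T @ y)


# Hypothetical noise-free data generated from y = 1 + 2*x1 - 3*x2
rng = np.random.default_rng(0)
X = rng.normal(size=(50, 2))
y = 1.0 + 2.0 * X[:, 0] - 3.0 * X[:, 1]

beta_hat = fit_normal_equation(X, y)
print(np.round(beta_hat, 4))
```

When features are strongly collinear, $\mathbf{X}^T\mathbf{X}$ becomes close to singular and this approach breaks down; a least-squares solver such as `np.linalg.lstsq` is a more robust alternative in that case.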

5. Model Evaluation Metrics

R² Score (Coefficient of Determination):

R^2 = 1 - \frac{SS_{res}}{SS_{tot}} = 1 - \frac{\sum_{i=1}^{n}(y_i - \hat{y}_i)^2}{\sum_{i=1}^{n}(y_i - \bar{y})^2}

$R^2 = 1$ : Perfect fit
$R^2 = 0$ : Model does no better than always predicting the mean
$R^2 < 0$ : Model is worse than predicting the mean

Adjusted R²:

R^2_{adj} = 1 - \frac{(1-R^2)(n-1)}{n-p-1}

Worked example: if $SS_{res} = 30$ and $SS_{tot} = 100$, then $R^2 = 1 - 30/100 = 0.70$ (70% of the variance explained).
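Both metrics can be computed directly from their definitions. The sketch below uses hypothetical helper names (`r2_manual`, `adjusted_r2`) and toy data; in practice `sklearn.metrics.r2_score` computes the first quantity.

```python
def r2_manual(y_true, y_pred):
    """R^2 = 1 - SS_res / SS_tot."""
    y_bar = sum(y_true) / len(y_true)
    ss_res = sum((y - y_hat) ** 2 for y, y_hat in zip(y_true, y_pred))
    ss_tot = sum((y - y_bar) ** 2 for y in y_true)
    return 1.0 - ss_res / ss_tot


def adjusted_r2(r2, n, p):
    """Adjusted R^2 for n observations and p predictors (requires n > p + 1)."""
    return 1.0 - (1.0 - r2) * (n - 1) / (n - p - 1)


# Hypothetical toy data with predictions from the line y_hat = 2.2 + 0.6x at x = 1..5
y_true = [2.0, 4.0, 5.0, 4.0, 5.0]
y_pred = [2.8, 3.4, 4.0, 4.6, 5.2]

r2 = r2_manual(y_true, y_pred)             # SS_res = 2.4, SS_tot = 6.0
r2_adj = adjusted_r2(r2, n=len(y_true), p=1)
print(f"{r2:.4f} {r2_adj:.4f}")            # prints: 0.6000 0.4667
```

Adjusted R² is lower than R² here because the penalty factor $(n-1)/(n-p-1) = 4/3$ is large when $n$ is small; with many observations and few predictors the two values are nearly identical.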

6. Assumptions of Linear Regression

Checking Assumptions with Residual Plots
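As a rough illustration of what a residual check looks at, the sketch below computes the residuals and fitted values that a residuals-versus-fitted plot would display, plus two simple numeric summaries. The synthetic data, the use of `np.polyfit`, and the choice of summaries are assumptions for this example; they stand in for visual inspection and are not formal statistical tests.

```python
import numpy as np

# Hypothetical data generated from y = 4 + 3x + noise, with noise ~ N(0, 1)
rng = np.random.default_rng(42)
x = rng.uniform(0, 2, size=200)
y = 4 + 3 * x + rng.normal(size=200)

# Fit a straight line; np.polyfit returns coefficients highest degree first
beta1, beta0 = np.polyfit(x, y, deg=1)
fitted = beta0 + beta1 * x
residuals = y - fitted

# 1) Residuals of a least-squares fit with an intercept average to ~0
mean_resid = residuals.mean()

# 2) Correlation between fitted values and |residuals|: a value far from 0
#    hints that the error spread changes with the prediction (non-constant variance)
spread_corr = np.corrcoef(fitted, np.abs(residuals))[0, 1]

print(f"mean residual: {mean_resid:.2e}")
print(f"corr(fitted, |residual|): {spread_corr:.3f}")
```

Plotting `residuals` against `fitted` (for example with `plt.scatter`) should show a structureless band around zero when the linear model and the error assumption $\epsilon \sim N(0, \sigma^2)$ are reasonable; curvature or a funnel shape suggests otherwise.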

7. Implementation in Python

import numpy as np
import matplotlib.pyplot as plt
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split
from sklearn.metrics import mean_squared_error, r2_score

# Generate sample data
np.random.seed(42)
X = 2 * np.random.rand(100, 1)
y = 4 + 3 * X + np.random.randn(100, 1)

# Split data
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Fit model
model = LinearRegression()
model.fit(X_train, y_train)

# Predictions
y_pred = model.predict(X_test)

# Evaluate
print(f"Intercept (β₀): {model.intercept_[0]:.4f}")
print(f"Slope (β₁): {model.coef_[0][0]:.4f}")
print(f"R² Score: {r2_score(y_test, y_pred):.4f}")
print(f"RMSE: {np.sqrt(mean_squared_error(y_test, y_pred)):.4f}")

# Visualize
plt.scatter(X_test, y_test, color='blue', alpha=0.6, label='Actual')
plt.plot(X_test, y_pred, color='red', linewidth=2, label='Predicted')
plt.xlabel('Feature')
plt.ylabel('Target')
plt.title('Linear Regression Fit')
plt.legend()
plt.show()

Key Takeaways

Linear regression finds the best-fit line through data points
Cost function (MSE) measures prediction error — minimize it
Gradient descent iteratively updates the weights to find the minimum of the cost function
R² score tells you how much variance the model explains
Validate assumptions before trusting the model
Regularization (Ridge/Lasso) can reduce overfitting

Next: Logistic Regression

Extend linear regression to classification with the sigmoid function.
