
Linear Regression: Math, Code and Assumptions


The Foundation of Machine Learning

Linear regression is one of the most fundamental algorithms in machine learning. Despite its simplicity, understanding it deeply builds intuition that carries over to many other supervised learning methods.

ML Algorithm Landscape

[Figure: Supervised learning algorithms. Linear: Linear Regression, Logistic Regression, Ridge/Lasso. Tree-based: Decision Tree, Random Forest, XGBoost. Neural: Perceptron, MLP, Deep Learning. Support vector: Linear SVM, Kernel SVM, SVR.]

Linear Regression is the foundation — understand this first!


1. Simple Linear Regression

Mathematical Formulation

Model:

$$y = \beta_0 + \beta_1 x + \epsilon$$

The model's prediction leaves out the error term: $\hat{y} = \beta_0 + \beta_1 x$.

Where:

  • $\beta_0$ = intercept (bias): the predicted value of $y$ when $x = 0$
  • $\beta_1$ = slope (weight): the change in $\hat{y}$ for a one-unit change in $x$
  • $\epsilon$ = error term, assumed $\epsilon \sim N(0, \sigma^2)$
[Figure: Simple Linear Regression: Finding the Best-Fit Line. Axes: Feature (x) vs. Target (y). Shows the actual data points, the regression line with intercept β₀ and slope β₁, and the residuals eᵢ (errors).]

How this diagram works: it shows the core idea of simple linear regression, which is fitting a straight line through scattered data points to model the relationship between a feature (x) and a target (y). The blue points are the actual observations, and the purple regression line gives the model's predictions. The red dashed lines are the residuals eᵢ = yᵢ − ŷᵢ. Each one is the vertical distance between a data point and the line, so it measures that point's prediction error. Linear regression chooses the intercept (β₀) and slope (β₁) that minimize the sum of these squared residuals.
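To make the residuals concrete, here is a minimal NumPy sketch. It draws synthetic data from $y = \beta_0 + \beta_1 x + \epsilon$ and compares the residuals of two candidate lines. The true values $\beta_0 = 4$ and $\beta_1 = 3$ and the unit noise are arbitrary illustrative choices. The helper `residuals` is a hypothetical name, not a library function.

```python
import numpy as np

def residuals(x, y, beta0, beta1):
    """Hypothetical helper: e_i = y_i - (beta0 + beta1 * x_i) for a candidate line."""
    y_hat = beta0 + beta1 * x          # the line's prediction for every x_i
    return y - y_hat                   # vertical gap between each point and the line

# Synthetic data from y = 4 + 3x + eps, eps ~ N(0, 1) (illustrative values)
rng = np.random.default_rng(0)
x = rng.uniform(0, 2, size=50)
y = 4 + 3 * x + rng.normal(0, 1, size=50)

e_true = residuals(x, y, 4.0, 3.0)     # line using the generating parameters
e_bad = residuals(x, y, 0.0, 1.0)      # a deliberately poor line

print("sum of squared residuals, true line:", np.sum(e_true ** 2))
print("sum of squared residuals, bad line: ", np.sum(e_bad ** 2))
```

The line with the generating parameters gives a much smaller sum of squared residuals than the poor line. Least squares minimizes exactly this quantity.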


2. Cost Function (Ordinary Least Squares)

Mean Squared Error (MSE):

$$J(\beta_0, \beta_1) = \frac{1}{n} \sum_{i=1}^{n} (y_i - \hat{y}_i)^2 = \frac{1}{n} \sum_{i=1}^{n} \big(y_i - (\beta_0 + \beta_1 x_i)\big)^2$$

Goal: find the values of $\beta_0, \beta_1$ that minimize $J$.
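The cost translates directly into code. This sketch uses an illustrative function name, `mse_cost`. It evaluates $J$ on three points that lie exactly on $y = 1 + 2x$, so the results can be checked by hand.

```python
import numpy as np

def mse_cost(beta0, beta1, x, y):
    """J(beta0, beta1) = (1/n) * sum_i (y_i - (beta0 + beta1 * x_i))^2."""
    y_hat = beta0 + beta1 * x
    return np.mean((y - y_hat) ** 2)

# Three points lying exactly on y = 1 + 2x
x = np.array([0.0, 1.0, 2.0])
y = np.array([1.0, 3.0, 5.0])

print(mse_cost(1.0, 2.0, x, y))  # exact line, so the cost is 0.0
print(mse_cost(0.0, 2.0, x, y))  # every prediction is off by 1, so the cost is 1.0
```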

Closed-Form Solution (Normal Equation):

$$\beta_1 = \frac{\sum_{i=1}^{n}(x_i - \bar{x})(y_i - \bar{y})}{\sum_{i=1}^{n}(x_i - \bar{x})^2} = \frac{\text{Cov}(X,Y)}{\text{Var}(X)}$$
$$\beta_0 = \bar{y} - \beta_1 \bar{x}$$
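Below is a minimal sketch of the closed-form solution, using four made-up points small enough to verify by hand. `fit_simple_ols` is a hypothetical helper. `np.polyfit` with degree 1 serves only as an independent cross-check, and `np.cov`/`np.var` confirm the Cov/Var form of $\beta_1$.

```python
import numpy as np

def fit_simple_ols(x, y):
    """Hypothetical helper: closed-form least squares for a single feature."""
    x_bar, y_bar = x.mean(), y.mean()
    s_xy = np.sum((x - x_bar) * (y - y_bar))   # numerator: joint variation of x and y
    s_xx = np.sum((x - x_bar) ** 2)            # denominator: variation of x
    beta1 = s_xy / s_xx
    beta0 = y_bar - beta1 * x_bar              # the fitted line passes through (x_bar, y_bar)
    return beta0, beta1

x = np.array([1.0, 2.0, 3.0, 4.0])
y = np.array([2.0, 4.0, 5.0, 8.0])

beta0, beta1 = fit_simple_ols(x, y)
print(f"beta0 = {beta0:.4f}, beta1 = {beta1:.4f}")

# Cov/Var form: the 1/(n-1) factors of sample covariance and variance cancel
print(np.cov(x, y)[0, 1] / np.var(x, ddof=1))

# Independent check: for degree 1, polyfit returns [slope, intercept]
print(np.polyfit(x, y, 1))
```

By hand: $\bar{x} = 2.5$ and $\bar{y} = 4.75$. The numerator is $\sum(x_i - \bar{x})(y_i - \bar{y}) = 9.5$ and the denominator is $\sum(x_i - \bar{x})^2 = 5$. So $\beta_1 = 1.9$ and $\beta_0 = 4.75 - 1.9 \cdot 2.5 = 0$, up to floating-point rounding.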
[Figure: Cost Function: The Bowl-Shaped Surface. J(β₀, β₁) plotted against β₁ (slope); gradient descent steps move down the bowl to the global minimum.]

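Because the MSE cost is convex in $(\beta_0, \beta_1)$, any local minimum is also the global minimum. With a suitably small learning rate, gradient descent converges to it.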


3. Gradient Descent

Update Rule:

$$\beta_j := \beta_j - \alpha \frac{\partial J}{\partial \beta_j}$$

Partial Derivatives:

$$\frac{\partial J}{\partial \beta_0} = -\frac{2}{n} \sum_{i=1}^{n} (y_i - \hat{y}_i)$$
$$\frac{\partial J}{\partial \beta_1} = -\frac{2}{n} \sum_{i=1}^{n} (y_i - \hat{y}_i)\, x_i$$

where $\alpha$ is the learning rate (step size).
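The update rule and partial derivatives above can be sketched as plain batch gradient descent in NumPy. The starting point (zeros), `alpha = 0.1` and `n_iters = 1000` are illustrative assumptions. The result is compared against `np.polyfit`, which solves the same least-squares problem directly.

```python
import numpy as np

def gradient_descent(x, y, alpha=0.1, n_iters=1000):
    """Batch gradient descent on J(beta0, beta1) = mean((y - y_hat)^2)."""
    beta0, beta1 = 0.0, 0.0                      # arbitrary starting point
    n = len(x)
    for _ in range(n_iters):
        error = y - (beta0 + beta1 * x)          # y_i - y_hat_i with the current betas
        grad0 = -2.0 / n * np.sum(error)         # dJ/dbeta0
        grad1 = -2.0 / n * np.sum(error * x)     # dJ/dbeta1
        beta0 -= alpha * grad0                   # both gradients come from the same error,
        beta1 -= alpha * grad1                   # so this is a simultaneous update
    return beta0, beta1

rng = np.random.default_rng(42)
x = rng.uniform(0, 2, size=100)
y = 4 + 3 * x + rng.normal(0, 1, size=100)

b0, b1 = gradient_descent(x, y)
print(f"gradient descent: beta0 = {b0:.4f}, beta1 = {b1:.4f}")
print("direct least squares [beta0, beta1]:", np.polyfit(x, y, 1)[::-1])
```

Both gradients are computed from the same residual vector before either coefficient changes. This matches the update rule, which applies to all $\beta_j$ at once.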

[Figure: Gradient Descent: Learning Rate Impact. α = 0.1 ✓ (good learning rate); α = 1.0 ✗ (too large: oscillates); α = 0.001 (too small: too slow).]
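To reproduce the learning-rate comparison numerically, this sketch runs gradient descent with the three step sizes from the figure. It uses synthetic data with x drawn uniformly from [0, 2], runs for 100 iterations (an arbitrary budget), and records the cost after each step. The helper `gd_cost_history` is hypothetical.

```python
import numpy as np

def gd_cost_history(x, y, alpha, n_iters=100):
    """Hypothetical helper: batch gradient descent on the MSE cost,
    returning the cost after every step."""
    beta0 = beta1 = 0.0
    n = len(x)
    costs = []
    for _ in range(n_iters):
        error = y - (beta0 + beta1 * x)
        grad0 = -2.0 / n * np.sum(error)
        grad1 = -2.0 / n * np.sum(error * x)
        beta0 -= alpha * grad0
        beta1 -= alpha * grad1
        costs.append(np.mean((y - (beta0 + beta1 * x)) ** 2))
    return costs

rng = np.random.default_rng(42)
x = rng.uniform(0, 2, size=100)
y = 4 + 3 * x + rng.normal(0, 1, size=100)

initial_cost = np.mean(y ** 2)          # cost at beta0 = beta1 = 0
final_cost = {}
for alpha in (0.1, 1.0, 0.001):
    final_cost[alpha] = gd_cost_history(x, y, alpha)[-1]
    print(f"alpha = {alpha}: cost {initial_cost:.3g} -> {final_cost[alpha]:.3g}")
```

For x uniform on [0, 2], the largest curvature of $J$ is roughly 4.4. By the standard stability condition for gradient descent on a quadratic, steps larger than about $2/4.4 \approx 0.46$ are unstable. At α = 1.0 the iterates overshoot back and forth with growing amplitude, so the cost blows up instead of settling. At α = 0.001 the cost is still far from its minimum after 100 steps. These thresholds depend on the scale of the features, which is one reason features are often standardized before running gradient descent.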

4. Multiple Linear Regression

Model:

$$\hat{y} = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \cdots + \beta_p x_p = \beta_0 + \sum_{j=1}^{p} \beta_j x_j$$

Matrix Form:

$$\hat{\mathbf{y}} = \mathbf{X}\boldsymbol{\beta}$$

where $\mathbf{X} \in \mathbb{R}^{n \times (p+1)}$ is the design matrix, whose first column is all ones for the intercept.
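In NumPy, the design matrix is usually built by prepending a column of ones to the feature matrix. This absorbs the intercept $\beta_0$ into the matrix product $\mathbf{X}\boldsymbol{\beta}$. The house features and coefficient values below are made up for illustration.

```python
import numpy as np

# Illustrative data: n = 3 houses, p = 2 features (size, bedrooms)
features = np.array([[1200.0, 2.0],
                     [1500.0, 3.0],
                     [ 900.0, 2.0]])

# Prepend a column of ones so beta_0 is handled by the matrix product
X = np.column_stack([np.ones(len(features)), features])   # shape (n, p + 1)

beta = np.array([50.0, 0.5, 10.0])   # [beta_0, beta_1, beta_2], illustrative values
y_hat = X @ beta                     # y_hat_i = beta_0 + beta_1 * x_i1 + beta_2 * x_i2

print(X.shape)   # (3, 3)
print(y_hat)     # [670. 830. 520.]
```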

Normal Equation (Matrix):

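$$\boldsymbol{\hat{\beta}} = (\mathbf{X}^T\mathbf{X})^{-1}\mathbf{X}^T\mathbf{y}$$

This closed form requires $\mathbf{X}^T\mathbf{X}$ to be invertible, which holds only if no feature is an exact linear combination of the others.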
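Here is a minimal sketch of the normal equation on synthetic data. The coefficients in `true_beta` and the noise level are illustrative assumptions. The sketch solves the linear system $(\mathbf{X}^T\mathbf{X})\boldsymbol{\beta} = \mathbf{X}^T\mathbf{y}$ with `np.linalg.solve` instead of forming the inverse explicitly, then cross-checks the result against `np.linalg.lstsq`.

```python
import numpy as np

rng = np.random.default_rng(0)
n, p = 200, 3
features = rng.normal(size=(n, p))
true_beta = np.array([1.0, 2.0, -1.0, 0.5])       # [beta_0, ..., beta_3], illustrative
X = np.column_stack([np.ones(n), features])       # design matrix with intercept column
y = X @ true_beta + rng.normal(scale=0.1, size=n)

# Normal equation: solve (X^T X) beta = X^T y without computing an explicit inverse
beta_normal = np.linalg.solve(X.T @ X, X.T @ y)

# Library least-squares solver (SVD-based) as a cross-check
beta_lstsq, *_ = np.linalg.lstsq(X, y, rcond=None)

print(np.round(beta_normal, 3))
print(np.allclose(beta_normal, beta_lstsq))
```

Solving the system is cheaper and numerically more stable than computing $(\mathbf{X}^T\mathbf{X})^{-1}$. When features are nearly collinear, $\mathbf{X}^T\mathbf{X}$ becomes ill-conditioned, and an SVD-based solver such as `lstsq` is the safer choice.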
[Figure: Multiple Regression: Multiple Features → Single Output. Inputs x₁ (Size), x₂ (Beds), x₃ (Age) and x₄ (Baths), weighted by β₁ to β₄, feed a linear model ŷ = β₀ + Σ βⱼxⱼ whose output is ŷ (Price).]

5. Model Evaluation Metrics

R² Score (Coefficient of Determination):

$$R^2 = 1 - \frac{SS_{res}}{SS_{tot}} = 1 - \frac{\sum_{i=1}^{n}(y_i - \hat{y}_i)^2}{\sum_{i=1}^{n}(y_i - \bar{y})^2}$$
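  • $R^2 = 1$: perfect fit
  • $R^2 = 0$: the model does no better than always predicting the mean $\bar{y}$
  • $R^2 < 0$: the model does worse than always predicting the mean (possible on held-out data, or for models fit without an intercept)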
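Computing $R^2$ directly from its definition makes the three cases in the list easy to check. The function name `r2` is illustrative. For ordinary inputs it should agree with the `r2_score` function from scikit-learn used in the implementation section.

```python
import numpy as np

def r2(y, y_hat):
    """R^2 = 1 - SS_res / SS_tot."""
    ss_res = np.sum((y - y_hat) ** 2)
    ss_tot = np.sum((y - np.mean(y)) ** 2)
    return 1.0 - ss_res / ss_tot

y = np.array([3.0, 5.0, 7.0, 9.0])
print(r2(y, y))                            # perfect fit -> 1.0
print(r2(y, np.full_like(y, y.mean())))    # always predict the mean -> 0.0
print(r2(y, y[::-1]))                      # worse than the mean -> -3.0
```

With $\bar{y} = 6$, $SS_{tot} = 20$. The reversed predictions give $SS_{res} = 80$, so $R^2 = 1 - 80/20 = -3$.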

Adjusted R²:

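$$R^2_{adj} = 1 - \frac{(1-R^2)(n-1)}{n-p-1}$$

Here $n$ is the number of samples and $p$ is the number of predictors, excluding the intercept. Unlike $R^2$, the adjusted score penalizes predictors that do not improve the fit enough to justify their inclusion.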
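Adjusted $R^2$ is a one-line formula. The sketch below uses an illustrative function name, and $p$ counts predictors excluding the intercept. It shows how the same $R^2 = 0.70$ on 50 samples is discounted more heavily as predictors are added.

```python
def adjusted_r2(r2, n, p):
    """Adjusted R^2 for n samples and p predictors (intercept not counted in p)."""
    if n - p - 1 <= 0:
        raise ValueError("adjusted R^2 needs n > p + 1")
    return 1.0 - (1.0 - r2) * (n - 1) / (n - p - 1)

# Same R^2 = 0.70 on n = 50 samples, with more and more predictors
for p in (1, 5, 20):
    print(p, round(adjusted_r2(0.70, 50, p), 4))
```

The printed values fall from roughly 0.69 (p = 1) to 0.67 (p = 5) and 0.49 (p = 20). An extra predictor has to raise $R^2$ enough to pay for itself.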
[Figure: R² Score: How Well Does the Model Fit? The total variation SS_total = Σ(yᵢ − ȳ)² splits into SS_explained = Σ(ŷᵢ − ȳ)² (70%) and SS_residual (30%).]

R² = 1 - (30/100) = 0.70 (70% variance explained)


6. Assumptions of Linear Regression

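5 Key Assumptions to Validate

  1. Linearity: y depends linearly on the features (the model is linear in the coefficients β)
  2. Independence: errors are independent of one another
  3. Homoscedasticity: errors have constant variance across all feature values
  4. Normality of errors: ε ~ N(0, σ²)
  5. No multicollinearity: features (e.g., X₁ and X₂) are not highly correlated with one another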
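Of these assumptions, multicollinearity is the easiest to quantify numerically. A common diagnostic is the variance inflation factor, $\text{VIF}_j = 1/(1 - R_j^2)$, where $R_j^2$ comes from regressing feature $j$ on all the other features. This NumPy-only sketch builds one nearly duplicated feature to show the effect. The helper `vif` is hypothetical; statsmodels provides `variance_inflation_factor` for the same purpose.

```python
import numpy as np

def vif(features):
    """Hypothetical helper: VIF_j = 1 / (1 - R_j^2), where R_j^2 comes from
    regressing column j on all other columns plus an intercept."""
    n, p = features.shape
    out = []
    for j in range(p):
        target = features[:, j]
        others = np.delete(features, j, axis=1)
        X = np.column_stack([np.ones(n), others])
        beta, *_ = np.linalg.lstsq(X, target, rcond=None)
        resid = target - X @ beta
        r2_j = 1.0 - (resid @ resid) / np.sum((target - target.mean()) ** 2)
        out.append(1.0 / (1.0 - r2_j))
    return np.array(out)

rng = np.random.default_rng(1)
x1 = rng.normal(size=500)
x2 = x1 + rng.normal(scale=0.1, size=500)   # nearly a copy of x1, so strongly collinear
x3 = rng.normal(size=500)                   # independent feature
print(np.round(vif(np.column_stack([x1, x2, x3])), 1))
```

A frequently quoted rule of thumb treats VIF values above roughly 5 to 10 as a sign of problematic multicollinearity. Here the two near-copies land far above that range, while the independent feature stays close to 1.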

Checking Assumptions with Residual Plots

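[Figure: Residual Analysis: What to Look For. ✓ Good: residuals scattered randomly around zero. ✗ Bad: funnel shape (the spread changes with the fitted value, violating homoscedasticity). ✗ Bad: systematic pattern (e.g., a curve, suggesting the relationship is not linear).]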
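A residual plot is the main tool here. As a rough numeric complement, this sketch compares the residual spread in the upper half of the fitted values with the spread in the lower half. The spread ratio is an ad-hoc heuristic rather than a formal test; formal tests such as Breusch–Pagan exist. The helper name `spread_ratio` and the simulated data are illustrative assumptions.

```python
import numpy as np

def spread_ratio(fitted, resid):
    """Hypothetical heuristic: residual std for the upper half of fitted values
    divided by residual std for the lower half (about 1 if variance is constant)."""
    order = np.argsort(fitted)
    half = len(order) // 2
    low, high = resid[order[:half]], resid[order[half:]]
    return high.std() / low.std()

rng = np.random.default_rng(7)
x = rng.uniform(1, 10, size=400)
y_const = 2 + 3 * x + rng.normal(scale=1.0, size=400)       # constant noise variance
y_funnel = 2 + 3 * x + rng.normal(scale=0.3 * x, size=400)  # noise grows with x

ratios = {}
for name, y in [("constant", y_const), ("funnel", y_funnel)]:
    beta1, beta0 = np.polyfit(x, y, 1)     # degree-1 fit returns slope, intercept
    fitted = beta0 + beta1 * x
    resid = y - fitted
    ratios[name] = spread_ratio(fitted, resid)
    print(f"{name}: spread ratio = {ratios[name]:.2f}")
```

To see the shapes from the figure, scatter `resid` against `fitted`, for example with `plt.scatter(fitted, resid)`, and look for a funnel or curve by eye.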

7. Implementation in Python

import numpy as np
import matplotlib.pyplot as plt
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split
from sklearn.metrics import mean_squared_error, r2_score

# Generate sample data
np.random.seed(42)
X = 2 * np.random.rand(100, 1)
y = 4 + 3 * X + np.random.randn(100, 1)

# Split data
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Fit model
model = LinearRegression()
model.fit(X_train, y_train)

# Predictions
y_pred = model.predict(X_test)

# Evaluate
print(f"Intercept (β₀): {model.intercept_[0]:.4f}")
print(f"Slope (β₁): {model.coef_[0][0]:.4f}")
print(f"R² Score: {r2_score(y_test, y_pred):.4f}")
print(f"RMSE: {np.sqrt(mean_squared_error(y_test, y_pred)):.4f}")

# Visualize
plt.scatter(X_test, y_test, color='blue', alpha=0.6, label='Actual')
plt.plot(X_test, y_pred, color='red', linewidth=2, label='Predicted')
plt.xlabel('Feature')
plt.ylabel('Target')
plt.title('Linear Regression Fit')
plt.legend()
plt.show()
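As a sanity check on the scikit-learn fit, the normal equation from Section 4 can be applied to the same synthetic data. For comparison purposes only, this sketch fits on all 100 points rather than on the training split. The two sets of coefficients should agree to numerical precision. Both should also lie close to, but not exactly at, the generating values 4 and 3, because of the noise.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Same synthetic data as the example above
np.random.seed(42)
X = 2 * np.random.rand(100, 1)
y = 4 + 3 * X + np.random.randn(100, 1)

# Normal equation on the design matrix [1, x]
X_b = np.column_stack([np.ones(len(X)), X])
beta_hat = np.linalg.solve(X_b.T @ X_b, X_b.T @ y)   # shape (2, 1)

model = LinearRegression().fit(X, y)
print("normal equation:", beta_hat.ravel())
print("scikit-learn:   ", model.intercept_[0], model.coef_[0][0])
```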

Key Takeaways

  1. Linear regression finds the best-fit line through data points
  2. Cost function (MSE) measures prediction error — minimize it
  3. Gradient descent iteratively updates the weights to find the minimum
  4. R² score tells you how much of the variance in y the model explains
  5. Validate assumptions before trusting the model
  6. Regularization (Ridge/Lasso) helps prevent overfitting by penalizing large coefficients

Next: Logistic Regression

Extend linear regression to classification with the sigmoid function.
