
Multiple Linear Regression — Theory and Python


Extending Regression to Multiple Predictors

Multiple linear regression models the relationship between a response variable and several predictors simultaneously. It estimates each variable's unique contribution while controlling for others.

  • Real Estate — Predict house prices from size, location, age, and amenities

  • Medicine — Assess treatment effects while controlling for patient demographics

  • Marketing — Quantify individual channel contributions to total sales

Each coefficient describes the effect of one variable with all other predictors held constant.


The model extends simple linear regression from one predictor to p predictors:

Multiple Linear Regression Model

$$Y = \beta_0 + \beta_1 X_1 + \beta_2 X_2 + \cdots + \beta_p X_p + \varepsilon$$

Here,

  • $Y$ = Response variable
  • $\beta_0$ = Intercept
  • $\beta_j$ = Coefficient for predictor $X_j$
  • $X_j$ = j-th predictor variable
  • $\varepsilon$ = Error term
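
To make the model concrete, the sketch below simulates data from the equation with two predictors and recovers the coefficients by ordinary least squares using only NumPy. The predictor names, sample size, noise level, and true coefficients are assumptions chosen for illustration; they are not part of the lesson's dataset.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 500

# Hypothetical predictors and true coefficients, chosen only for illustration
X1 = rng.normal(size=n)
X2 = rng.normal(size=n)
beta_true = np.array([2.0, 1.5, -0.5])   # beta_0, beta_1, beta_2

# Design matrix: a column of ones for the intercept, then the predictors
X = np.column_stack([np.ones(n), X1, X2])

# Y = beta_0 + beta_1*X1 + beta_2*X2 + error
y = X @ beta_true + rng.normal(scale=0.1, size=n)

# Least-squares estimate of beta (avoids explicitly inverting X'X)
beta_hat, *_ = np.linalg.lstsq(X, y, rcond=None)
print(beta_hat)
```

With a small error variance and a few hundred observations, the estimates should land close to the true values. Using `np.linalg.lstsq` rather than forming the inverse of X'X is a common numerical-stability choice, not a requirement of the model.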

import numpy as np
import pandas as pd
import statsmodels.api as sm
from scipy import stats
import matplotlib.pyplot as plt

np.random.seed(42)
n = 200

# Simulated house price data: size (sqft), bedrooms, age (years)
house_size = np.random.uniform(1000, 3500, n)
bedrooms   = np.random.choice([1, 2, 3, 4, 5], n, p=[0.05, 0.2, 0.4, 0.25, 0.1])
age        = np.random.uniform(0, 50, n)

# True relationship plus normally distributed noise
price = (50000 + 120*house_size + 8000*bedrooms - 500*age
         + np.random.normal(0, 25000, n))

df = pd.DataFrame({'price': price, 'size': house_size, 'bedrooms': bedrooms, 'age': age})

# add_constant prepends the intercept column 'const'
X = sm.add_constant(df[['size', 'bedrooms', 'age']])
model = sm.OLS(df['price'], X).fit()
print(model.summary())

# Interpretation: coefficient, p-value, and significance stars
print("\nCoefficient Interpretation:")
for name, coef, pval in zip(model.params.index, model.params, model.pvalues):
    sig = "***" if pval < 0.001 else "**" if pval < 0.01 else "*" if pval < 0.05 else "ns"
    print(f"  {name:12s}: {coef:>10.2f}  (p={pval:.4f} {sig})")

# F-test: overall model significance
print(f"\nF({model.df_model:.0f},{model.df_resid:.0f}) = {model.fvalue:.2f}, p = {model.f_pvalue:.6f}")
print(f"R² = {model.rsquared:.4f}, Adj R² = {model.rsquared_adj:.4f}")

# Prediction with confidence interval (mean) and prediction interval (new observation)
# Column order must match X: const, size, bedrooms, age
new_house = pd.DataFrame({'const': 1, 'size': [2000], 'bedrooms': [3], 'age': [10]})
pred = model.get_prediction(new_house)
summary = pred.summary_frame(alpha=0.05)
print("\nPrediction for 2000 sqft, 3 bed, 10yr old:")
print(f"  Predicted: ${summary['mean'].iloc[0]:,.0f}")
print(f"  95% CI: (${summary['mean_ci_lower'].iloc[0]:,.0f}, ${summary['mean_ci_upper'].iloc[0]:,.0f})")
print(f"  95% PI: (${summary['obs_ci_lower'].iloc[0]:,.0f}, ${summary['obs_ci_upper'].iloc[0]:,.0f})")
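
The R², adjusted R², and F-statistic that the script prints can also be computed by hand from the residual and total sums of squares. The following is a minimal sketch on separate simulated data (the design, coefficients, and seed are assumptions for illustration), using the standard formulas with n observations and p predictors.

```python
import numpy as np

rng = np.random.default_rng(2)
n, p = 200, 3

# Hypothetical design: intercept column plus p standard-normal predictors
X = np.column_stack([np.ones(n), rng.normal(size=(n, p))])
y = X @ np.array([1.0, 2.0, 0.0, -1.0]) + rng.normal(size=n)

beta_hat, *_ = np.linalg.lstsq(X, y, rcond=None)
resid = y - X @ beta_hat

ss_res = resid @ resid                    # residual sum of squares
ss_tot = np.sum((y - y.mean()) ** 2)      # total sum of squares

r2 = 1 - ss_res / ss_tot
# Adjusted R² penalizes each extra predictor through the degrees of freedom
adj_r2 = 1 - (1 - r2) * (n - 1) / (n - p - 1)
# Overall F-statistic with (p, n - p - 1) degrees of freedom
f_stat = (r2 / p) / ((1 - r2) / (n - p - 1))

print(f"R² = {r2:.4f}, Adj R² = {adj_r2:.4f}, F = {f_stat:.2f}")
```

Because the penalty factor (n - 1)/(n - p - 1) exceeds 1 whenever p > 0, adjusted R² is always below R² for a model with at least one predictor.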

Ceteris Paribus Interpretation

Each coefficient $\beta_j$ represents the change in $Y$ per one-unit change in $X_j$, holding all other predictors constant (ceteris paribus).
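
A short sketch can illustrate this: change one predictor by one unit, keep the rest fixed, and the predicted value moves by exactly that predictor's coefficient. The coefficients below are the true values used to simulate the house price data (50000, 120, 8000, -500), not fitted estimates, and the `predict` helper is hypothetical.

```python
import numpy as np

# Simulation coefficients: intercept, size, bedrooms, age (not fitted estimates)
beta = np.array([50000.0, 120.0, 8000.0, -500.0])

def predict(size, bedrooms, age):
    """Hypothetical helper: linear prediction for one house."""
    return beta @ np.array([1.0, size, bedrooms, age])

base = predict(2000, 3, 10)
older = predict(2000, 3, 11)   # only age changes, by one year

diff = older - base
print(diff)   # equals the age coefficient, -500.0
```

The size and bedroom terms cancel in the difference, which is why the change in the prediction isolates the age coefficient.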


Key Takeaways

Summary: Multiple Linear Regression

  • Each coefficient $\beta_j$ = change in $Y$ per one-unit change in $X_j$, holding all others constant

  • Adjusted R² should be used for model comparison (penalizes extra predictors)

  • F-test tests $H_0$: $\beta_1 = \beta_2 = \cdots = \beta_p = 0$ (all slope coefficients are zero, so the model has no explanatory power)

  • Interpretation requires ceteris paribus — "all else equal" is the key phrase

  • Check multicollinearity (VIF) before interpreting individual coefficients
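
As a rough illustration of the multicollinearity check, the sketch below computes the variance inflation factor for each predictor as 1 / (1 - R²_j), where R²_j comes from regressing predictor j on the remaining predictors. The `vif` helper and the simulated columns are assumptions for illustration; statsmodels also provides `variance_inflation_factor` in `statsmodels.stats.outliers_influence`.

```python
import numpy as np

def vif(X):
    """VIF for each column of X (predictors only, no intercept column)."""
    n, p = X.shape
    out = np.empty(p)
    for j in range(p):
        target = X[:, j]
        # Regress predictor j on an intercept plus all other predictors
        others = np.column_stack([np.ones(n), np.delete(X, j, axis=1)])
        coef, *_ = np.linalg.lstsq(others, target, rcond=None)
        resid = target - others @ coef
        r2 = 1.0 - (resid @ resid) / np.sum((target - target.mean()) ** 2)
        out[j] = 1.0 / (1.0 - r2)
    return out

rng = np.random.default_rng(1)
a = rng.normal(size=300)
b = rng.normal(size=300)
c = a + 0.1 * rng.normal(size=300)   # nearly a copy of a: strongly collinear

vifs = vif(np.column_stack([a, b, c]))
print(vifs)
```

Here the columns `a` and `c` should show large VIFs while `b` stays near 1. A commonly quoted rule of thumb treats values above roughly 5 to 10 as a warning sign, though the threshold is a convention rather than a formal test.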
