
Simple Linear Regression — Theory, Assumptions, and Python


Modeling the Relationship Between Two Variables

Simple linear regression quantifies how one variable changes with another, forming the foundation of predictive modeling. It estimates the line that best fits the data by minimizing the sum of squared residuals.

  • Economics — Predict consumer spending based on income levels

  • Healthcare — Model the relationship between dosage and patient response

  • Engineering — Relate temperature to material expansion coefficients

Every complex model begins with understanding a single straight line.


Simple linear regression models the linear relationship between a predictor variable $X$ and a response variable $Y$.


The Statistical Model

Definition: Simple Linear Regression Model

$$Y_i = \beta_0 + \beta_1 X_i + \varepsilon_i, \quad i = 1, 2, \ldots, n$$

where:

  • $\beta_0$ is the intercept (predicted $Y$ when $X = 0$)

  • $\beta_1$ is the slope (change in $Y$ per unit change in $X$)

  • $\varepsilon_i$ is the error term: the part of $Y$ not explained by $X$

The Error Term

The error $\varepsilon_i$ represents the combined effect of all unmeasured factors. Under the classical assumptions, $\varepsilon_i \sim \mathcal{N}(0, \sigma^2)$ i.i.d. This implies $E[Y_i \mid X_i] = \beta_0 + \beta_1 X_i$: the conditional mean is linear in $X$.
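To make the model concrete, here is a minimal sketch that draws data from it using only the Python standard library. The parameter values (`beta0`, `beta1`, `sigma`), the sample size, and the grid of `x` values are assumptions chosen purely for illustration; they do not come from the lesson.

```python
import random

# Hypothetical parameter values, chosen only for illustration.
beta0, beta1, sigma = 2.0, 0.5, 1.0
n = 50

random.seed(0)  # fixed seed so the draw is reproducible

x = [i / 5 for i in range(n)]                       # predictor values, treated as fixed
eps = [random.gauss(0.0, sigma) for _ in range(n)]  # i.i.d. N(0, sigma^2) errors
y = [beta0 + beta1 * xi + ei for xi, ei in zip(x, eps)]

# The conditional mean E[Y | X = x_i] is the noise-free part of each y_i.
cond_mean = [beta0 + beta1 * xi for xi in x]
```

Each `y[i]` differs from `cond_mean[i]` only by its error draw `eps[i]`, which is the decomposition the model states.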


Ordinary Least Squares (OLS) Estimation

OLS finds the estimates $\hat{\beta}_0, \hat{\beta}_1$ that minimize the sum of squared residuals:

Sum of Squared Residuals

$$\text{SSR} = \sum_{i=1}^n e_i^2 = \sum_{i=1}^n (y_i - \hat{y}_i)^2 = \sum_{i=1}^n (y_i - \hat{\beta}_0 - \hat{\beta}_1 x_i)^2$$

Here,

  • $e_i$ = Residual: observed minus predicted
  • $\hat{y}_i$ = Predicted value at $x_i$
  • $\text{SSR}$ = Sum of squared residuals

Theorem: OLS Closed-Form Solution

Setting $\frac{\partial \text{SSR}}{\partial \beta_0} = 0$ and $\frac{\partial \text{SSR}}{\partial \beta_1} = 0$ yields:

$$\hat{\beta}_1 = \frac{\sum_{i=1}^n (x_i - \bar{x})(y_i - \bar{y})}{\sum_{i=1}^n (x_i - \bar{x})^2} = \frac{S_{xy}}{S_{xx}}$$

$$\hat{\beta}_0 = \bar{y} - \hat{\beta}_1 \bar{x}$$

where $S_{xy} = \sum(x_i - \bar{x})(y_i - \bar{y})$ and $S_{xx} = \sum(x_i - \bar{x})^2$.
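The closed-form solution translates almost line for line into code. The sketch below is one possible implementation in plain Python; the function name `ols_fit` and the five-point toy dataset are made up for illustration.

```python
def ols_fit(x, y):
    """Return (b0, b1) from the closed-form OLS formulas."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    # S_xy and S_xx as defined in the text
    s_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    s_xx = sum((xi - x_bar) ** 2 for xi in x)
    b1 = s_xy / s_xx          # slope
    b0 = y_bar - b1 * x_bar   # intercept
    return b0, b1


# Toy data, invented for illustration only.
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]

b0, b1 = ols_fit(x, y)
print(f"intercept = {b0:.3f}, slope = {b1:.3f}")
```

For this toy data, $S_{xy} = 6$ and $S_{xx} = 10$, so the slope is 0.6 and the intercept is $4 - 0.6 \cdot 3 = 2.2$ (up to floating-point rounding).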

Geometric Interpretation

The OLS fitted vector $\hat{\mathbf{y}}$ is the orthogonal projection of the observed vector $\mathbf{y}$ onto the column space of the design matrix $\mathbf{X}$. The residuals $\mathbf{e} = \mathbf{y} - \hat{\mathbf{y}}$ are therefore orthogonal to the fitted values and to every column of $\mathbf{X}$: they lie in the null space of $\mathbf{X}^T$, that is, $\mathbf{X}^T \mathbf{e} = \mathbf{0}$.
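The orthogonality claims can be checked numerically. In simple regression the design matrix has two columns (a column of ones and the $x$ values), so the residuals should have zero dot product with each of them and with the fitted values. The sketch below uses an invented toy dataset and plain Python; sums are zero only up to floating-point rounding.

```python
# Toy data, invented for illustration only.
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]
n = len(x)

# Closed-form OLS fit
x_bar, y_bar = sum(x) / n, sum(y) / n
s_xx = sum((xi - x_bar) ** 2 for xi in x)
s_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
b1 = s_xy / s_xx
b0 = y_bar - b1 * x_bar

fitted = [b0 + b1 * xi for xi in x]
resid = [yi - fi for yi, fi in zip(y, fitted)]

# Dot products of the residual vector with each column of X and with y-hat
dot_ones = sum(resid)                                     # intercept column
dot_x = sum(ei * xi for ei, xi in zip(resid, x))          # predictor column
dot_fitted = sum(ei * fi for ei, fi in zip(resid, fitted))

print(dot_ones, dot_x, dot_fitted)  # all approximately zero
```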


Properties of OLS Estimators

Theorem: Gauss–Markov Theorem

Under the classical assumptions (linearity, errors with mean 0, homoscedasticity, no autocorrelation), the OLS estimators are BLUE (Best Linear Unbiased Estimators). Normality of the errors is not required for this result. For the slope:

$$\text{Var}(\hat{\beta}_1) = \frac{\sigma^2}{\sum(x_i - \bar{x})^2} = \frac{\sigma^2}{S_{xx}}$$

No other linear unbiased estimator of $\beta_1$ has smaller variance.

Variance of OLS Estimators

$$\text{Var}(\hat{\beta}_0) = \sigma^2 \left(\frac{1}{n} + \frac{\bar{x}^2}{S_{xx}}\right), \quad \text{Var}(\hat{\beta}_1) = \frac{\sigma^2}{S_{xx}}$$

Here,

  • $\sigma^2$ = Error variance (estimated by $s^2 = \text{SSR}/(n-2)$)
  • $S_{xx}$ = Sum of squared deviations of $X$
  • $n$ = Sample size
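In practice $\sigma^2$ is unknown, so the variance formulas are evaluated with $s^2$ in its place. The following sketch does this for an invented toy dataset; the variable names are illustrative.

```python
import math

# Toy data, invented for illustration only.
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]
n = len(x)

# Closed-form OLS fit
x_bar, y_bar = sum(x) / n, sum(y) / n
s_xx = sum((xi - x_bar) ** 2 for xi in x)
s_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
b1 = s_xy / s_xx
b0 = y_bar - b1 * x_bar

# Estimate the error variance: s^2 = SSR / (n - 2)
ssr = sum((yi - (b0 + b1 * xi)) ** 2 for xi, yi in zip(x, y))
s2 = ssr / (n - 2)

# Plug s^2 into the variance formulas
var_b0 = s2 * (1 / n + x_bar ** 2 / s_xx)
var_b1 = s2 / s_xx

se_b0, se_b1 = math.sqrt(var_b0), math.sqrt(var_b1)
print(f"s^2 = {s2:.3f}, SE(b0) = {se_b0:.3f}, SE(b1) = {se_b1:.3f}")
```

Here SSR is 2.4 on 3 degrees of freedom, giving $s^2 = 0.8$, an estimated slope variance of $0.8 / 10 = 0.08$, and an estimated intercept variance of $0.8 \cdot (0.2 + 0.9) = 0.88$.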

The Coefficient of Determination ($R^2$)

R-Squared

$$R^2 = 1 - \frac{\text{SSR}}{\text{SST}} = \frac{\text{SSR}_{\text{regression}}}{\text{SST}} = 1 - \frac{\sum(y_i - \hat{y}_i)^2}{\sum(y_i - \bar{y})^2}$$

Here,

  • $R^2$ = Proportion of variance in $Y$ explained by $X$
  • $\text{SSR}$ = Sum of squared residuals
  • $\text{SSR}_{\text{regression}}$ = Regression (explained) sum of squares = $\sum(\hat{y}_i - \bar{y})^2 = \text{SST} - \text{SSR}$
  • $\text{SST}$ = Total sum of squares = $\sum(y_i - \bar{y})^2$

$R^2 \in [0, 1]$. An $R^2 = 0.75$ means 75% of the variability in $Y$ is explained by the linear relationship with $X$.
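The two forms of $R^2$ above can be checked against each other numerically. This is a small sketch on an invented toy dataset; `ss_reg` is an illustrative name for the regression sum of squares.

```python
# Toy data, invented for illustration only.
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]
n = len(x)

# Closed-form OLS fit
x_bar, y_bar = sum(x) / n, sum(y) / n
s_xx = sum((xi - x_bar) ** 2 for xi in x)
s_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
b1 = s_xy / s_xx
b0 = y_bar - b1 * x_bar
fitted = [b0 + b1 * xi for xi in x]

ssr = sum((yi - fi) ** 2 for yi, fi in zip(y, fitted))  # residual sum of squares
sst = sum((yi - y_bar) ** 2 for yi in y)                # total sum of squares
ss_reg = sum((fi - y_bar) ** 2 for fi in fitted)        # regression sum of squares

r2 = 1 - ssr / sst
r2_alt = ss_reg / sst  # equal to r2 for OLS with an intercept

print(f"R^2 = {r2:.3f} (alternative form: {r2_alt:.3f})")
```

With SSR = 2.4 and SST = 6, both forms give $R^2 = 0.6$: the fitted line accounts for 60% of the variability in this toy sample.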


The Four Regression Assumptions (LINE)

| Assumption | Mathematical Statement | Diagnostic Check |
|------------|------------------------|------------------|
| Linearity | $E[Y \mid X] = \beta_0 + \beta_1 X$ | Scatter plot; residual vs. fitted plot |
| Independence | $\text{Cov}(\varepsilon_i, \varepsilon_j) = 0$ for $i \neq j$ | Study design; Durbin–Watson test |
| Normality | $\varepsilon_i \sim \mathcal{N}(0, \sigma^2)$ | Q–Q plot; Shapiro–Wilk test |
| Equal variance | $\text{Var}(\varepsilon_i) = \sigma^2$ (constant) | Residual vs. fitted plot; Breusch–Pagan test |

Consequence of Violations

  • Non-linearity: OLS estimates are biased; fit a polynomial or use non-linear regression.

  • Non-independence: Standard errors are wrong; use clustered or time-series methods.

  • Non-normality: Affects small-sample inference; the central limit theorem (CLT) helps for large $n$.

  • Heteroscedasticity: Standard errors are wrong; use robust (HC) standard errors.
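As one concrete example of the diagnostic checks listed above, the Durbin–Watson statistic can be computed directly from the residuals as $\sum_{t=2}^n (e_t - e_{t-1})^2 / \sum_{t=1}^n e_t^2$. The sketch below computes only the statistic, not a formal test decision, and assumes the observations are in a meaningful order (for example, time). The toy data are invented for illustration.

```python
# Toy data, invented for illustration; assumed to be in time order.
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]
n = len(x)

# Closed-form OLS fit
x_bar, y_bar = sum(x) / n, sum(y) / n
s_xx = sum((xi - x_bar) ** 2 for xi in x)
s_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
b1 = s_xy / s_xx
b0 = y_bar - b1 * x_bar
resid = [yi - (b0 + b1 * xi) for xi, yi in zip(x, y)]

# Durbin-Watson statistic: squared successive differences over squared residuals
num = sum((resid[t] - resid[t - 1]) ** 2 for t in range(1, n))
den = sum(e ** 2 for e in resid)
dw = num / den

print(f"Durbin-Watson statistic = {dw:.3f}")
```

The statistic ranges from 0 to 4. Values near 2 suggest little first-order autocorrelation in the residuals, values toward 0 suggest positive autocorrelation, and values toward 4 suggest negative autocorrelation. With only five points, as here, the statistic is shown for the arithmetic rather than as meaningful evidence.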


Hypothesis Tests for the Slope

t-Test for Slope

$$t = \frac{\hat{\beta}_1 - 0}{\text{SE}(\hat{\beta}_1)} \sim t_{n-2}$$

Here,

  • $\hat{\beta}_1$ = Estimated slope
  • $\text{SE}(\hat{\beta}_1)$ = Standard error of the slope
  • $n-2$ = Degrees of freedom

The $t$-test with $H_0: \beta_1 = 0$ is equivalent to the $F$-test in simple regression: $F = t^2$.
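The identity $F = t^2$ can be verified by computing both statistics from their definitions. This sketch uses an invented toy dataset and computes the test statistics only; obtaining p-values would additionally require the $t$ or $F$ distribution, which the Python standard library does not provide.

```python
import math

# Toy data, invented for illustration only.
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]
n = len(x)

# Closed-form OLS fit
x_bar, y_bar = sum(x) / n, sum(y) / n
s_xx = sum((xi - x_bar) ** 2 for xi in x)
s_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
b1 = s_xy / s_xx
b0 = y_bar - b1 * x_bar

ssr = sum((yi - (b0 + b1 * xi)) ** 2 for xi, yi in zip(x, y))
sst = sum((yi - y_bar) ** 2 for yi in y)
s2 = ssr / (n - 2)

# t statistic for H0: beta1 = 0
se_b1 = math.sqrt(s2 / s_xx)
t_stat = (b1 - 0) / se_b1

# F statistic: explained mean square (1 df) over residual mean square (n - 2 df)
f_stat = ((sst - ssr) / 1) / (ssr / (n - 2))

print(f"t = {t_stat:.4f}, t^2 = {t_stat ** 2:.4f}, F = {f_stat:.4f}")
```

For this data, $t = 0.6 / \sqrt{0.08} \approx 2.121$ and $F = 3.6 / 0.8 = 4.5$, which matches $t^2$.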


Prediction Intervals

For a new observation at $x_0$, the prediction interval is wider than the confidence interval for the mean:

Prediction Interval

$$\hat{y}_0 \pm t_{\alpha/2, n-2} \cdot s \sqrt{1 + \frac{1}{n} + \frac{(x_0 - \bar{x})^2}{S_{xx}}}$$

Here,

  • $\hat{y}_0$ = Predicted value at $x_0$
  • $s$ = Residual standard error
  • $1$ (the leading term under the square root) = Extra term that accounts for individual prediction uncertainty

The $1$ under the square root makes the prediction interval always wider than the confidence interval for the mean response at the same $x_0$ and confidence level.
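The effect of the extra term is easy to see by computing both half-widths side by side. In the sketch below, the toy data and the new point `x0 = 4` are invented for illustration, and the critical value `t_crit = 3.182` is the standard table value for a two-sided 95% interval with $n - 2 = 3$ degrees of freedom, hard-coded because the Python standard library has no $t$ quantile function.

```python
import math

# Toy data, invented for illustration only.
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]
n = len(x)

# Closed-form OLS fit
x_bar, y_bar = sum(x) / n, sum(y) / n
s_xx = sum((xi - x_bar) ** 2 for xi in x)
s_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
b1 = s_xy / s_xx
b0 = y_bar - b1 * x_bar

ssr = sum((yi - (b0 + b1 * xi)) ** 2 for xi, yi in zip(x, y))
s = math.sqrt(ssr / (n - 2))  # residual standard error

x0 = 4.0         # hypothetical new predictor value
t_crit = 3.182   # t_{0.025, 3} from a t-table (assumption: 95% level, df = n - 2 = 3)

y0_hat = b0 + b1 * x0
leverage = 1 / n + (x0 - x_bar) ** 2 / s_xx

ci_half = t_crit * s * math.sqrt(leverage)      # confidence interval for the mean response
pi_half = t_crit * s * math.sqrt(1 + leverage)  # prediction interval for a new observation

print(f"y0_hat = {y0_hat:.2f}")
print(f"95% CI for mean response: +/- {ci_half:.3f}")
print(f"95% prediction interval:  +/- {pi_half:.3f}")
```

Both intervals are centred on the same $\hat{y}_0$; only the quantity under the square root differs, so the prediction interval is wider for every choice of $x_0$.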


Key Takeaways

Summary: Simple Linear Regression

  • $\beta_1$ is the slope: the change in $Y$ per unit change in $X$

  • $R^2$ = proportion of variance in $Y$ explained by $X$; it ranges from 0 to 1

  • OLS estimates are BLUE (Best Linear Unbiased Estimators) under the Gauss–Markov theorem

  • Always plot residuals: patterns indicate assumption violations

  • Correlation $\neq$ causation: regression shows linear association, not causal direction

  • At the same $x_0$ and confidence level, prediction intervals are always wider than confidence intervals for the mean response

  • The $t$-test for the slope is equivalent to the $F$-test in simple regression
