Polynomial Regression
Regression Analysis
Fitting Nonlinear Relationships With Linear Methods
Polynomial regression captures curved relationships by adding powers of X as predictors while keeping the model linear in its coefficients. It bridges the gap between simple linear models and complex nonlinear patterns.
- Pharmacology: Model dose-response curves with diminishing returns
- Environmental Science: Capture temperature effects on species populations
- Manufacturing: Relate process parameters to quality with nonlinear response surfaces
Adding polynomial terms lets straight lines bend to follow the data's true shape.
Polynomial regression models nonlinear relationships by including powers of X as predictors, while remaining a linear model in the coefficients:
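Polynomial Regression Model
$$Y = \beta_0 + \beta_1 X + \beta_2 X^2 + \cdots + \beta_d X^d + \varepsilon$$
Here,
- $Y$ = Response variable
- $X$ = Predictor variable
- $\beta_j$ = Coefficient for $X^j$
- $d$ = Degree of the polynomial
- $\varepsilon$ = Error term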
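To make "linear in the coefficients" concrete, the sketch below builds a design matrix whose columns are 1, X, X², and X³, then solves ordinary least squares with NumPy. The data and coefficient values are invented for illustration and are noiseless, so the fit should recover them up to floating-point error.

```python
import numpy as np

# Hypothetical noiseless data from a known cubic: y = 1 + 2x - x^2 + 0.5x^3
true_beta = np.array([1.0, 2.0, -1.0, 0.5])
x = np.linspace(-2, 2, 25)
d = 3

# Design matrix with columns [1, x, x^2, x^3];
# each power of x is treated as just another predictor
A = np.vander(x, N=d + 1, increasing=True)
y = A @ true_beta

# Ordinary least squares on the expanded columns
beta_hat, *_ = np.linalg.lstsq(A, y, rcond=None)
print(np.round(beta_hat, 6))
```

The fitted curve is nonlinear in x, but the estimation step is the same linear least-squares solve used for multiple regression.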
import numpy as np
import matplotlib.pyplot as plt
from sklearn.preprocessing import PolynomialFeatures
from sklearn.linear_model import LinearRegression
from sklearn.pipeline import Pipeline
from sklearn.model_selection import cross_val_score, KFold
import warnings
warnings.filterwarnings('ignore')

np.random.seed(42)
n = 80
X = np.linspace(-3, 3, n)
# True relationship is cubic, plus Gaussian noise
y = 0.5*X**3 - X**2 + 2*X + np.random.normal(0, 1.5, n)
X_2d = X.reshape(-1, 1)  # scikit-learn expects a 2-D feature array
X_plot = np.linspace(-3.2, 3.2, 300).reshape(-1, 1)

fig, axes = plt.subplots(2, 3, figsize=(15, 8))
degrees = [1, 2, 3, 5, 10, 20]
colors = ['blue','green','red','orange','purple','brown']
cv_scores = {}
# X is sorted, so shuffle the folds; contiguous folds would test extrapolation
kfold = KFold(n_splits=5, shuffle=True, random_state=42)

for ax, deg, col in zip(axes.flat, degrees, colors):
    model = Pipeline([('poly', PolynomialFeatures(deg)),
                      ('lin', LinearRegression())])
    model.fit(X_2d, y)
    y_pred = model.predict(X_plot)
    # Cross-validated R²
    cv_r2 = cross_val_score(model, X_2d, y, cv=kfold, scoring='r2').mean()
    train_r2 = model.score(X_2d, y)
    cv_scores[deg] = cv_r2
    ax.scatter(X, y, alpha=0.4, s=20, color='gray')
    ax.plot(X_plot, y_pred, color=col, linewidth=2)
    ax.set_ylim(-25, 25)
    # Mark the degree that matches the data-generating cubic
    label = f'Degree {deg} <- CORRECT' if deg == 3 else f'Degree {deg}'
    ax.set_title(f'{label}\nTrain R²={train_r2:.3f}, CV R²={cv_r2:.3f}')

plt.suptitle('Polynomial Regression: Underfitting -> Overfitting', fontsize=14)
plt.tight_layout()
plt.savefig('polynomial_regression.png', dpi=150)
plt.show()
print("Cross-Validated R² by Degree:")
for deg, cv in cv_scores.items():
    # Text bar chart: 20 characters correspond to R² = 1; negative scores show no bar
    bar = '#' * max(0, int(cv*20))
    print(f" Degree {deg:2d}: {cv:.4f} {bar}")
print("Peak CV R² indicates optimal degree")
Overfitting Warning
Higher-degree polynomials are more flexible but risk overfitting: training R² never decreases as the degree grows, while cross-validated R² peaks and then falls. Use cross-validation to select the degree.
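One way to automate the degree choice is a grid search over the `degree` parameter of the pipeline. The sketch below is a minimal version under stated assumptions: it regenerates cubic data of the same form as the example above with its own seed, and the grid of degrees 1 to 10 is an arbitrary choice for illustration.

```python
import numpy as np
from sklearn.preprocessing import PolynomialFeatures
from sklearn.linear_model import LinearRegression
from sklearn.pipeline import Pipeline
from sklearn.model_selection import GridSearchCV, KFold

# Synthetic cubic data (an assumption for this sketch)
rng = np.random.default_rng(0)
X = np.linspace(-3, 3, 80).reshape(-1, 1)
y = 0.5 * X[:, 0]**3 - X[:, 0]**2 + 2 * X[:, 0] + rng.normal(0, 1.5, 80)

pipe = Pipeline([('poly', PolynomialFeatures()),
                 ('lin', LinearRegression())])

# X is sorted, so shuffle the folds; contiguous folds would test extrapolation
cv = KFold(n_splits=5, shuffle=True, random_state=0)

# 'poly__degree' addresses the degree parameter of the 'poly' pipeline step
search = GridSearchCV(pipe, {'poly__degree': list(range(1, 11))},
                      cv=cv, scoring='r2')
search.fit(X, y)

best_degree = search.best_params_['poly__degree']
print("Selected degree:", best_degree)
print("Mean CV R² per degree:",
      np.round(search.cv_results_['mean_test_score'], 3))
```

When several degrees score almost identically, preferring the lowest of them is a common tie-break, since the extra terms add variance without a measurable gain.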
Key Takeaways
Summary: Polynomial Regression
- Polynomial regression is still a linear model: it is linear in the parameters β
- Higher degree = more flexible but risks overfitting
- Use cross-validation to select the optimal polynomial degree
- Center and scale X before computing powers to reduce numerical instability
- Splines are usually better than high-degree polynomials for flexible fitting
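The point about centering and scaling can be checked numerically. The sketch below compares the condition number of the polynomial design matrix for a raw predictor and for its standardized version. The predictor (calendar years from 2000 to 2020) and the degree are hypothetical, chosen because values far from zero make the effect obvious.

```python
import numpy as np

# Hypothetical predictor far from zero, e.g. calendar years
x = np.linspace(2000, 2020, 50)
degree = 5

def design(v, d):
    """Return the matrix with columns [1, v, v^2, ..., v^d]."""
    return np.vander(v, N=d + 1, increasing=True)

# Raw powers: columns are nearly collinear and span many orders of magnitude
cond_raw = np.linalg.cond(design(x, degree))

# Standardize first: subtract the mean, divide by the standard deviation
z = (x - x.mean()) / x.std()
cond_scaled = np.linalg.cond(design(z, degree))

print(f"condition number, raw X:          {cond_raw:.3e}")
print(f"condition number, standardized X: {cond_scaled:.3e}")
```

A large condition number means small perturbations in the data can produce large changes in the estimated coefficients, which is the instability the takeaway refers to. In a scikit-learn pipeline, one way to get the same effect is to place a `StandardScaler` step before `PolynomialFeatures`.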