Bayesian Linear Regression
Advanced Statistical Methods
Regression With Full Uncertainty Quantification
Bayesian linear regression places prior distributions on regression coefficients and computes posterior distributions that capture full parameter uncertainty. Credible intervals provide direct probabilistic statements about parameters.
- Clinical trials: incorporate prior knowledge and quantify uncertainty in treatment effects
- Engineering: predict system performance with honest uncertainty bounds for safety-critical decisions
- Economics: combine historical data with current observations for more stable parameter estimates
Bayesian regression replaces point estimates with complete probability distributions over parameters.
Bayesian linear regression extends classical OLS by placing probability distributions on the regression coefficients and error variance, yielding full posterior distributions rather than point estimates.
The Bayesian Regression Model
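Definition: Bayesian Linear Regression Model
For the linear model $\mathbf{y} = X\boldsymbol{\beta} + \boldsymbol{\varepsilon}$ with $\boldsymbol{\varepsilon} \sim \mathcal{N}(\mathbf{0}, \sigma^2 I_n)$, where $X$ is the $n \times p$ design matrix, the Bayesian approach treats $\boldsymbol{\beta}$ and $\sigma^2$ as random variables with prior distributions. It updates them via Bayes' theorem to obtain posterior distributions given the data $\mathcal{D} = (X, \mathbf{y})$.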
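Posterior Distribution
$$p(\boldsymbol{\beta}, \sigma^2 \mid \mathcal{D}) = \frac{p(\mathbf{y} \mid X, \boldsymbol{\beta}, \sigma^2)\, p(\boldsymbol{\beta}, \sigma^2)}{p(\mathbf{y} \mid X)}$$
Here,
- $p(\mathbf{y} \mid X, \boldsymbol{\beta}, \sigma^2) = \prod_{i=1}^{n} \mathcal{N}(y_i \mid \mathbf{x}_i^\top \boldsymbol{\beta}, \sigma^2)$ = likelihood: product of Gaussian densities
- $p(\boldsymbol{\beta}, \sigma^2)$ = joint prior on coefficients and variance
- $p(\mathbf{y} \mid X) = \iint p(\mathbf{y} \mid X, \boldsymbol{\beta}, \sigma^2)\, p(\boldsymbol{\beta}, \sigma^2)\, d\boldsymbol{\beta}\, d\sigma^2$ = marginal likelihood (evidence): normalizing constant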
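To see the three ingredients of Bayes' theorem working together, the following minimal sketch evaluates the posterior on a grid for a deliberately simplified case: a single slope with no intercept, a known noise level $\sigma = 1$, a $\mathcal{N}(0, 2^2)$ prior on the slope, and simulated data. These are assumptions for illustration, not the general model above. The evidence is obtained by numerically integrating likelihood times prior, and dividing by it normalizes the posterior.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)

# Hypothetical setup: y = beta * x + noise with known sigma (illustration only)
n, beta_true, sigma, tau = 30, 1.2, 1.0, 2.0
x = rng.normal(size=n)
y = beta_true * x + rng.normal(scale=sigma, size=n)

# Grid of candidate slope values
grid = np.linspace(-3, 3, 2001)
dx = grid[1] - grid[0]

# Log-likelihood at every grid value: sum of Gaussian log-densities over observations
log_lik = stats.norm.logpdf(y[None, :], loc=grid[:, None] * x[None, :], scale=sigma).sum(axis=1)
# Log-prior: beta ~ N(0, tau^2)
log_prior = stats.norm.logpdf(grid, loc=0.0, scale=tau)

# Numerator of Bayes' theorem on the log scale
log_unnorm = log_lik + log_prior
# Evidence p(y) = integral of likelihood * prior, via a Riemann sum (shifted by the max for stability)
shift = log_unnorm.max()
scaled_integral = np.sum(np.exp(log_unnorm - shift)) * dx
log_evidence = shift + np.log(scaled_integral)
# Dividing by the evidence turns the numerator into a normalized posterior density
posterior = np.exp(log_unnorm - log_evidence)

post_mean = np.sum(grid * posterior) * dx
print(f"log evidence: {log_evidence:.3f}, posterior mean of slope: {post_mean:.3f}")
```

In this conjugate special case the grid answer can be cross-checked against closed forms: the posterior is Gaussian, and the evidence is the density of $\mathbf{y} \sim \mathcal{N}(\mathbf{0}, \sigma^2 I + \tau^2 \mathbf{x}\mathbf{x}^\top)$.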
Why Bayesian Regression?
The Bayesian approach provides: (1) full uncertainty quantification over parameters, (2) natural incorporation of prior knowledge, (3) exact finite-sample posterior distributions (no reliance on asymptotic approximations), and (4) coherent model comparison via marginal likelihoods.
Conjugate Prior: Normal-Inverse-Gamma
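Definition: Normal-Inverse-Gamma Prior
For the linear model $\mathbf{y} = X\boldsymbol{\beta} + \boldsymbol{\varepsilon}$, $\boldsymbol{\varepsilon} \sim \mathcal{N}(\mathbf{0}, \sigma^2 I_n)$, the conjugate prior is:
$$\boldsymbol{\beta} \mid \sigma^2 \sim \mathcal{N}\big(\boldsymbol{\mu}_0,\ \sigma^2 \Lambda_0^{-1}\big), \qquad \sigma^2 \sim \text{Inv-Gamma}(a_0, b_0)$$
where $\Lambda_0$ is the prior precision matrix, $\boldsymbol{\mu}_0$ is the prior mean, and $a_0, b_0$ control the prior on $\sigma^2$.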
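Theorem: Posterior under Normal-Inverse-Gamma Prior
Given $n$ observations, the posterior is also Normal-Inverse-Gamma:
$$\boldsymbol{\beta} \mid \sigma^2, \mathbf{y} \sim \mathcal{N}\big(\boldsymbol{\mu}_n,\ \sigma^2 \Lambda_n^{-1}\big), \qquad \sigma^2 \mid \mathbf{y} \sim \text{Inv-Gamma}(a_n, b_n)$$
with updated parameters:
$$\Lambda_n = X^\top X + \Lambda_0, \qquad \boldsymbol{\mu}_n = \Lambda_n^{-1}\big(\Lambda_0 \boldsymbol{\mu}_0 + X^\top \mathbf{y}\big),$$
$$a_n = a_0 + \frac{n}{2}, \qquad b_n = b_0 + \frac{1}{2}\big(\mathbf{y}^\top \mathbf{y} + \boldsymbol{\mu}_0^\top \Lambda_0 \boldsymbol{\mu}_0 - \boldsymbol{\mu}_n^\top \Lambda_n \boldsymbol{\mu}_n\big)$$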
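The Normal-Inverse-Gamma update translates directly into a few lines of NumPy. The sketch below is a minimal illustration, assuming the parameterization $\boldsymbol{\beta} \mid \sigma^2 \sim \mathcal{N}(\boldsymbol{\mu}_0, \sigma^2\Lambda_0^{-1})$, $\sigma^2 \sim \text{Inv-Gamma}(a_0, b_0)$. The helper name `nig_posterior`, the simulated data, and the hyperparameter values are hypothetical. With a very weak prior, the posterior mean should essentially reproduce the OLS fit.

```python
import numpy as np

def nig_posterior(X, y, mu0, Lambda0, a0, b0):
    """Conjugate Normal-Inverse-Gamma update for y = X beta + eps, eps ~ N(0, sigma^2 I).

    Prior: beta | sigma^2 ~ N(mu0, sigma^2 Lambda0^{-1}), sigma^2 ~ Inv-Gamma(a0, b0).
    Returns (mu_n, Lambda_n, a_n, b_n).
    """
    n = X.shape[0]
    Lambda_n = X.T @ X + Lambda0
    # mu_n = Lambda_n^{-1} (Lambda0 mu0 + X^T y), computed with a linear solve instead of an inverse
    mu_n = np.linalg.solve(Lambda_n, Lambda0 @ mu0 + X.T @ y)
    a_n = a0 + n / 2
    b_n = b0 + 0.5 * (y @ y + mu0 @ Lambda0 @ mu0 - mu_n @ Lambda_n @ mu_n)
    return mu_n, Lambda_n, a_n, b_n

rng = np.random.default_rng(1)
n, p = 200, 3
X = np.column_stack([np.ones(n), rng.normal(size=(n, p - 1))])
y = X @ np.array([1.5, -2.0, 0.8]) + rng.normal(scale=1.5, size=n)

# Weak prior: mu0 = 0, Lambda0 = 1e-6 * I, a0 = b0 = 1e-3
mu_n, Lambda_n, a_n, b_n = nig_posterior(X, y, np.zeros(p), 1e-6 * np.eye(p), 1e-3, 1e-3)
beta_ols = np.linalg.lstsq(X, y, rcond=None)[0]
print("posterior mean:", np.round(mu_n, 3))
print("OLS estimate:  ", np.round(beta_ols, 3))
print("E[sigma^2 | y] =", b_n / (a_n - 1))
```

A design note: the subtraction in $b_n$ can lose precision when $\mathbf{y}^\top\mathbf{y}$ is large. An algebraically equivalent form is $b_n = b_0 + \tfrac{1}{2}\big[(\mathbf{y} - X\boldsymbol{\mu}_n)^\top(\mathbf{y} - X\boldsymbol{\mu}_n) + (\boldsymbol{\mu}_n - \boldsymbol{\mu}_0)^\top\Lambda_0(\boldsymbol{\mu}_n - \boldsymbol{\mu}_0)\big]$.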
Prior Elicitation
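Choosing $\Lambda_0$ encodes prior confidence. A large $\Lambda_0$ (high prior precision) expresses strong prior belief: the posterior mean is pulled toward $\boldsymbol{\mu}_0$, and the prior dominates until the data term $X^\top X$ outweighs it. Weakly informative priors (small $\Lambda_0$) let the data speak. In practice, $\boldsymbol{\mu}_0 = \mathbf{0}$ with a large prior covariance $\Lambda_0^{-1}$ (e.g., $\Lambda_0 = \lambda I$ with small $\lambda$) is common.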
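With $\boldsymbol{\mu}_0 = \mathbf{0}$ and $\Lambda_0 = \lambda I$, the posterior mean $\boldsymbol{\mu}_n = (X^\top X + \lambda I)^{-1}X^\top\mathbf{y}$ is exactly the ridge regression estimate with penalty $\lambda$ (it does not depend on $a_0, b_0$). Raising the prior precision therefore shrinks the coefficients toward the prior mean. Here is a small sketch on simulated data. The $\lambda$ values are arbitrary, and the design has no intercept column because an intercept is usually left unshrunk.

```python
import numpy as np

rng = np.random.default_rng(2)
n, p = 25, 3
X = rng.normal(size=(n, p))                  # no intercept column, for simplicity
y = X @ np.array([2.0, -1.0, 0.5]) + rng.normal(size=n)

lambdas = [1e-6, 1.0, 10.0, 100.0, 1e4]
mus, norms = [], []
for lam in lambdas:
    # Posterior mean with mu0 = 0 and Lambda0 = lam * I; identical to the ridge estimate
    mu_n = np.linalg.solve(X.T @ X + lam * np.eye(p), X.T @ y)
    mus.append(mu_n)
    norms.append(np.linalg.norm(mu_n))
    print(f"lambda = {lam:>8g}: posterior mean = {np.round(mu_n, 3)}")
```

The smallest $\lambda$ essentially reproduces OLS, while the largest pins the posterior mean near $\boldsymbol{\mu}_0 = \mathbf{0}$ despite 25 observations.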
Posterior Inference for Coefficients
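Marginal Posterior of β
$$\boldsymbol{\beta} \mid \mathbf{y} \sim t_{2a_n}\!\left(\boldsymbol{\mu}_n,\ \frac{b_n}{a_n}\Lambda_n^{-1}\right)$$
Here,
- $\boldsymbol{\mu}_n$ = posterior mean: a matrix-weighted average of the prior mean $\boldsymbol{\mu}_0$ and the OLS estimate $\hat{\boldsymbol{\beta}}$, since $\boldsymbol{\mu}_n = \Lambda_n^{-1}(\Lambda_0\boldsymbol{\mu}_0 + X^\top X \hat{\boldsymbol{\beta}})$ when $X^\top X$ is invertible
- $\frac{b_n}{a_n}\Lambda_n^{-1}$ = posterior scale matrix
- $2a_n$ = degrees of freedom of the marginal t-distribution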
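The marginal posterior of $\boldsymbol{\beta}$ is a multivariate $t$-distribution. For large $n$, this approaches a Gaussian:
Large-Sample Posterior
$$\boldsymbol{\beta} \mid \mathbf{y} \;\approx\; \mathcal{N}\big(\hat{\boldsymbol{\beta}},\ \hat{\sigma}^2 (X^\top X)^{-1}\big)$$
Here,
- $\hat{\boldsymbol{\beta}} = (X^\top X)^{-1} X^\top \mathbf{y}$ = ordinary least squares estimate
- $\hat{\sigma}^2 = \mathrm{SSR}/(n-p)$ = residual variance estimate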
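The marginal result $\boldsymbol{\beta} \mid \mathbf{y} \sim t_{2a_n}\big(\boldsymbol{\mu}_n, \tfrac{b_n}{a_n}\Lambda_n^{-1}\big)$ can be checked numerically. This sketch uses simulated data and illustrative hyperparameters, and repeats the conjugate update inline so the block stands alone. It computes 95% credible intervals from the $t$ marginals with `scipy.stats.t.ppf` and compares them with percentiles of draws from the joint posterior, sampling $\sigma^2$ from the Inverse-Gamma and then $\boldsymbol{\beta}$ given $\sigma^2$.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(3)
n, p = 40, 2
X = np.column_stack([np.ones(n), rng.normal(size=n)])
y = X @ np.array([1.0, 0.5]) + rng.normal(scale=0.8, size=n)

# Illustrative prior hyperparameters
mu0, Lambda0, a0, b0 = np.zeros(p), 0.1 * np.eye(p), 2.0, 1.0

# Conjugate Normal-Inverse-Gamma update
Lambda_n = X.T @ X + Lambda0
mu_n = np.linalg.solve(Lambda_n, Lambda0 @ mu0 + X.T @ y)
a_n = a0 + n / 2
b_n = b0 + 0.5 * (y @ y + mu0 @ Lambda0 @ mu0 - mu_n @ Lambda_n @ mu_n)

# Marginal posterior of each coefficient: t with 2 a_n dof, location mu_n[j],
# scale sqrt of the j-th diagonal entry of (b_n / a_n) Lambda_n^{-1}
scale_mat = (b_n / a_n) * np.linalg.inv(Lambda_n)
scales = np.sqrt(np.diag(scale_mat))
t_lo = stats.t.ppf(0.025, df=2 * a_n, loc=mu_n, scale=scales)
t_hi = stats.t.ppf(0.975, df=2 * a_n, loc=mu_n, scale=scales)

# Monte Carlo check: sigma^2 ~ Inv-Gamma(a_n, b_n), then beta | sigma^2 ~ N(mu_n, sigma^2 Lambda_n^{-1})
S = 200_000
sigma2 = stats.invgamma.rvs(a_n, scale=b_n, size=S, random_state=rng)
L = np.linalg.cholesky(np.linalg.inv(Lambda_n))
z = rng.normal(size=(S, p))
beta_draws = mu_n + np.sqrt(sigma2)[:, None] * (z @ L.T)
mc_lo, mc_hi = np.percentile(beta_draws, [2.5, 97.5], axis=0)

for j in range(p):
    print(f"beta_{j}: t-marginal [{t_lo[j]:.3f}, {t_hi[j]:.3f}]  "
          f"Monte Carlo [{mc_lo[j]:.3f}, {mc_hi[j]:.3f}]")
```

The two sets of intervals should agree up to Monte Carlo error, which is the practical content of the statement that integrating $\sigma^2$ out of the Normal-Inverse-Gamma posterior leaves a $t$ marginal.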
Connection to Frequentist
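With the flat (improper) prior $p(\boldsymbol{\beta}, \sigma^2) \propto 1/\sigma^2$, the marginal posterior is $\boldsymbol{\beta} \mid \mathbf{y} \sim t_{n-p}\big(\hat{\boldsymbol{\beta}},\ \hat{\sigma}^2 (X^\top X)^{-1}\big)$. The posterior mean therefore equals the OLS estimate, and the posterior scale matrix equals the estimated sampling covariance of $\hat{\boldsymbol{\beta}}$. Consequently, the equal-tailed $100(1-\alpha)\%$ credible interval for each coefficient coincides numerically with the frequentist $t$-based $100(1-\alpha)\%$ confidence interval, even though the two are interpreted differently.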
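A quick numerical check of this correspondence, assuming the prior $p(\boldsymbol{\beta}, \sigma^2) \propto 1/\sigma^2$. Under this prior, $\sigma^2 \mid \mathbf{y} \sim \text{Inv-Gamma}\big((n-p)/2,\ \mathrm{SSR}/2\big)$ and $\boldsymbol{\beta} \mid \sigma^2, \mathbf{y} \sim \mathcal{N}\big(\hat{\boldsymbol{\beta}}, \sigma^2 (X^\top X)^{-1}\big)$. The sketch draws from this posterior on simulated data and compares percentile intervals with the classical $t$-based confidence intervals.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(4)
n, p = 30, 2
X = np.column_stack([np.ones(n), rng.normal(size=n)])
y = X @ np.array([0.5, 2.0]) + rng.normal(size=n)

# OLS quantities
XtX_inv = np.linalg.inv(X.T @ X)
beta_hat = XtX_inv @ X.T @ y
ssr = np.sum((y - X @ beta_hat) ** 2)
s2 = ssr / (n - p)
se = np.sqrt(s2 * np.diag(XtX_inv))

# Frequentist 95% confidence intervals (t with n - p degrees of freedom)
tq = stats.t.ppf(0.975, df=n - p)
ci_lo, ci_hi = beta_hat - tq * se, beta_hat + tq * se

# Posterior draws under p(beta, sigma^2) proportional to 1/sigma^2
S = 200_000
sigma2 = stats.invgamma.rvs((n - p) / 2, scale=ssr / 2, size=S, random_state=rng)
L = np.linalg.cholesky(XtX_inv)
beta_draws = beta_hat + np.sqrt(sigma2)[:, None] * (rng.normal(size=(S, p)) @ L.T)
cr_lo, cr_hi = np.percentile(beta_draws, [2.5, 97.5], axis=0)

print("confidence:", np.round(ci_lo, 3), np.round(ci_hi, 3))
print("credible:  ", np.round(cr_lo, 3), np.round(cr_hi, 3))
```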
Credible Intervals vs. Confidence Intervals
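Definition: Credible Interval
A $100(1-\alpha)\%$ credible interval for a parameter $\theta$ is an interval $[L, U]$ such that:
$$P(L \le \theta \le U \mid \mathcal{D}) = 1 - \alpha$$
This is a direct probability statement about the parameter, given the observed data. A frequentist confidence interval instead describes the interval-generating procedure: over repeated samples, $100(1-\alpha)\%$ of the intervals it produces contain the fixed true $\theta$.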
| Property | Credible Interval | Confidence Interval |
|---|---|---|
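| Interpretation | $P(\theta \in [L, U] \mid \mathcal{D}) = 1-\alpha$ | $P\big(\theta \in [L(\mathbf{Y}), U(\mathbf{Y})]\big) = 1-\alpha$ over repeated samples |
| Depends on | Prior and likelihood (through the posterior) | Sampling distribution only |
| Prior information | Yes | No |
| Finite-sample exactness | Yes, given the prior and model | Exact when the sampling distribution is known (e.g., $t$-intervals in the Gaussian linear model); otherwise often only asymptotic |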
Highest Posterior Density (HPD)
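The equal-tailed credible interval uses the $\alpha/2$ and $1-\alpha/2$ quantiles of the posterior. The HPD interval is the narrowest interval containing $1-\alpha$ posterior probability. It is unique for a unimodal posterior (a multimodal posterior can yield a union of disjoint intervals), and it is preferred when the posterior is skewed, where it can be noticeably shorter than the equal-tailed interval.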
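Both intervals are straightforward to compute from posterior draws. In this minimal sketch, right-skewed Gamma draws stand in for a posterior sample (an assumption for illustration). The hypothetical helper `hpd_interval` finds the shortest window that contains 95% of the sorted draws, which is appropriate for a unimodal posterior. ArviZ offers a ready-made version as `az.hdi`.

```python
import numpy as np

def hpd_interval(samples, prob=0.95):
    """Shortest interval containing `prob` of the samples (assumes a unimodal posterior)."""
    x = np.sort(samples)
    n = len(x)
    k = int(np.floor(prob * n))          # each candidate window [x[i], x[i + k]] spans k + 1 draws
    widths = x[k:] - x[: n - k]          # width of every candidate window
    i = np.argmin(widths)
    return x[i], x[i + k]

rng = np.random.default_rng(5)
draws = rng.gamma(shape=2.0, scale=1.0, size=100_000)   # right-skewed stand-in posterior

eq_lo, eq_hi = np.percentile(draws, [2.5, 97.5])
hpd_lo, hpd_hi = hpd_interval(draws, 0.95)
print(f"equal-tailed: [{eq_lo:.3f}, {eq_hi:.3f}]  width {eq_hi - eq_lo:.3f}")
print(f"HPD:          [{hpd_lo:.3f}, {hpd_hi:.3f}]  width {hpd_hi - hpd_lo:.3f}")
```

For this right-skewed sample, the HPD interval shifts toward the mode and is shorter than the equal-tailed interval, while containing the same posterior mass.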
Bayesian Prediction
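Posterior Predictive Distribution
$$p(y_* \mid \mathbf{x}_*, \mathcal{D}) = \iint p(y_* \mid \mathbf{x}_*, \boldsymbol{\beta}, \sigma^2)\, p(\boldsymbol{\beta}, \sigma^2 \mid \mathcal{D})\, d\boldsymbol{\beta}\, d\sigma^2$$
Here,
- $y_*$ = new observation at $\mathbf{x}_*$
- $p(y_* \mid \mathbf{x}_*, \boldsymbol{\beta}, \sigma^2) = \mathcal{N}(y_* \mid \mathbf{x}_*^\top \boldsymbol{\beta}, \sigma^2)$ = sampling model (Gaussian)
- $p(\boldsymbol{\beta}, \sigma^2 \mid \mathcal{D})$ = posterior over parameters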
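Under the Normal-Inverse-Gamma prior, the posterior predictive is a $t$-distribution:
$$y_* \mid \mathcal{D} \sim t_{2a_n}\!\left(\mathbf{x}_*^\top \boldsymbol{\mu}_n,\ \frac{b_n}{a_n}\big(1 + \mathbf{x}_*^\top \Lambda_n^{-1} \mathbf{x}_*\big)\right)$$
The term $\mathbf{x}_*^\top \Lambda_n^{-1} \mathbf{x}_*$ inside the parentheses accounts for parameter uncertainty (the $1$ accounts for observation noise). It makes the predictive interval wider than a plug-in interval that fixes $\boldsymbol{\beta}$ at a point estimate, especially when $n$ is small or $\mathbf{x}_*$ lies far from the training inputs. With a weak prior, the interval closely matches the classical OLS prediction interval, which contains the analogous term $\mathbf{x}_*^\top (X^\top X)^{-1} \mathbf{x}_*$.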
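Here is a minimal sketch of the predictive $t$ interval $y_* \mid \mathcal{D} \sim t_{2a_n}\big(\mathbf{x}_*^\top\boldsymbol{\mu}_n, \tfrac{b_n}{a_n}(1 + \mathbf{x}_*^\top\Lambda_n^{-1}\mathbf{x}_*)\big)$, using simulated data and illustrative NIG hyperparameters. It contrasts the full predictive scale with a plug-in version that keeps only the noise term (same $t_{2a_n}$ tails). The new input `x_star`, chosen outside the range of the training data, is hypothetical.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(6)
n, p = 15, 2
X = np.column_stack([np.ones(n), rng.uniform(-1, 1, size=n)])
y = X @ np.array([1.0, 2.0]) + rng.normal(scale=0.5, size=n)

# Weakly informative NIG prior (illustrative values) and conjugate update
mu0, Lambda0, a0, b0 = np.zeros(p), 1e-2 * np.eye(p), 1.0, 1.0
Lambda_n = X.T @ X + Lambda0
mu_n = np.linalg.solve(Lambda_n, Lambda0 @ mu0 + X.T @ y)
a_n = a0 + n / 2
b_n = b0 + 0.5 * (y @ y + mu0 @ Lambda0 @ mu0 - mu_n @ Lambda_n @ mu_n)

x_star = np.array([1.0, 3.0])            # intercept term plus a covariate outside [-1, 1]
loc = x_star @ mu_n
df = 2 * a_n
# Parameter-uncertainty term x*^T Lambda_n^{-1} x*
param_term = x_star @ np.linalg.solve(Lambda_n, x_star)
scale_full = np.sqrt(b_n / a_n * (1 + param_term))
# Plug-in scale: keeps only the noise term
scale_plugin = np.sqrt(b_n / a_n)

full = stats.t.interval(0.95, df, loc=loc, scale=scale_full)
plugin = stats.t.interval(0.95, df, loc=loc, scale=scale_plugin)
print("full predictive 95%:", np.round(full, 3))
print("plug-in 95%:        ", np.round(plugin, 3))
```

Because `x_star` lies well outside the observed covariate range, the parameter-uncertainty term is large and the full interval is much wider than the plug-in one.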
Bayesian Model Comparison
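Bayes Factor
$$\mathrm{BF}_{12} = \frac{p(\mathbf{y} \mid M_1)}{p(\mathbf{y} \mid M_2)}, \qquad p(\mathbf{y} \mid M_k) = \int p(\mathbf{y} \mid \boldsymbol{\theta}_k, M_k)\, p(\boldsymbol{\theta}_k \mid M_k)\, d\boldsymbol{\theta}_k$$
Here,
- $p(\mathbf{y} \mid M_k)$ = marginal likelihood (evidence) of model $k$
- $\mathrm{BF}_{12}$ = evidence ratio: how much more probable the data are under $M_1$ than under $M_2$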
Occam's Razor Built In
The Bayes factor naturally penalizes complex models because the marginal likelihood averages the likelihood over the entire prior. Models with many parameters spread their prior mass thinly, so much of it falls on parameter values that fit the data poorly, incurring an automatic penalty. This is the Bayesian implementation of Occam's razor.
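Under the conjugate NIG prior the evidence has the closed form $p(\mathbf{y} \mid M) = (2\pi)^{-n/2}\sqrt{|\Lambda_0|/|\Lambda_n|}\,\frac{b_0^{a_0}}{b_n^{a_n}}\,\frac{\Gamma(a_n)}{\Gamma(a_0)}$, so Bayes factors between regression models need no sampling. The sketch below compares an intercept-only model, the correct model, and a model with an extra irrelevant predictor. It uses a hypothetical helper `log_evidence`, illustrative hyperparameters shared by all models, and simulated data in which only `x1` matters.

```python
import numpy as np
from scipy.special import gammaln

def log_evidence(X, y, mu0, Lambda0, a0, b0):
    """Log marginal likelihood log p(y | X) under the Normal-Inverse-Gamma prior."""
    n = X.shape[0]
    Lambda_n = X.T @ X + Lambda0
    mu_n = np.linalg.solve(Lambda_n, Lambda0 @ mu0 + X.T @ y)
    a_n = a0 + n / 2
    b_n = b0 + 0.5 * (y @ y + mu0 @ Lambda0 @ mu0 - mu_n @ Lambda_n @ mu_n)
    _, logdet0 = np.linalg.slogdet(Lambda0)
    _, logdetn = np.linalg.slogdet(Lambda_n)
    return (-0.5 * n * np.log(2 * np.pi) + 0.5 * (logdet0 - logdetn)
            + a0 * np.log(b0) - a_n * np.log(b_n) + gammaln(a_n) - gammaln(a0))

rng = np.random.default_rng(7)
n = 60
x1, x2 = rng.normal(size=n), rng.normal(size=n)
y = 1.0 + 1.5 * x1 + rng.normal(size=n)      # x2 is irrelevant by construction

designs = {
    "M0: intercept only": np.column_stack([np.ones(n)]),
    "M1: intercept + x1": np.column_stack([np.ones(n), x1]),
    "M2: intercept + x1 + x2": np.column_stack([np.ones(n), x1, x2]),
}
logZ = {}
for name, Xm in designs.items():
    k = Xm.shape[1]
    # Same prior family for every model: mu0 = 0, Lambda0 = 0.1 I, a0 = b0 = 1
    logZ[name] = log_evidence(Xm, y, np.zeros(k), 0.1 * np.eye(k), 1.0, 1.0)
    print(f"{name}: log evidence = {logZ[name]:.2f}")

print("log BF(M1 vs M0) =", logZ["M1: intercept + x1"] - logZ["M0: intercept only"])
print("log BF(M1 vs M2) =", logZ["M1: intercept + x1"] - logZ["M2: intercept + x1 + x2"])
```

Working on the log scale with `slogdet` and `gammaln` avoids overflow in the determinants and Gamma functions. Note that Bayes factors are sensitive to the prior scale $\Lambda_0$, so the same prior should be used consistently across compared models.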
Python Implementation
```python
import numpy as np
import pymc as pm
import arviz as az
import matplotlib.pyplot as plt

np.random.seed(42)

# --- Generate synthetic data ---
n = 80
X = np.random.randn(n, 2)
X = np.column_stack([np.ones(n), X])  # Add intercept column
true_beta = np.array([1.5, -2.0, 0.8])
sigma_true = 1.5
y = X @ true_beta + np.random.randn(n) * sigma_true

# --- Bayesian regression with PyMC ---
# These weakly informative priors are not conjugate (HalfNormal on sigma),
# so the posterior is approximated by MCMC rather than computed in closed form.
with pm.Model() as bayes_reg:
    # Priors
    beta = pm.Normal('beta', mu=0, sigma=10, shape=3)
    sigma = pm.HalfNormal('sigma', sigma=5)
    # Likelihood
    mu = pm.math.dot(X, beta)
    y_obs = pm.Normal('y_obs', mu=mu, sigma=sigma, observed=y)
    # Posterior sampling (NUTS by default)
    trace = pm.sample(2000, tune=1000, chains=4, return_inferencedata=True)

# --- Posterior summaries ---
print(az.summary(trace, var_names=['beta', 'sigma']))

# --- 95% equal-tailed credible intervals ---
for i, name in enumerate(['Intercept', 'X1', 'X2']):
    post_samples = trace.posterior['beta'].values[:, :, i].flatten()
    ci = np.percentile(post_samples, [2.5, 97.5])
    print(f"{name}: {ci[0]:.3f} to {ci[1]:.3f} (true={true_beta[i]:.3f})")

# --- Trace plots ---
az.plot_trace(trace, var_names=['beta', 'sigma'])
plt.tight_layout()
plt.savefig('bayesian_regression_trace.png', dpi=150)
plt.show()

# --- Posterior predictive check on the observed design ---
with bayes_reg:
    ppc = pm.sample_posterior_predictive(trace)
y_rep = ppc.posterior_predictive['y_obs'].values.reshape(-1, n)
rep_low, rep_high = np.percentile(y_rep, [2.5, 97.5], axis=0)
print(f"Share of observations inside 95% predictive intervals: {np.mean((y >= rep_low) & (y <= rep_high)):.2f}")

# --- Posterior predictive band along X1, holding X2 at its sample mean ---
# y also depends on X2, so plotting per-observation predictions against X1 alone gives a jagged band.
beta_draws = trace.posterior['beta'].values.reshape(-1, 3)
sigma_draws = trace.posterior['sigma'].values.reshape(-1)
x1_grid = np.linspace(X[:, 1].min(), X[:, 1].max(), 100)
X_grid = np.column_stack([np.ones_like(x1_grid), x1_grid, np.full_like(x1_grid, X[:, 2].mean())])
mu_grid = beta_draws @ X_grid.T  # shape (draws, grid points)
y_grid = mu_grid + np.random.randn(*mu_grid.shape) * sigma_draws[:, None]
y_mean = mu_grid.mean(axis=0)
y_low, y_high = np.percentile(y_grid, [2.5, 97.5], axis=0)

plt.figure(figsize=(10, 6))
plt.scatter(X[:, 1], y, alpha=0.5, label='Observed')
plt.plot(x1_grid, y_mean, 'r-', label='Posterior mean')
plt.fill_between(x1_grid, y_low, y_high, alpha=0.3, label='95% posterior predictive interval')
plt.xlabel('X1 (X2 fixed at its mean)')
plt.ylabel('y')
plt.title('Bayesian Linear Regression: Posterior Predictive')
plt.legend()
plt.savefig('bayesian_regression_ppc.png', dpi=150)
plt.show()
```
Flat Prior Recovery
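With the flat prior $p(\boldsymbol{\beta}, \sigma^2) \propto 1/\sigma^2$, the Bayesian posterior mean recovers the OLS estimate exactly, and the 95% credible intervals match the frequentist $t$-based confidence intervals. With weakly informative priors such as the Normal(0, 10) priors in the code above, the agreement is approximate and improves as $n$ grows.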
Related Topics
- See Simple Linear Regression for classical OLS foundations
- See Hypothesis Testing for frequentist inference
- See Bayesian Statistics for the Bayesian–frequentist debate
- See Hierarchical Bayesian Models for multilevel extensions
- See MCMC Diagnostics for ensuring convergence of posterior samples
Key Takeaways
Summary: Bayesian Linear Regression
- Bayesian regression treats $\boldsymbol{\beta}$ and $\sigma^2$ as random; the posterior encodes all uncertainty about the coefficients
- The conjugate Normal-Inverse-Gamma prior yields a tractable multivariate $t$ marginal posterior for $\boldsymbol{\beta}$ and an Inverse-Gamma posterior for $\sigma^2$
- Credible intervals have a direct probabilistic interpretation: $P(\theta \in [L, U] \mid \mathcal{D}) = 1-\alpha$
- Bayesian prediction integrates over parameter uncertainty, so prediction intervals are wider than plug-in intervals, most noticeably for small samples
- Bayes factors provide automatic model comparison with built-in Occam's razor
- With flat priors, Bayesian and frequentist results coincide: posterior mean = OLS, credible interval ≈ confidence interval
- Weakly informative priors are preferred in practice β they regularize without dominating the likelihood