Quantile Regression — Beyond the Mean

Beyond the Mean: Modeling the Full Distribution

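While OLS estimates the conditional mean, quantile regression estimates any conditional quantile, revealing how the entire distribution of the response shifts with the predictors. It is robust to outliers in the response and captures effects that the mean alone misses, such as a predictor that matters more at the top of the distribution than at the bottom.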

  • Wage Analysis — Understand how factors affect low, median, and high earners differently

  • Healthcare — Model predictors of extreme recovery times, not just average outcomes

  • Climate Science — Study factors driving extreme temperature events beyond average conditions

The median tells a different story than the mean — and sometimes it's the more important one.

Definition: Quantile Regression

A regression method that estimates the conditional quantiles of the response variable, providing a more complete picture of how predictors affect the entire distribution.
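
In symbols, the definition can be sketched as follows. The linear form on the right is an assumption made here for illustration (it matches the single-predictor formula used in the code later in this lesson); $F(y \mid x)$ denotes the conditional distribution function of the response, and $\beta_0(\tau)$, $\beta_1(\tau)$ are coefficients that are allowed to differ at each quantile level:

$$Q_\tau(y \mid x) = \inf\{\, y : F(y \mid x) \geq \tau \,\}, \qquad Q_\tau(y \mid x) = \beta_0(\tau) + \beta_1(\tau)\, x$$

Because the slope $\beta_1(\tau)$ depends on $\tau$, a predictor can have a small effect at the 10th percentile and a large one at the 90th, which a single OLS slope cannot express.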

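The τ-th conditional quantile is estimated by choosing the predictions (equivalently, the regression coefficients) that minimize: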

Quantile Loss Function

$$\sum_{i:\, y_i \geq \hat{y}_i} \tau\,|y_i - \hat{y}_i| \;+\; \sum_{i:\, y_i < \hat{y}_i} (1-\tau)\,|y_i - \hat{y}_i|$$

Here,

  • $\tau$ = Quantile level ($0 < \tau < 1$)
  • $y_i$ = Observed value
  • $\hat{y}_i$ = Predicted value
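
To make the loss concrete, here is a minimal NumPy sketch of the quantile (pinball) loss defined above. It uses the simplest possible model, a single constant prediction, and checks that the constant minimizing the loss lands on the sample quantile. The function name `pinball_loss`, the synthetic exponential data, and the choice of τ = 0.9 are all assumptions for illustration, not part of the lesson's own code.

```python
import numpy as np

def pinball_loss(y, y_hat, tau):
    """Average quantile (pinball) loss at level tau (hypothetical helper)."""
    residual = np.asarray(y, dtype=float) - y_hat
    # Under-predictions (residual >= 0) are weighted by tau,
    # over-predictions (residual < 0) by (1 - tau).
    return np.mean(np.where(residual >= 0, tau * residual, (tau - 1) * residual))

rng = np.random.default_rng(0)
y = rng.exponential(scale=2.0, size=2001)  # skewed synthetic data
tau = 0.9

# Try every observed value as the constant prediction and keep the best one.
candidates = np.sort(y)
losses = np.array([pinball_loss(y, c, tau) for c in candidates])
best_constant = candidates[np.argmin(losses)]
sample_quantile = np.quantile(y, tau)

print(f"Loss-minimizing constant: {best_constant:.4f}")
print(f"np.quantile(y, {tau}):     {sample_quantile:.4f}")
```

Quantile regression applies the same idea, except that the prediction is a function of the predictors rather than a single constant.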

import numpy as np
import pandas as pd
import statsmodels.formula.api as smf
import matplotlib.pyplot as plt

np.random.seed(42)
n = 300
education = np.random.randint(10, 20, n)
# Variance grows with education (income inequality widens)
wage = np.exp(1 + 0.08*education + np.random.normal(0, 0.3 + 0.05*education, n))

df = pd.DataFrame({'wage': wage, 'education': education})

quantiles = [0.10, 0.25, 0.50, 0.75, 0.90]
qr_results = {q: smf.quantreg('wage ~ education', df).fit(q=q) for q in quantiles}

print("Education coefficient by quantile:")
for q, r in qr_results.items():
    coef = r.params['education']
    ci = r.conf_int().loc['education']
    print(f"  Q({q:.2f}): {coef:.4f} [{ci.iloc[0]:.4f}, {ci.iloc[1]:.4f}]")

# OLS for comparison
ols = smf.ols('wage ~ education', df).fit()
print(f"  OLS:     {ols.params['education']:.4f}")

# Visualization
fig, ax = plt.subplots(figsize=(10, 6))
educ_range = np.linspace(10, 20, 100)
new_data = pd.DataFrame({'education': educ_range})
colors = ['lightblue','steelblue','navy','coral','red']

for (q, r), col in zip(qr_results.items(), colors):
    ax.plot(educ_range, r.predict(new_data), color=col, linewidth=2, label=f'Q({q})')
ax.plot(educ_range, ols.predict(new_data), 'k--', linewidth=2.5, label='OLS mean')
ax.scatter(education, wage, alpha=0.2, s=15, color='gray')
ax.set_xlabel('Education (years)')
ax.set_ylabel('Wage ($)')
ax.set_title('Quantile Regression: Wage-Education Relationship')
ax.legend()
plt.tight_layout()
plt.savefig('quantile_regression.png', dpi=150)
plt.show()

Median Regression

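Median regression (τ = 0.5) is robust to outliers in the response because it minimizes absolute deviations rather than squared deviations. At τ = 0.5 both terms of the quantile loss carry the same weight of 0.5, so the objective is proportional to the sum of absolute residuals. A large residual is therefore penalized in proportion to its size, not its square, and a few extreme observations pull the fitted line far less than they do under OLS.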
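
The robustness claim can be checked with a small experiment. The sketch below is an illustration under assumed settings, not the lesson's own analysis: it builds synthetic data with a true slope of 0.5, inflates the response for the ten observations with the largest `x`, and compares the OLS and median regression slopes before and after contamination. The variable names and the helper `slopes` are hypothetical.

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(1)
n = 200
x = rng.uniform(0, 10, n)
y = 2.0 + 0.5 * x + rng.normal(0, 1, n)  # true slope is 0.5

# Contaminate the response: add 100 to the 10 observations with the largest x.
y_dirty = y.copy()
y_dirty[np.argsort(x)[-10:]] += 100.0

clean = pd.DataFrame({'x': x, 'y': y})
dirty = pd.DataFrame({'x': x, 'y': y_dirty})

def slopes(data):
    """Return (OLS slope, median regression slope) for y ~ x."""
    ols_slope = smf.ols('y ~ x', data).fit().params['x']
    med_slope = smf.quantreg('y ~ x', data).fit(q=0.5).params['x']
    return ols_slope, med_slope

ols_clean, med_clean = slopes(clean)
ols_dirty, med_dirty = slopes(dirty)

print(f"Clean data: OLS slope = {ols_clean:.3f}, median slope = {med_clean:.3f}")
print(f"With outliers: OLS slope = {ols_dirty:.3f}, median slope = {med_dirty:.3f}")
```

The outliers here are in the response only. Median regression is not similarly protected against extreme values of the predictors themselves, so this sketch should not be read as a general guarantee.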


Key Takeaways

Summary: Quantile Regression

  • Quantile regression reveals heterogeneous effects — different for low vs high earners

  • τ = 0.5 gives median regression, the least absolute deviations (LAD) estimator, which is robust to outliers in the response

  • No distributional assumption on the errors is needed (for example, no normality or constant variance)

  • Applications: income inequality, growth charts, risk models (VaR)

  • statsmodels.formula.api.quantreg fits quantile regression in Python
