
Robust Statistics — Resistant to Outliers


When Outliers Try to Ruin Your Analysis

Robust statistics provide methods that resist the influence of extreme observations, ensuring reliable inference even when data are contaminated or assumptions are violated. A single outlier can distort classical estimates by orders of magnitude.

  • Financial risk management: robust estimators prevent extreme market events from skewing risk models
  • Quality control: manufacturing data often contain contamination; robust methods maintain accuracy
  • Environmental monitoring: sensor malfunctions produce outliers that robust techniques handle gracefully

Robust statistics keep your conclusions standing even when the data fight back.


Why Robustness Matters

Classical estimators such as the sample mean and OLS regression are optimal under normality but highly sensitive to outliers: a single extreme observation can change them drastically. Robust statistics provides estimators that remain reliable when data are contaminated or model assumptions are violated.


Robust Estimators of Location

Definition: Sample Median as a Robust Estimator

The sample median is the most fundamental robust estimator of location. It minimizes the sum of absolute deviations:

$$\hat{\mu}_{\text{med}} = \underset{\mu}{\arg\min} \sum_{i=1}^{n} |x_i - \mu|$$

The median has a 50% breakdown point: it takes corruption of at least half the data to render it arbitrarily wrong.
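
As a rough numerical check of the minimization property above, the sketch below evaluates the sum of absolute deviations on a grid and compares the grid minimizer with `np.median`. The sample values and the grid resolution are illustrative assumptions, not data from this lesson.

```python
import numpy as np

# Hypothetical sample with one gross outlier (illustrative values only).
x = np.array([2.1, 2.9, 3.4, 3.8, 4.0, 4.3, 150.0])

# Evaluate the objective sum_i |x_i - mu| on a grid of candidate mu values.
grid = np.linspace(x.min(), x.max(), 20_001)
losses = np.abs(x[None, :] - grid[:, None]).sum(axis=1)
mu_grid = grid[np.argmin(losses)]

print(f"Grid minimizer of sum |x_i - mu|: {mu_grid:.3f}")
print(f"np.median(x):                     {np.median(x):.3f}")
print(f"np.mean(x):                       {np.mean(x):.3f}")
```

The grid minimizer should agree with the median up to the grid spacing, while the mean is dragged toward the outlier.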

Trimmed Mean

$$\bar{x}_{\alpha} = \frac{1}{n - 2\lfloor n\alpha \rfloor} \sum_{i=\lfloor n\alpha \rfloor + 1}^{n - \lfloor n\alpha \rfloor} x_{(i)}$$

Here,

  • $\alpha$ = Trimming fraction (e.g., 0.1 for 10% trimming)
  • $x_{(i)}$ = The i-th order statistic
  • $\lfloor n\alpha \rfloor$ = Number of observations trimmed from each tail
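
The trimmed-mean formula can be sketched directly in NumPy. The helper name `trimmed_mean` and the sample values are assumptions made for illustration; this is a minimal transcription of the formula, not a reference implementation.

```python
import numpy as np

def trimmed_mean(x, alpha):
    """Mean of the order statistics left after dropping floor(n*alpha) from each tail."""
    x_sorted = np.sort(np.asarray(x, dtype=float))
    n = x_sorted.size
    g = int(np.floor(n * alpha))  # number trimmed from each tail
    if 2 * g >= n:
        raise ValueError("alpha is too large: no observations would remain")
    return x_sorted[g:n - g].mean()

# Hypothetical data with one large value.
data = np.array([3.2, 4.1, 2.8, 15.7, 3.5, 4.0, 3.9, 2.1, 4.3, 3.7])
tm = trimmed_mean(data, 0.1)

print(f"Mean:             {data.mean():.4f}")
print(f"10% trimmed mean: {tm:.4f}")
```

With n = 10 and a trimming fraction of 0.1, one observation is dropped from each tail, so the value 15.7 no longer enters the average.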

Winsorized Mean

$$\bar{x}_W = \frac{1}{n} \sum_{i=1}^{n} \tilde{x}_i, \quad \tilde{x}_i = \begin{cases} x_{(\lfloor n\alpha \rfloor + 1)} & \text{if } x_i < x_{(\lfloor n\alpha \rfloor + 1)} \\ x_{(n - \lfloor n\alpha \rfloor)} & \text{if } x_i > x_{(n - \lfloor n\alpha \rfloor)} \\ x_i & \text{otherwise} \end{cases}$$

Here,

  • $\tilde{x}_i$ = Winsorized observation (extremes replaced by the nearest non-extreme value)
  • $\alpha$ = Winsorizing fraction
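
A minimal sketch of the Winsorized mean follows. It clips values to the two boundary order statistics rather than discarding them. The helper name `winsorized_mean` and the data are illustrative assumptions.

```python
import numpy as np

def winsorized_mean(x, alpha):
    """Mean after replacing each tail's floor(n*alpha) extremes by the nearest kept value."""
    x = np.asarray(x, dtype=float)
    x_sorted = np.sort(x)
    n = x.size
    g = int(np.floor(n * alpha))
    if 2 * g >= n:
        raise ValueError("alpha is too large: no observations would remain")
    lower = x_sorted[g]          # x_(g+1) in 1-based order-statistic notation
    upper = x_sorted[n - g - 1]  # x_(n-g)
    return np.clip(x, lower, upper).mean()

# Hypothetical data with one large value.
data = np.array([3.2, 4.1, 2.8, 15.7, 3.5, 4.0, 3.9, 2.1, 4.3, 3.7])
wm = winsorized_mean(data, 0.1)

print(f"Mean:                {data.mean():.4f}")
print(f"10% Winsorized mean: {wm:.4f}")
```

Unlike trimming, Winsorizing keeps all n observations in the average; only the extreme values are pulled in.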

Breakdown Point

Definition: Breakdown Point

The finite-sample breakdown point of an estimator $\hat{\theta}_n$ is the smallest fraction $\epsilon^*$ of observations that can be replaced by arbitrary values to make the estimator arbitrarily large:

$$\epsilon^* = \min\left\{\frac{m}{n} : \sup_{\text{corruption}} |\hat{\theta}_{n,m}| = \infty\right\}$$

Here $\hat{\theta}_{n,m}$ denotes the estimator computed after $m$ of the $n$ observations have been replaced. The sample mean has breakdown point $\epsilon^* = 1/n$ (one outlier suffices). The median achieves $\epsilon^* \approx 0.5$ for large $n$ (the maximum possible).
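
The breakdown behaviour can be illustrated by replacing m observations with a huge value and watching each estimator. The sketch below uses one particular corruption (overwriting the m largest values with 1e9), which is an assumption for illustration; the definition takes a supremum over all corruptions.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 21
clean = rng.normal(loc=5.0, scale=1.0, size=n)  # simulated clean sample

def corrupt(data, m, value=1e9):
    """Return a sorted copy with the m largest observations replaced by `value`."""
    out = np.sort(data).copy()
    if m > 0:
        out[-m:] = value
    return out

results = {}
for m in (0, 1, 10, 11):
    bad = corrupt(clean, m)
    results[m] = (bad.mean(), np.median(bad))
    print(f"m = {m:2d}  mean = {results[m][0]:.3e}  median = {results[m][1]:.3e}")
```

With n = 21 the median stays among the clean values while at most 10 points are corrupted and follows the corruption once 11 points are replaced, whereas the mean is already ruined at m = 1.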

Efficiency vs. Breakdown

There is a fundamental tradeoff: estimators with higher breakdown points tend to have lower efficiency under normality. The median's asymptotic efficiency relative to the mean under normality is $2/\pi \approx 64\%$, but it is far more robust.
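
The efficiency figure can be checked approximately by simulation: draw many normal samples and compare the sampling variance of the mean with that of the median. The sample size, replication count, and seed below are arbitrary choices, and the result carries Monte Carlo error.

```python
import numpy as np

rng = np.random.default_rng(1)
n, reps = 101, 20_000

# Each row is one simulated sample of size n from N(0, 1).
samples = rng.normal(size=(reps, n))

var_mean = samples.mean(axis=1).var(ddof=1)
var_median = np.median(samples, axis=1).var(ddof=1)
rel_eff = var_mean / var_median

print(f"Var(mean):   {var_mean:.5f}")
print(f"Var(median): {var_median:.5f}")
print(f"Relative efficiency of the median: {rel_eff:.3f} (2/pi = {2 / np.pi:.3f})")
```

For finite n the ratio sits slightly above the asymptotic value and approaches it as n grows.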


M-Estimators

Definition: M-Estimator

An M-estimator generalizes maximum likelihood by solving:

$$\sum_{i=1}^{n} \psi(x_i - \hat{\theta}) = 0$$

where $\psi$ is a function derived from a loss function $\rho$ via $\psi(u) = \rho'(u)$. For least squares, $\rho(u) = u^2$ and $\psi(u) = 2u$, yielding the mean. Robust M-estimators use $\psi$ functions that bound the influence of outliers.

M-Estimator Objective Function

$$\hat{\theta} = \underset{\theta}{\arg\min} \sum_{i=1}^{n} \rho\left(\frac{x_i - \theta}{\hat{\sigma}}\right)$$

Here,

  • $\rho$ = Robust loss function (e.g., Huber or Tukey bisquare)
  • $\hat{\sigma}$ = Robust scale estimate (e.g., MAD)
  • $\psi$ = Derivative of $\rho$; the influence function of the M-estimator is proportional to $\psi$

Huber's $\psi$ Function

Huber's Loss Function

$$\rho_H(u) = \begin{cases} \frac{1}{2}u^2 & \text{if } |u| \leq k \\ k|u| - \frac{1}{2}k^2 & \text{if } |u| > k \end{cases}$$

Here,

  • $k$ = Tuning constant, typically $k = 1.345$ for 95% efficiency under normality
  • $\psi_H(u)$ = $\min(|u|, k) \cdot \text{sign}(u)$, which clips influence at $|u| = k$

Tukey's Bisquare (Biweight) $\psi$ Function

Tukey Bisquare Loss

$$\rho_T(u) = \begin{cases} \frac{k^2}{6}\left[1 - \left(1 - \frac{u^2}{k^2}\right)^3\right] & \text{if } |u| \leq k \\ \frac{k^2}{6} & \text{if } |u| > k \end{cases}$$

Here,

  • $k$ = Tuning constant, typically $k = 4.685$ for 95% efficiency under normality
  • $\psi_T(u)$ = $u(1 - u^2/k^2)^2 \cdot \mathbf{1}(|u| \leq k)$, which is redescending and so fully rejects extreme outliers

Huber vs. Tukey Bisquare

  • Huber: $\psi$ is bounded but does not redescend, so extreme outliers still have some (bounded) influence
  • Tukey bisquare: $\psi$ redescends to 0, so outliers beyond $|u| > k$ have zero influence
  • Use Huber when you want bounded influence; use Tukey when you want complete rejection of extreme outliers
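
The difference between clipping and redescending is easy to see by evaluating both $\psi$ functions at a few standardized residuals. The function names and the chosen residual values below are illustrative assumptions; the tuning constants are the ones quoted above.

```python
import numpy as np

def huber_psi(u, k=1.345):
    """Huber's psi: min(|u|, k) * sign(u)."""
    return np.clip(u, -k, k)

def tukey_psi(u, k=4.685):
    """Tukey bisquare psi: u * (1 - u^2/k^2)^2 inside [-k, k], zero outside."""
    u = np.asarray(u, dtype=float)
    return np.where(np.abs(u) <= k, u * (1 - (u / k) ** 2) ** 2, 0.0)

u = np.array([0.5, 1.0, 2.0, 4.0, 6.0, 50.0])
psi_h = huber_psi(u)
psi_t = tukey_psi(u)

print("    u    Huber    Tukey")
for ui, h, t in zip(u, psi_h, psi_t):
    print(f"{ui:5.1f}  {h:7.3f}  {t:7.3f}")
```

Huber's $\psi$ stays at the constant k for every large residual, while the bisquare value drops back to exactly zero once the residual exceeds its own tuning constant.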

Influence Function

Theorem: Influence Function Properties

The influence function (IF) of an estimator $T$ at distribution $F$ is:

$$\text{IF}(x; T, F) = \lim_{\epsilon \to 0} \frac{T((1-\epsilon)F + \epsilon \delta_x) - T(F)}{\epsilon}$$

where $\delta_x$ is a point mass at $x$. This measures the infinitesimal effect of adding an outlier at $x$ to the distribution $F$.

Properties:

  • An estimator is bounded-influence if $\text{IF}(x; T, F)$ is bounded for all $x$
  • The OLS estimator has $\text{IF}((x, y); \hat{\beta}, F) = (E[x x^T])^{-1} x \, (y - x^T \beta)$, which is unbounded in both the residual and the regressors
  • The Huber M-estimator of location has $\text{IF}(x; T_H, F) \propto \psi_H((x - \theta)/\sigma)$, which is bounded because $|\psi_H| \leq k$
  • The Tukey bisquare has a redescending IF that returns to 0 for large $|x|$
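
A finite-sample analogue of the influence function, often called the sensitivity curve, adds a single point $x$ to a fixed sample and rescales the change in the estimate by $n + 1$. The sketch below computes it for the mean and the median on simulated data. The sample, seed, and evaluation points are assumptions for illustration.

```python
import numpy as np

rng = np.random.default_rng(2)
base = rng.normal(size=99)  # simulated reference sample

def sensitivity_curve(stat, sample, x_values):
    """(n + 1) * [stat(sample with x added) - stat(sample)] for each x."""
    n = sample.size
    t0 = stat(sample)
    return np.array([(n + 1) * (stat(np.append(sample, x)) - t0)
                     for x in x_values])

xs = np.array([-1000.0, -10.0, 0.0, 10.0, 1000.0])
sc_mean = sensitivity_curve(np.mean, base, xs)
sc_median = sensitivity_curve(np.median, base, xs)

for x, a, b in zip(xs, sc_mean, sc_median):
    print(f"x = {x:8.1f}  mean: {a:10.3f}  median: {b:8.3f}")
```

For the mean the curve equals $x$ minus the sample mean, so it grows without bound. For the median it flattens out once $x$ lies beyond the bulk of the sample.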

Robust Regression

Robust Regression with statsmodels

import numpy as np
import statsmodels.api as sm
import matplotlib.pyplot as plt

np.random.seed(42)
n = 100

# Clean data
X_clean = np.linspace(0, 10, n)
y_clean = 2 + 1.5 * X_clean + np.random.normal(0, 1, n)

# Add 10% gross outliers
n_outliers = 10
outlier_idx = np.random.choice(n, n_outliers, replace=False)
y_contaminated = y_clean.copy()
y_contaminated[outlier_idx] += np.random.normal(0, 20, n_outliers)

X = sm.add_constant(X_clean)

# OLS — sensitive to outliers
ols = sm.OLS(y_contaminated, X).fit()

# Robust regression — Huber's T
rlm = sm.RLM(y_contaminated, X, M=sm.robust.norms.HuberT()).fit()

# Robust regression — Tukey Bisquare
rlm_bisquare = sm.RLM(y_contaminated, X, M=sm.robust.norms.TukeyBiweight()).fit()

print("OLS estimates:", np.round(ols.params, 4))
print("Huber estimates:", np.round(rlm.params, 4))
print("Tukey estimates:", np.round(rlm_bisquare.params, 4))
print("\nTrue: [2.0, 1.5]")

# Visualization
fig, ax = plt.subplots(figsize=(10, 6))
ax.scatter(X_clean, y_contaminated, s=20, alpha=0.6, label='Data (with outliers)')
ax.scatter(X_clean[outlier_idx], y_contaminated[outlier_idx],
           s=80, c='red', marker='x', label='Outliers')
ax.plot(X_clean, ols.fittedvalues, 'b-', linewidth=2, label='OLS')
ax.plot(X_clean, rlm.fittedvalues, 'g--', linewidth=2, label='Huber')
ax.plot(X_clean, rlm_bisquare.fittedvalues, 'r:', linewidth=2, label='Tukey Bisquare')
ax.legend()
ax.set_title('Robust vs. OLS Regression')
plt.tight_layout()
plt.savefig('robust_regression.png', dpi=150)
plt.show()

Robust Standard Errors

Huber-White Robust Standard Errors

$$\widehat{\text{Var}}(\hat{\beta}) = (X^T X)^{-1} \left(\sum_{i=1}^{n} \hat{u}_i^2 \, x_i x_i^T \right) (X^T X)^{-1}$$

Here,

  • $\hat{u}_i$ = OLS residual for observation i
  • $x_i$ = Column vector of regressors for observation i (the i-th row of $X$, transposed)

When to Use Robust SEs

Robust standard errors (also called sandwich estimators or HC estimators) are valid under heteroskedasticity and mild misspecification. They do not require the error variance to be constant. Use them when:

  • You suspect heteroskedasticity
  • You want protection against mild misspecification of the error distribution
  • You are running OLS but want inference that is robust to non-normal errors
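
The sandwich formula can be transcribed directly into NumPy. This variant, with no small-sample correction, is commonly called HC0. The simulated heteroskedastic data, the seed, and all variable names below are assumptions made for illustration.

```python
import numpy as np

rng = np.random.default_rng(3)
n = 500
x = rng.uniform(0, 10, n)
# Simulated heteroskedastic errors: the standard deviation grows with x.
y = 2.0 + 1.5 * x + rng.normal(0, 0.5 + 0.5 * x, n)

X = np.column_stack([np.ones(n), x])  # design matrix with intercept
XtX_inv = np.linalg.inv(X.T @ X)
beta = XtX_inv @ X.T @ y              # OLS coefficients
resid = y - X @ beta                  # OLS residuals

# "Meat" of the sandwich: sum_i u_i^2 * x_i x_i^T
meat = (X * resid[:, None] ** 2).T @ X
cov_hc0 = XtX_inv @ meat @ XtX_inv
se_hc0 = np.sqrt(np.diag(cov_hc0))

# Classical standard errors assume a constant error variance.
sigma2 = resid @ resid / (n - X.shape[1])
se_classical = np.sqrt(np.diag(sigma2 * XtX_inv))

print("OLS coefficients:", np.round(beta, 4))
print("Classical SEs:   ", np.round(se_classical, 4))
print("HC0 robust SEs:  ", np.round(se_hc0, 4))
```

The coefficients are the same in both cases; only the estimated covariance matrix changes.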

Bootstrap for Robust Inference

Definition: Nonparametric Bootstrap

The nonparametric bootstrap resamples the observed data directly (with replacement) to estimate the sampling distribution of any statistic, without distributional assumptions:

$$\hat{\theta}^{*b} = T(X^{*b}), \quad b = 1, \ldots, B$$

where $X^{*b}$ is the $b$-th bootstrap sample drawn from the empirical distribution $\hat{F}_n$.

Bootstrap Standard Errors for the Median

import numpy as np

np.random.seed(42)
data = np.array([3.2, 4.1, 2.8, 15.7, 3.5, 4.0, 3.9, 2.1,
                 4.3, 3.7, 16.2, 3.4, 4.2, 2.9, 3.8])

# Bootstrap standard error of the median
B = 10000
boot_medians = np.array([
    np.median(np.random.choice(data, size=len(data), replace=True))
    for _ in range(B)
])

se_median = np.std(boot_medians, ddof=1)
ci_95 = np.percentile(boot_medians, [2.5, 97.5])

print(f"Sample median: {np.median(data):.2f}")
print(f"Bootstrap SE:  {se_median:.4f}")
print(f"95% Bootstrap CI: [{ci_95[0]:.2f}, {ci_95[1]:.2f}]")
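
The same resampling idea works for any statistic. The sketch below wraps it in a small reusable helper and compares the bootstrap standard error of the mean with that of the median on the same data. The helper name `bootstrap_se`, the number of replicates, and the seed are assumptions of this sketch.

```python
import numpy as np

def bootstrap_se(data, stat, B=5000, seed=0):
    """Bootstrap standard error and 95% percentile interval for stat(data)."""
    rng = np.random.default_rng(seed)
    data = np.asarray(data, dtype=float)
    n = data.size
    idx = rng.integers(0, n, size=(B, n))  # resample indices with replacement
    replicates = np.apply_along_axis(stat, 1, data[idx])
    return replicates.std(ddof=1), np.percentile(replicates, [2.5, 97.5])

data = np.array([3.2, 4.1, 2.8, 15.7, 3.5, 4.0, 3.9, 2.1,
                 4.3, 3.7, 16.2, 3.4, 4.2, 2.9, 3.8])

se_mean, ci_mean = bootstrap_se(data, np.mean)
se_median, ci_median = bootstrap_se(data, np.median)

print(f"Mean:   SE = {se_mean:.4f}, 95% CI = [{ci_mean[0]:.2f}, {ci_mean[1]:.2f}]")
print(f"Median: SE = {se_median:.4f}, 95% CI = [{ci_median[0]:.2f}, {ci_median[1]:.2f}]")
```

On data like these, with two large values among fifteen, the mean's bootstrap standard error is much larger than the median's.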

Key Takeaways

Summary: Robust Statistics

  • Classical estimators (mean, OLS) are non-robust — a single outlier can distort results arbitrarily
  • Breakdown point quantifies an estimator's resistance to contamination; the median achieves the maximum (50%)
  • M-estimators generalize MLE by bounding the influence of outliers via $\psi$ functions
  • Huber's $\psi$ clips influence; Tukey's bisquare redescends to zero (complete rejection)
  • Influence function formalizes the effect of an infinitesimal outlier on an estimator
  • Robust regression (RLM) provides coefficient estimates that are not driven by outliers
  • Robust standard errors (sandwich estimators) protect against heteroskedasticity without changing point estimates
  • Bootstrap provides distribution-free inference for any statistic, including robust estimators
