Cox Proportional Hazards Model
Statistics
Semi-Parametric Regression for Survival Data
The Cox model relates covariates to the hazard function without specifying the baseline hazard. Hazard ratios quantify how each predictor multiplies the risk of the event occurring at any given time.
- Oncology: Identify prognostic factors for cancer survival
- Reliability: Determine which operational conditions accelerate equipment failure
- Employee Analytics: Predict turnover risk from workplace factors
The hazard ratio tells you how much faster or slower the clock ticks for each group.
The Cox model is a semi-parametric regression model for survival data: the covariate effects are modeled parametrically, while the baseline hazard is left unspecified.
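Definition: Hazard Function
The instantaneous rate of event occurrence at time $t$, given survival up to time $t$:
Hazard Function
$$h(t) = \lim_{\Delta t \to 0} \frac{P(t \le T < t + \Delta t \mid T \ge t)}{\Delta t}$$
Here,
- $h(t)$ = Hazard (force of mortality) at time $t$
- $T$ = Time of event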
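To make the definition concrete, the sketch below approximates the limit with a small finite time step for a Weibull-distributed event time and compares the result with the closed-form Weibull hazard. The Weibull distribution, its shape and scale values, and all function names are illustrative assumptions, not part of the original material.

```python
import math

def weibull_survival(t, shape, scale):
    """S(t) = P(T >= t) for a Weibull event time."""
    return math.exp(-((t / scale) ** shape))

def numeric_hazard(t, shape, scale, dt=1e-6):
    """Approximate h(t) = P(t <= T < t + dt | T >= t) / dt with a small dt."""
    s_t = weibull_survival(t, shape, scale)
    s_next = weibull_survival(t + dt, shape, scale)
    return (s_t - s_next) / (s_t * dt)

def exact_hazard(t, shape, scale):
    """Closed-form Weibull hazard: (shape / scale) * (t / scale)^(shape - 1)."""
    return (shape / scale) * (t / scale) ** (shape - 1)

# Illustrative parameters (assumed): shape > 1 gives a hazard that rises with time.
shape, scale = 1.5, 10.0
for t in (1.0, 5.0, 20.0):
    print(t, round(numeric_hazard(t, shape, scale), 4), round(exact_hazard(t, shape, scale), 4))
```

The two columns should agree closely, which illustrates that the hazard is a conditional rate per unit time rather than a probability.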
Cox Model Specification
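Cox Proportional Hazards
$$h(t \mid X) = h_0(t) \exp(\beta_1 X_1 + \beta_2 X_2 + \cdots + \beta_p X_p)$$
Here,
- $h_0(t)$ = Baseline hazard function (unspecified)
- $X_i$ = Covariate i
- $\beta_i$ = Regression coefficient for covariate i
- $\exp(\beta_i)$ = Hazard ratio for covariate i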
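As a minimal sketch of the specification, the function below multiplies a baseline hazard by the exponentiated linear predictor. The constant baseline hazard of 0.01 is an assumption made only so the example can run (the Cox model itself never needs it), and the coefficient values mirror those used in the simulation in the Python Implementation section.

```python
import math

def cox_hazard(t, x, beta, baseline_hazard):
    """h(t | x) = h0(t) * exp(beta_1 * x_1 + ... + beta_p * x_p)."""
    linear_predictor = sum(b * xi for b, xi in zip(beta, x))
    return baseline_hazard(t) * math.exp(linear_predictor)

def baseline(t):
    """Assumed constant baseline hazard, for illustration only."""
    return 0.01

beta = [0.03, -0.5]      # age (per year), treatment indicator
untreated = [60, 0]
treated = [60, 1]

h_untreated = cox_hazard(12.0, untreated, beta, baseline)
h_treated = cox_hazard(12.0, treated, beta, baseline)
print(h_treated / h_untreated)  # equals exp(-0.5), the treatment hazard ratio
```

Because the baseline hazard cancels in the ratio, the same hazard ratio would result from any other choice of `baseline`.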
Hazard Ratios
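Hazard Ratio
$$\mathrm{HR}_i = \frac{h(t \mid X_i = x + 1)}{h(t \mid X_i = x)} = \exp(\beta_i)$$
Here,
- $\mathrm{HR} > 1$ = Increased risk (covariate is harmful)
- $\mathrm{HR} = 1$ = No effect
- $\mathrm{HR} < 1$ = Decreased risk (covariate is protective)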
Interpreting Hazard Ratios
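A hazard ratio of 1.5 means the instantaneous event rate is 50% higher at any given time point, holding the other covariates fixed. An HR of 0.7 means a 30% lower hazard. A 95% confidence interval that includes 1 indicates that the effect is not statistically significant at the 5% level.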
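The interpretation rules can be written as two small helpers. This is only a sketch; the function names are hypothetical, and the interval passed to the second helper is a made-up example.

```python
def describe_hazard_ratio(hr):
    """Translate a hazard ratio into a percent change in the hazard."""
    percent_change = (hr - 1.0) * 100.0
    if hr > 1.0:
        return f"HR = {hr:.2f}: hazard is {percent_change:.0f}% higher"
    if hr < 1.0:
        return f"HR = {hr:.2f}: hazard is {-percent_change:.0f}% lower"
    return f"HR = {hr:.2f}: no difference in hazard"

def ci_excludes_one(lower, upper):
    """True when the interval lies entirely above or entirely below 1."""
    return lower > 1.0 or upper < 1.0

print(describe_hazard_ratio(1.5))
print(describe_hazard_ratio(0.7))
print(ci_excludes_one(0.9, 2.4))  # False: this made-up interval contains 1
```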
Proportional Hazards Assumption
The model assumes the ratio of hazards between any two individuals is constant over time.
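PH Assumption
The key assumption, for any two individuals with covariate vectors $X_a$ and $X_b$, is:
$$\frac{h(t \mid X_a)}{h(t \mid X_b)} = \exp\left(\beta^\top (X_a - X_b)\right) \quad \text{for all } t$$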
This means the effect of covariates does not change over time. If violated, consider time-varying coefficients or stratification.
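The following sketch contrasts a constant coefficient with one that fades over time. The Weibull baseline hazard, the exponential decay of the effect, and all numeric values are assumptions chosen purely for illustration.

```python
import math

def weibull_baseline(t, shape=1.5, scale=10.0):
    """An assumed baseline hazard that increases with time."""
    return (shape / scale) * (t / scale) ** (shape - 1)

def hazard(t, x, beta_of_t):
    """Hazard with a single covariate x and a coefficient that may depend on t."""
    return weibull_baseline(t) * math.exp(beta_of_t(t) * x)

def constant_beta(t):
    return -0.5                        # proportional hazards: same effect at every time

def decaying_beta(t):
    return -0.5 * math.exp(-t / 5.0)   # effect fades with time: PH is violated

times = [1.0, 5.0, 20.0]
ph_ratios = [hazard(t, 1, constant_beta) / hazard(t, 0, constant_beta) for t in times]
non_ph_ratios = [hazard(t, 1, decaying_beta) / hazard(t, 0, decaying_beta) for t in times]
print(ph_ratios)      # the same value at every time, up to rounding
print(non_ph_ratios)  # drifts toward 1 as the effect fades
```

Under proportional hazards the ratio stays at exp(-0.5) even though the baseline hazard itself changes with time; with the decaying coefficient a single hazard ratio no longer summarizes the effect.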
Testing PH Assumption
- Schoenfeld residuals: Plot residuals against time; should show no pattern
- Log-log survival plots: Parallel curves indicate PH holds
- Statistical test: Test correlation of Schoenfeld residuals with time
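To show what a Schoenfeld residual is, here is a simplified pure-Python sketch for a single covariate with no tied event times. The data are made up, the coefficient is fitted with a basic Newton-Raphson loop, and the final check is a plain Pearson correlation with time. Formal tests typically use scaled residuals and a choice of time transform, so treat this as an illustration of the idea rather than a replacement for a library routine.

```python
import math

# Made-up data for illustration: follow-up time, event flag, one binary covariate.
times = [2, 3, 5, 7, 8, 11, 13, 14, 17, 19]
events = [1, 1, 0, 1, 1, 1, 0, 1, 1, 1]
x = [1, 0, 1, 1, 0, 0, 1, 0, 1, 0]

def risk_set_moments(beta, t):
    """Weighted mean and variance of x over subjects still at risk at time t."""
    at_risk = [xj for tj, xj in zip(times, x) if tj >= t]
    weights = [math.exp(beta * xj) for xj in at_risk]
    total = sum(weights)
    mean = sum(w * xj for w, xj in zip(weights, at_risk)) / total
    second = sum(w * xj * xj for w, xj in zip(weights, at_risk)) / total
    return mean, second - mean * mean

def schoenfeld_residuals(beta):
    """One (time, residual) pair per event: observed x minus its risk-set expectation."""
    return [(t, xi - risk_set_moments(beta, t)[0])
            for t, d, xi in zip(times, events, x) if d == 1]

def fit_beta(iterations=25):
    """Newton-Raphson on the partial-likelihood score (single covariate, no ties)."""
    beta = 0.0
    for _ in range(iterations):
        score = sum(r for _, r in schoenfeld_residuals(beta))
        info = sum(risk_set_moments(beta, t)[1]
                   for t, d in zip(times, events) if d == 1)
        beta += score / info
    return beta

def pearson(a, b):
    """Pearson correlation between two equal-length sequences."""
    ma, mb = sum(a) / len(a), sum(b) / len(b)
    cov = sum((u - ma) * (v - mb) for u, v in zip(a, b))
    return cov / math.sqrt(sum((u - ma) ** 2 for u in a) * sum((v - mb) ** 2 for v in b))

beta_hat = fit_beta()
residuals = schoenfeld_residuals(beta_hat)
event_times = [t for t, _ in residuals]
resid_values = [r for _, r in residuals]
print("beta_hat:", round(beta_hat, 3))
print("correlation with time:", round(pearson(event_times, resid_values), 3))
```

At the fitted coefficient the residuals sum to zero by construction; a clear trend of the residuals against time would suggest that the covariate's effect changes over time. With lifelines, `CoxPHFitter.check_assumptions` offers a packaged version of this kind of check.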
Partial Likelihood
Cox's key insight: the baseline hazard drops out of the likelihood.
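Partial Likelihood
$$L(\beta) = \prod_{i=1}^{n} \left[ \frac{\exp(\beta^\top X_i)}{\sum_{j \in R(t_i)} \exp(\beta^\top X_j)} \right]^{\delta_i}$$
Here,
- $\delta_i$ = Event indicator (1=event, 0=censored)
- $R(t_i)$ = Risk set at time $t_i$: all individuals still at risk
- $X_i$ = Covariate vector for individual i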
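The log of the partial likelihood is straightforward to evaluate directly. The sketch below assumes no tied event times and uses a tiny made-up data set; note that no baseline hazard appears anywhere in the computation.

```python
import math

def log_partial_likelihood(beta, times, events, covariates):
    """Sum over events of beta'x_i - log(sum over the risk set of exp(beta'x_j)).

    Assumes no tied event times; censored rows (event == 0) only appear in risk sets.
    """
    def score(x):
        return sum(b * xi for b, xi in zip(beta, x))

    total = 0.0
    for t_i, d_i, x_i in zip(times, events, covariates):
        if d_i == 0:
            continue  # censored subjects contribute no factor of their own
        risk_set = [x_j for t_j, x_j in zip(times, covariates) if t_j >= t_i]
        total += score(x_i) - math.log(sum(math.exp(score(x_j)) for x_j in risk_set))
    return total

# Tiny made-up data set: three subjects, one covariate, the second is censored.
times = [1.0, 2.0, 3.0]
events = [1, 0, 1]
covariates = [[1.0], [0.0], [0.0]]

print(log_partial_likelihood([0.0], times, events, covariates))  # -log(3) at beta = 0
```

At beta = 0 every subject in a risk set is equally likely to fail, so the first event contributes log(1/3) and the last event, alone in its risk set, contributes log(1) = 0.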
Confidence Intervals for HR
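CI for Hazard Ratio
$$95\%\ \mathrm{CI} = \exp\left(\hat{\beta}_i \pm 1.96 \cdot \mathrm{SE}(\hat{\beta}_i)\right)$$
Here,
- $\hat{\beta}_i$ = Estimated coefficient
- $\mathrm{SE}(\hat{\beta}_i)$ = Standard error from the variance-covariance matrix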
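A Wald-type interval for the hazard ratio is built on the coefficient scale and then exponentiated. In the sketch below the estimate and standard error are hypothetical values chosen only for illustration, and the function name is an assumption.

```python
import math
from statistics import NormalDist

def hazard_ratio_ci(beta_hat, se, level=0.95):
    """Return (lower, HR, upper) by exponentiating beta_hat +/- z * se."""
    z = NormalDist().inv_cdf(0.5 + level / 2.0)  # about 1.96 for a 95% interval
    return math.exp(beta_hat - z * se), math.exp(beta_hat), math.exp(beta_hat + z * se)

# Hypothetical estimate and standard error.
lower, hr, upper = hazard_ratio_ci(beta_hat=-0.5, se=0.2)
print(f"HR = {hr:.3f}, 95% CI = ({lower:.3f}, {upper:.3f})")
```

Because the interval is symmetric on the log scale, it is asymmetric around the hazard ratio itself: the point estimate is the geometric mean of the two limits.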
Python Implementation
import numpy as np
import pandas as pd
from lifelines import CoxPHFitter
import matplotlib.pyplot as plt
np.random.seed(42)
# Simulate survival data with covariates
n = 300
age = np.random.normal(60, 10, n)
treatment = np.random.binomial(1, 0.5, n)
beta_true = [0.03, -0.5] # age increases risk, treatment decreases risk
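# Generate exponential survival times by inverse-transform sampling
# (constant baseline hazard of 0.01, scaled by exp(linear predictor))
U = np.random.uniform(0, 1, n)
linpred = beta_true[0]*age + beta_true[1]*treatment
time = -np.log(U) / np.exp(linpred) * 100
# Independent uniform censoring times
censored_time = np.random.uniform(50, 150, n)
event = (time <= censored_time).astype(int)  # 1 = event observed, 0 = censored
observed_time = np.minimum(time, censored_time)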
# Create DataFrame
df = pd.DataFrame({
    'duration': observed_time,
    'event': event,
    'age': age,
    'treatment': treatment
})
# Fit Cox model
cph = CoxPHFitter()
cph.fit(df, duration_col='duration', event_col='event')
print(cph.summary[['coef', 'exp(coef)', 'se(coef)', 'p', 'exp(coef) lower 95%', 'exp(coef) upper 95%']])
# Plot coefficients (log hazard ratios) with 95% confidence intervals
cph.plot()
plt.title('Cox Model - Log Hazard Ratios')
plt.show()
# Concordance index
print(f"\nConcordance index: {cph.concordance_index_:.3f}")
Worked Example
Example: Cancer Treatment Study
A study of 500 patients examines the effect of age and a new drug on survival:
| Covariate | $\hat{\beta}$ | HR | 95% CI | p-value |
|-----------|-----|-----|--------|---------|
| Age (per year) | 0.04 | 1.041 | 1.02–1.06 | 0.0001 |
| Drug (vs placebo) | -0.62 | 0.538 | 0.39–0.74 | 0.0001 |
- Age: Each additional year increases the hazard by 4.1%
- Drug: Treatment reduces the hazard by 46.2% (HR = 0.538)
- Concordance index: 0.72 (good discrimination)
- Schoenfeld test: p = 0.35, so there is no evidence against the PH assumption
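The arithmetic behind these interpretations can be checked directly from the coefficients in the table above. The 10-year comparison and the combined comparison are extra illustrations that follow from the model's multiplicative form; they are not results reported by the study.

```python
import math

# Coefficients taken from the table above.
beta_age, beta_drug = 0.04, -0.62

hr_age = math.exp(beta_age)
hr_drug = math.exp(beta_drug)
print(f"Age HR per year: {hr_age:.3f} ({(hr_age - 1) * 100:.1f}% higher hazard)")
print(f"Drug HR: {hr_drug:.3f} ({(1 - hr_drug) * 100:.1f}% lower hazard)")

# Effects multiply on the hazard scale, so a 10-year age gap compounds.
hr_age_10 = math.exp(10 * beta_age)
print(f"Age HR per 10 years: {hr_age_10:.2f}")

# A treated patient who is 10 years older than an untreated one.
hr_combined = math.exp(10 * beta_age + beta_drug)
print(f"Combined HR: {hr_combined:.2f}")
```

Note that the 10-year hazard ratio is exp(0.4), about 1.49, rather than 10 times the per-year increase, because coefficients add on the log-hazard scale.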
Model Evaluation
| Metric | Description |
|--------|------------|
| Concordance index | Probability that, in a randomly chosen comparable pair, the individual with the higher predicted risk experiences the event first (0.5 = random, 1.0 = perfect) |
| Partial AIC | For model comparison (lower is better) |
| Schoenfeld residuals | Check PH assumption |
| Likelihood ratio test | Test overall model significance |
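The concordance index in the table can be computed by counting pairs. The following is a simplified sketch of Harrell's C with made-up data; it skips pairs with tied times, and library implementations may handle ties and censoring in more refined ways.

```python
def concordance_index(times, events, risk_scores):
    """Harrell's C: share of comparable pairs where the higher risk fails first.

    A pair is comparable when the subject with the shorter time had an event.
    Ties in risk score count as half; tied times are skipped for simplicity.
    """
    concordant, comparable = 0.0, 0
    n = len(times)
    for i in range(n):
        for j in range(n):
            # Subject i must have an observed event strictly before subject j's time.
            if events[i] == 1 and times[i] < times[j]:
                comparable += 1
                if risk_scores[i] > risk_scores[j]:
                    concordant += 1.0
                elif risk_scores[i] == risk_scores[j]:
                    concordant += 0.5
    return concordant / comparable

# Made-up example: a higher score should mean earlier failure.
times = [2, 4, 6, 8]
events = [1, 1, 0, 1]
print(concordance_index(times, events, [4, 3, 2, 1]))  # 1.0: perfectly ordered
print(concordance_index(times, events, [1, 2, 3, 4]))  # 0.0: perfectly reversed
```

The censored subject at time 6 still takes part in comparisons with subjects who failed earlier, but it cannot be the earlier member of a pair because its true event time is unknown.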
Key Takeaways
Summary: Cox Proportional Hazards
- The Cox model relates covariates to the hazard without specifying the baseline hazard
- Hazard ratio $\exp(\beta_i)$: >1 harmful, <1 protective
- The proportional hazards assumption requires constant hazard ratios over time
- Use Schoenfeld residuals and log-log plots to check the PH assumption
- The model uses partial likelihood, which is free from the baseline hazard
- Concordance index measures predictive discrimination (values above about 0.7 are commonly considered good)
- Extensions include time-varying covariates and stratification
Related Topics
- See Kaplan-Meier Estimator for non-parametric survival estimation
- See Logistic Regression for binary outcomes
- See Mediation Analysis for causal pathway analysis