
Cox Proportional Hazards Model


Semi-Parametric Regression for Survival Data

The Cox model relates covariates to the hazard function without specifying the baseline hazard. Hazard ratios quantify how each predictor multiplies the risk of the event occurring at any given time.

  • Oncology: Identify prognostic factors for cancer survival

  • Reliability: Determine which operational conditions accelerate equipment failure

  • Employee Analytics: Predict turnover risk from workplace factors

The hazard ratio tells you how much faster or slower the clock ticks for each group.


The Cox model is called semi-parametric because the covariate effects are modeled parametrically through regression coefficients, while the baseline hazard is left as an arbitrary, unspecified function of time.

Definition: Hazard Function

The instantaneous rate of event occurrence at time $t$, given survival up to time $t$:

Hazard Function

$$h(t) = \lim_{\Delta t \to 0}\frac{P(t \leq T < t + \Delta t \mid T \geq t)}{\Delta t}$$

Here,

  • $h(t)$ = Hazard (force of mortality) at time $t$
  • $T$ = Time of event
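To make the limit concrete, the sketch below approximates the hazard with a small finite Δt and compares it with a closed form. The Weibull distribution and its parameters (shape 1.5, scale 10) are assumptions chosen purely for illustration; they are not part of the lesson.

```python
import math

# Assumed Weibull distribution for illustration: S(t) = exp(-(t / scale) ** shape)
SHAPE, SCALE = 1.5, 10.0

def survival(t):
    """P(T >= t) under the assumed Weibull distribution."""
    return math.exp(-((t / SCALE) ** SHAPE))

def hazard_exact(t):
    """Closed-form Weibull hazard: (shape / scale) * (t / scale) ** (shape - 1)."""
    return (SHAPE / SCALE) * (t / SCALE) ** (SHAPE - 1)

def hazard_numeric(t, dt=1e-6):
    """Finite-dt version of the limit: P(t <= T < t + dt | T >= t) / dt."""
    conditional_prob = (survival(t) - survival(t + dt)) / survival(t)
    return conditional_prob / dt

for t in (1.0, 5.0, 20.0):
    print(f"t={t:5.1f}  numeric={hazard_numeric(t):.5f}  exact={hazard_exact(t):.5f}")
```

The two columns should agree to several decimal places, which illustrates that the hazard is a rate conditional on survival up to $t$, not an unconditional probability.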

Cox Model Specification

Cox Proportional Hazards

$$h(t \mid X) = h_0(t) \exp(\beta_1 X_1 + \beta_2 X_2 + \cdots + \beta_p X_p)$$

Here,

  • $h_0(t)$ = Baseline hazard function (unspecified)
  • $X_i$ = Covariate i
  • $\beta_i$ = Regression coefficient for covariate i
  • $\exp(\beta_i)$ = Hazard ratio for covariate i

Hazard Ratios

Hazard Ratio

$$HR = \frac{h(t \mid X_i = x_1 + 1)}{h(t \mid X_i = x_1)} = \exp(\beta_i)$$

Here,

  • $HR > 1$ = Increased risk (covariate is harmful)
  • $HR = 1$ = No effect
  • $HR < 1$ = Decreased risk (covariate is protective)

Interpreting Hazard Ratios

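A hazard ratio of 1.5 means the hazard (the instantaneous event rate among those still at risk) is 50% higher at any given time point. An HR of 0.7 means a 30% reduction in the hazard. A 95% confidence interval that includes 1 indicates the effect is not statistically significant at the 5% level.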
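Because coefficients act on the log-hazard scale, a k-unit increase in a covariate multiplies the hazard by exp(kβ), not by k·exp(β). A minimal sketch of that arithmetic follows; the helper names are hypothetical and the coefficient of 0.03 per year of age is an illustrative value.

```python
import math

def hazard_ratio(beta, units=1.0):
    """Hazard ratio for a `units`-sized increase in a covariate with coefficient beta."""
    return math.exp(beta * units)

def percent_change(hr):
    """Percent change in the hazard implied by a hazard ratio."""
    return (hr - 1.0) * 100.0

beta_age = 0.03  # illustrative log-hazard coefficient per year of age
for years in (1, 10):
    hr = hazard_ratio(beta_age, years)
    print(f"{years:>2} year(s): HR = {hr:.3f} ({percent_change(hr):+.1f}% hazard)")

# The interpretations quoted above: HR 1.5 is +50%, HR 0.7 is -30%
print(round(percent_change(1.5), 1), round(percent_change(0.7), 1))
```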


Proportional Hazards Assumption

The model assumes the ratio of hazards between any two individuals is constant over time.

PH Assumption

The key assumption is:

$$\frac{h_1(t)}{h_2(t)} = \exp(\beta(X_1 - X_2)) = \text{constant}$$

This means the effect of covariates does not change over time. If violated, consider time-varying coefficients or stratification.
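The constancy of the ratio can be checked numerically: whatever shape the baseline hazard takes, it cancels when two individuals are compared. The baseline function and the coefficient below are arbitrary assumptions for illustration only.

```python
import math

def baseline_hazard(t):
    """Arbitrary (assumed) baseline hazard; the Cox model never needs its form."""
    return 0.02 + 0.01 * math.sin(t) ** 2 + 0.001 * t

def cox_hazard(t, x, beta):
    """h(t | x) = h0(t) * exp(beta * x) for a single covariate."""
    return baseline_hazard(t) * math.exp(beta * x)

beta = -0.5          # assumed coefficient
x1, x2 = 1.0, 0.0    # e.g. treated vs untreated

ratios = [cox_hazard(t, x1, beta) / cox_hazard(t, x2, beta) for t in (0.5, 5.0, 50.0)]
print(ratios)                       # the same value at every time point
print(math.exp(beta * (x1 - x2)))   # exp(beta * (x1 - x2))
```

If the coefficient itself depended on time, the ratios would differ across time points, which is exactly what a violation of the assumption looks like.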

Testing PH Assumption

  1. Schoenfeld residuals: Plot residuals against time; should show no pattern

  2. Log-log survival plots: Parallel curves indicate PH holds

  3. Statistical test: Test correlation of Schoenfeld residuals with time


Partial Likelihood

Cox's key insight: the baseline hazard $h_0(t)$ drops out of the likelihood.

Partial Likelihood

$$L(\beta) = \prod_{i: \delta_i=1}\frac{\exp(\beta^T X_i)}{\sum_{j \in R(t_i)}\exp(\beta^T X_j)}$$

Here,

  • $\delta_i$ = Event indicator (1=event, 0=censored)
  • $R(t_i)$ = Risk set at time $t_i$: all individuals still at risk
  • $X_i$ = Covariate vector for individual i
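The partial likelihood can be evaluated directly from observed times, event indicators, and covariates. The sketch below does so for a single covariate on a tiny made-up dataset, assuming no tied event times; the function name and the data are hypothetical. Note that no baseline hazard appears anywhere in the computation.

```python
import math

def log_partial_likelihood(beta, times, events, x):
    """Cox log partial likelihood for one covariate, assuming no tied event times."""
    total = 0.0
    for i, (t_i, d_i) in enumerate(zip(times, events)):
        if d_i == 0:
            continue  # censored subjects contribute only through risk sets
        # Risk set R(t_i): everyone whose observed time is at least t_i
        denom = sum(math.exp(beta * x[j]) for j in range(len(times)) if times[j] >= t_i)
        total += beta * x[i] - math.log(denom)
    return total

# Tiny made-up dataset: observed time, event indicator (1 = event), covariate
times = [2.0, 3.0, 5.0, 7.0, 11.0]
events = [1, 0, 1, 1, 0]
x = [1.0, 0.0, 1.0, 0.0, 0.0]

for beta in (-1.0, 0.0, 1.0, 2.0):
    print(f"beta={beta:+.1f}  log PL={log_partial_likelihood(beta, times, events, x):.4f}")
```

Fitting the model amounts to finding the β that maximizes this quantity; libraries do this with Newton-type optimization and handle ties with approximations such as Efron's or Breslow's.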

Confidence Intervals for HR

CI for Hazard Ratio

$$\text{CI} = \exp\left(\hat{\beta}_j \pm 1.96 \times \text{SE}(\hat{\beta}_j)\right)$$

Here,

  • $\hat{\beta}_j$ = Estimated coefficient
  • $\text{SE}$ = Standard error from the variance-covariance matrix
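A minimal sketch of the interval calculation follows. The function name is hypothetical, and the inputs (a coefficient of -0.62 with a standard error of 0.16) are assumed values for illustration.

```python
import math

def hazard_ratio_ci(beta_hat, se, z=1.96):
    """Return (HR, lower, upper): the interval is built on the log scale, then exponentiated."""
    return (
        math.exp(beta_hat),
        math.exp(beta_hat - z * se),
        math.exp(beta_hat + z * se),
    )

# Assumed inputs for illustration: beta_hat = -0.62 with a standard error of 0.16
hr, lower, upper = hazard_ratio_ci(-0.62, 0.16)
print(f"HR = {hr:.3f}, 95% CI = ({lower:.2f}, {upper:.2f})")
```

Because the interval is symmetric on the log scale, it is asymmetric around the hazard ratio itself: the upper limit lies farther from the point estimate than the lower limit.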

Python Implementation


```python
import numpy as np
import pandas as pd
from lifelines import CoxPHFitter
import matplotlib.pyplot as plt

np.random.seed(42)

# Simulate survival data with covariates
n = 300
age = np.random.normal(60, 10, n)
treatment = np.random.binomial(1, 0.5, n)
beta_true = [0.03, -0.5]  # age increases risk, treatment decreases risk

# Generate exponential survival times by inverse-transform sampling:
# T = -log(U) / rate, with rate proportional to exp(linear predictor)
U = np.random.uniform(0, 1, n)
linpred = beta_true[0]*age + beta_true[1]*treatment
time = -np.log(U) / np.exp(linpred) * 100

# Independent uniform censoring times
censored_time = np.random.uniform(50, 150, n)
event = (time <= censored_time).astype(int)  # 1 = event observed, 0 = censored
observed_time = np.minimum(time, censored_time)

# Create DataFrame
df = pd.DataFrame({
    'duration': observed_time,
    'event': event,
    'age': age,
    'treatment': treatment
})

# Fit Cox model
cph = CoxPHFitter()
cph.fit(df, duration_col='duration', event_col='event')
print(cph.summary[['coef', 'exp(coef)', 'se(coef)', 'p', 'exp(coef) lower 95%', 'exp(coef) upper 95%']])

# Plot hazard ratios (by default, plot() shows the log hazard ratios, i.e. the coefficients)
cph.plot(hazard_ratios=True)
plt.title('Cox Model - Hazard Ratios')
plt.show()

# Concordance index
print(f"\nConcordance index: {cph.concordance_index_:.3f}")
```

Worked Example

Example: Cancer Treatment Study

A study of 500 patients examines the effect of age and a new drug on survival:

| Covariate | $\hat{\beta}$ | HR | 95% CI | p-value |
|-----------|-----|-----|--------|---------|
| Age (per year) | 0.04 | 1.041 | 1.02–1.06 | 0.0001 |
| Drug (vs placebo) | -0.62 | 0.538 | 0.39–0.74 | 0.0001 |

  • Age: Each additional year increases hazard by 4.1%

  • Drug: Treatment reduces hazard by 46.2% (HR = 0.538)

  • Concordance index: 0.72 (good discrimination)

  • Schoenfeld test: p = 0.35, so there is no evidence that the PH assumption is violated
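As a worked check of the table, the hazard ratios and percentage changes follow directly from the coefficients. The last line, which compares a treated patient with an untreated patient who is 10 years younger, is an added illustration; it assumes the two effects combine additively on the log-hazard scale (no interaction), as the model specification implies.

$$
\begin{aligned}
HR_{\text{age}} &= \exp(0.04) \approx 1.041 &&\Rightarrow (1.041 - 1) \times 100\% = 4.1\% \text{ higher hazard per year} \\
HR_{\text{drug}} &= \exp(-0.62) \approx 0.538 &&\Rightarrow (1 - 0.538) \times 100\% = 46.2\% \text{ lower hazard} \\
HR_{\text{10 years older, on drug}} &= \exp(0.04 \times 10 - 0.62) = \exp(-0.22) \approx 0.80
\end{aligned}
$$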

Figure: Forest plot of hazard ratios

Model Evaluation

| Metric | Description |
|--------|------------|
| Concordance index | Probability that, in a randomly chosen comparable pair, the individual with the higher predicted risk experiences the event first (0.5 = random, 1.0 = perfect) |
| Partial AIC | For model comparison (lower is better) |
| Schoenfeld residuals | Check PH assumption |
| Likelihood ratio test | Test overall model significance |
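The concordance index in the table can be computed by hand on a small dataset. The sketch below is a simplified version of Harrell's C that ignores tied event times; the function name, data, and risk scores are made up for illustration, and library implementations such as the `concordance_index_` attribute used earlier handle ties more carefully.

```python
def concordance_index(times, events, risk_scores):
    """Harrell's C: share of comparable pairs in which the higher-risk subject fails first."""
    concordant = 0.0
    comparable = 0
    n = len(times)
    for i in range(n):
        for j in range(n):
            # A pair is comparable when subject i has an observed event strictly before time j
            if events[i] == 1 and times[i] < times[j]:
                comparable += 1
                if risk_scores[i] > risk_scores[j]:
                    concordant += 1.0
                elif risk_scores[i] == risk_scores[j]:
                    concordant += 0.5  # ties in predicted risk count as half
    return concordant / comparable

# Made-up data: observed time, event indicator (1 = event), predicted risk score
times = [2.0, 3.0, 5.0, 7.0, 11.0]
events = [1, 0, 1, 1, 0]
risk_scores = [0.9, 0.2, 0.4, 0.6, 0.1]
print(round(concordance_index(times, events, risk_scores), 3))
```

Here 6 of the 7 comparable pairs are ordered correctly, giving a value of about 0.857. Censored subjects still appear as the later member of a pair, but never as the earlier one, because their true event time is unknown.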


Key Takeaways

Summary: Cox Proportional Hazards

  • The Cox model relates covariates to hazard without specifying the baseline hazard

  • Hazard ratio $HR = \exp(\beta)$: >1 harmful, <1 protective

  • The proportional hazards assumption requires constant hazard ratios over time

  • Use Schoenfeld residuals and log-log plots to check PH assumption

  • The model uses partial likelihood, which is free from the baseline hazard

  • Concordance index measures predictive discrimination (>0.7 is good)

  • Extensions include time-varying covariates and stratification

