
Kaplan-Meier Estimator — Survival Function


Non-Parametric Estimation of Survival Probabilities

The Kaplan-Meier estimator constructs the survival function step-by-step at each event time, handling censored observations correctly. It produces the iconic survival curve used throughout medical and reliability research.

  • Clinical Trials — Estimate patient survival probabilities with varying follow-up times

  • Manufacturing — Predict component reliability with incomplete failure data

  • Customer Analytics — Model subscription duration with right-censored observations

Each step down in the survival curve represents real events, properly weighted for those still at risk.


The Kaplan-Meier estimator is a non-parametric method for estimating the survival function from time-to-event data, even when observations are censored.

Definition: Survival Function

The probability that an event has not yet occurred by time $t$:

$$S(t) = P(T > t) = 1 - F(t)$$

Here,

  • $T$ = Time until event occurs
  • $S(t)$ = Probability of surviving past time $t$
  • $F(t)$ = Cumulative distribution function
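With no censoring at all, $S(t)$ can be estimated directly as the fraction of subjects whose event time exceeds $t$. The sketch below illustrates the relation $S(t) = 1 - F(t)$ on a handful of made-up event times; the function name `empirical_survival` and the data are illustrative assumptions, not part of the lesson.

```python
def empirical_survival(times, t):
    """Fraction of subjects whose event time exceeds t (valid only without censoring)."""
    return sum(1 for x in times if x > t) / len(times)

# Made-up, fully observed event times
event_times = [2, 3, 3, 5, 8, 13]

s_at_3 = empirical_survival(event_times, 3)                        # estimate of P(T > 3)
f_at_3 = sum(1 for x in event_times if x <= 3) / len(event_times)  # estimate of P(T <= 3)

print(f"S(3) = {s_at_3}, F(3) = {f_at_3}, sum = {s_at_3 + f_at_3}")
```

This shortcut stops working as soon as some event times are only partially observed, which is the situation the rest of the lesson addresses.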

Censoring

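Definition: Right Censoring

An observation is right-censored if the event has not been observed by the time follow-up ends, whether because the study closed or because the subject withdrew or was lost to follow-up. All we know is that the true event time exceeds the observed time.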

| Type | Description |
|------|------------|
| Right-censored | Event not observed before study ends |
| Left-censored | Event occurred before study began |
| Interval-censored | Event known to occur in an interval |
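In practice, right-censored data is usually stored as a pair per subject: an observed duration and a flag saying whether the event was seen. The following is a minimal sketch of that encoding, assuming a single administrative cutoff; `STUDY_END`, `to_duration_event`, and the subject records are hypothetical and chosen only for illustration.

```python
STUDY_END = 24  # hypothetical administrative cutoff, in months since study start

# (enrollment month, event month or None if no event was recorded); made-up data
subjects = [(0, 10), (2, None), (5, 30), (12, 20)]

def to_duration_event(enroll, event_month, study_end=STUDY_END):
    """Return (duration, event_observed) for one subject."""
    if event_month is not None and event_month <= study_end:
        return event_month - enroll, 1   # event seen during follow-up
    return study_end - enroll, 0         # right-censored at the cutoff

records = [to_duration_event(enroll, ev) for enroll, ev in subjects]
print(records)
```

The third subject has an event at month 30, after the cutoff, so it is recorded as censored with a duration of 19 months rather than as an event.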

Why Kaplan-Meier Matters

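Naive summaries such as the sample mean or median cannot handle censored data: dropping censored observations, or treating censoring times as if they were event times, typically biases the estimated survival time downward. Kaplan-Meier uses all available information, including the partial information that a censored subject survived at least until its censoring time.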
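A small made-up example shows how naive summaries go wrong. Here the true event times are known, a hypothetical follow-up cutoff censors the later ones, and two common shortcuts are compared with the true mean. All values are illustrative assumptions.

```python
true_times = [2, 4, 6, 8, 10]   # made-up true event times
cutoff = 5                      # hypothetical end of follow-up

# What the analyst actually sees: time is capped at the cutoff
observed = [min(t, cutoff) for t in true_times]
event = [1 if t <= cutoff else 0 for t in true_times]

true_mean = sum(true_times) / len(true_times)

# Shortcut 1: pretend every observed time is an event time
mean_as_events = sum(observed) / len(observed)

# Shortcut 2: throw away censored subjects
uncensored = [t for t, e in zip(observed, event) if e == 1]
mean_dropped = sum(uncensored) / len(uncensored)

print(f"true mean      = {true_mean}")
print(f"censored as events = {mean_as_events}")
print(f"censored dropped   = {mean_dropped}")
```

Both shortcuts fall below the true mean in this example, because the subjects with the longest survival are exactly the ones whose times are cut short.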


Kaplan-Meier Formula

$$\hat{S}(t) = \prod_{t_i \leq t}\left(1 - \frac{d_i}{n_i}\right)$$

Here,

  • $t_i$ = Time of the $i$-th event
  • $d_i$ = Number of events at time $t_i$
  • $n_i$ = Number at risk just before time $t_i$

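The estimator is a step function: it drops at each event time and stays flat in between. Censored subjects do not cause a drop; they simply leave the risk set, which reduces $n_i$ at later event times.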
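The product formula can be written out in a few lines of plain Python. This is a minimal sketch rather than a replacement for a library implementation; the function name `kaplan_meier` and the data are made up, and it assumes the usual tie convention that a subject censored at $t_i$ still counts as at risk at $t_i$.

```python
def kaplan_meier(durations, events):
    """Return [(t_i, n_i, d_i, S_hat(t_i))] for each distinct event time."""
    table = []
    s = 1.0
    event_times = sorted({d for d, e in zip(durations, events) if e == 1})
    for t in event_times:
        n_i = sum(1 for d in durations if d >= t)   # at risk just before t
        d_i = sum(1 for d, e in zip(durations, events) if d == t and e == 1)
        s *= 1 - d_i / n_i                          # multiply in this step's factor
        table.append((t, n_i, d_i, s))
    return table

# Made-up data: 1 = event observed, 0 = right-censored
durations = [1, 2, 2, 3, 4, 5]
events    = [1, 1, 0, 1, 0, 1]

km = kaplan_meier(durations, events)
for t, n_i, d_i, s in km:
    print(f"t={t}  n={n_i}  d={d_i}  S={s:.4f}")
```

Note how the subject censored at time 2 is in the risk set at $t = 2$ but not at $t = 3$, so the curve does not drop because of it, while the later factor uses a smaller denominator.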


Standard Error

Greenwood's formula:

$$\widehat{\text{Var}}[\hat{S}(t)] = \hat{S}(t)^2 \sum_{t_i \leq t}\frac{d_i}{n_i(n_i - d_i)}$$

Here,

  • $d_i$ = Events at time $t_i$
  • $n_i$ = Number at risk at time $t_i$

The 95% confidence interval is:

$$\hat{S}(t) \pm 1.96 \times \widehat{\text{SE}}[\hat{S}(t)]$$

Here,

  • $\widehat{\text{SE}}[\hat{S}(t)] = \sqrt{\widehat{\text{Var}}[\hat{S}(t)]}$ = Estimated standard error from Greenwood's formula
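Greenwood's formula and the interval above can be computed alongside the survival estimate. The sketch below takes a made-up risk table of $(n_i, d_i)$ pairs; the function name `km_with_greenwood` is hypothetical, and clipping the interval to $[0, 1]$ is an added assumption, since the plain $\pm 1.96 \times \text{SE}$ interval can otherwise leave that range.

```python
import math

def km_with_greenwood(risk_table, z=1.96):
    """risk_table: list of (n_i, d_i) ordered by event time.

    Returns a list of (S_hat, SE, lower, upper), one entry per event time.
    """
    s, greenwood_sum, out = 1.0, 0.0, []
    for n_i, d_i in risk_table:
        s *= 1 - d_i / n_i
        # Term of Greenwood's sum; undefined when n_i == d_i (curve hits zero)
        greenwood_sum += d_i / (n_i * (n_i - d_i))
        se = s * math.sqrt(greenwood_sum)
        lower = max(0.0, s - z * se)   # clipping is an assumption of this sketch
        upper = min(1.0, s + z * se)
        out.append((s, se, lower, upper))
    return out

# Made-up risk table: (number at risk, number of events)
rows = km_with_greenwood([(10, 1), (8, 2), (5, 1)])
for s, se, lo, hi in rows:
    print(f"S={s:.3f}  SE={se:.3f}  95% CI=({lo:.3f}, {hi:.3f})")
```

As a sanity check, at the first event time (before any censoring) Greenwood's standard error equals the ordinary binomial standard error $\sqrt{\hat{S}(1 - \hat{S})/n}$.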

Log-Rank Test

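The log-rank test compares survival curves between two or more groups. The statistic below is the two-group form; under $H_0$ it is approximately chi-square distributed with 1 degree of freedom.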

$$\chi^2 = \frac{\left(\sum_{i}(O_{1i} - E_{1i})\right)^2}{\sum_{i}V_i}$$

Here,

  • $O_{1i}$ = Observed events in group 1 at time $t_i$
  • $E_{1i}$ = Expected events in group 1 under $H_0$
  • $V_i$ = Variance of $(O_{1i} - E_{1i})$

| Hypothesis | Meaning |
|-----------|---------|
| $H_0$: $S_1(t) = S_2(t)$ | No difference in survival between groups |
| $H_1$: $S_1(t) \neq S_2(t)$ | Survival curves differ |
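For two groups, the statistic can be computed by hand from the pooled risk sets. The sketch below uses the standard hypergeometric forms $E_{1i} = d_i\, n_{1i} / n_i$ and $V_i = \frac{n_{1i}\, n_{2i}\, d_i\, (n_i - d_i)}{n_i^2 (n_i - 1)}$, where $n_{1i}$ and $n_{2i}$ are the numbers at risk in each group and $n_i$, $d_i$ are pooled totals. The function name `logrank_chi2` and the data are made up, and the p-value uses the 1-degree-of-freedom chi-square tail, which equals $\text{erfc}(\sqrt{\chi^2/2})$.

```python
import math

def logrank_chi2(dur1, ev1, dur2, ev2):
    """Two-group log-rank chi-square statistic (inputs are plain lists)."""
    pooled_event_times = sorted(
        {t for t, e in zip(dur1 + dur2, ev1 + ev2) if e == 1}
    )
    o_minus_e, var = 0.0, 0.0
    for t in pooled_event_times:
        n1 = sum(1 for d in dur1 if d >= t)   # at risk in group 1
        n2 = sum(1 for d in dur2 if d >= t)   # at risk in group 2
        d1 = sum(1 for d, e in zip(dur1, ev1) if d == t and e == 1)
        d2 = sum(1 for d, e in zip(dur2, ev2) if d == t and e == 1)
        n, d = n1 + n2, d1 + d2
        o_minus_e += d1 - d * n1 / n          # O_1i - E_1i
        if n > 1:                             # variance term is 0/0 when n == 1
            var += n1 * n2 * d * (n - d) / (n ** 2 * (n - 1))
    return o_minus_e ** 2 / var

# Made-up data with no censoring: group 1 fails early, group 2 fails late
g1_dur, g1_ev = [1, 2, 3], [1, 1, 1]
g2_dur, g2_ev = [4, 5, 6], [1, 1, 1]

chi2 = logrank_chi2(g1_dur, g1_ev, g2_dur, g2_ev)
p_value = math.erfc(math.sqrt(chi2 / 2))      # chi-square tail, 1 degree of freedom
print(f"chi2 = {chi2:.3f}, p = {p_value:.4f}")
```

Swapping the two groups leaves the statistic unchanged, since $O_{2i} - E_{2i} = -(O_{1i} - E_{1i})$ and the sum is squared.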


Median Survival Time

The median survival time is the smallest time $t$ at which $\hat{S}(t) \leq 0.5$:

$$\hat{t}_{med} = \inf\{t : \hat{S}(t) \leq 0.5\}$$

Here,

  • $\hat{S}(t)$ = Kaplan-Meier estimate of survival

When Median is Undefined

If the survival curve never drops to 0.5 or below (e.g., more than half of the subjects are still event-free at the end of follow-up), the median survival time cannot be estimated. Report it as "not reached" (NR).
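Reading the median off a fitted curve amounts to scanning the steps for the first one at or below 0.5. A minimal sketch, assuming the curve is available as a list of (time, estimate) pairs at the event times; the function name `km_median` and both curves are made up for illustration.

```python
def km_median(km_steps):
    """km_steps: list of (time, S_hat) at event times, in increasing time order.

    Returns the smallest time with S_hat <= 0.5, or None if it is not reached.
    """
    for t, s in km_steps:
        if s <= 0.5:
            return t
    return None

# Made-up curve that crosses 0.5
crossing = [(1, 0.83), (2, 0.67), (3, 0.44), (5, 0.0)]
# Made-up curve that stays above 0.5 for the whole follow-up
not_reached = [(3, 0.95), (6, 0.87), (9, 0.81)]

print(km_median(crossing))      # median is defined
print(km_median(not_reached))   # None, to be reported as "not reached"
```

Returning `None` is a design choice of this sketch; it forces the caller to handle the "not reached" case explicitly instead of printing a misleading number.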


Python Implementation


```python
import numpy as np
import matplotlib.pyplot as plt
from lifelines import KaplanMeierFitter
from lifelines.statistics import logrank_test

np.random.seed(42)

# Simulate survival data
n = 200
treatment = np.random.binomial(1, 0.5, n)
event_time = np.where(treatment == 1,
                      np.random.exponential(12, n),  # Treatment: longer survival
                      np.random.exponential(8, n))   # Control: shorter survival

# Independent right-censoring: each subject also gets a censoring time and we
# observe whichever comes first. A mean of 40 gives roughly 20% censoring.
censor_time = np.random.exponential(40, n)
time = np.minimum(event_time, censor_time)
event = (event_time <= censor_time).astype(int)  # 1 = event observed, 0 = censored

# Kaplan-Meier curves, one fitter per group
kmf_treat = KaplanMeierFitter()
kmf_control = KaplanMeierFitter()

mask_treat = treatment == 1
kmf_treat.fit(time[mask_treat], event_observed=event[mask_treat], label='Treatment')
kmf_control.fit(time[~mask_treat], event_observed=event[~mask_treat], label='Control')

# Plot both curves on the same axes
fig, ax = plt.subplots(figsize=(8, 5))
kmf_treat.plot_survival_function(ax=ax)
kmf_control.plot_survival_function(ax=ax)
ax.set_title('Kaplan-Meier Survival Curves')
ax.set_xlabel('Time')
ax.set_ylabel('Survival Probability')
plt.show()

# Median survival
print(f"Treatment median: {kmf_treat.median_survival_time_:.1f}")
print(f"Control median: {kmf_control.median_survival_time_:.1f}")

# Log-rank test
result = logrank_test(time[mask_treat], time[~mask_treat],
                      event_observed_A=event[mask_treat],
                      event_observed_B=event[~mask_treat])
print(f"\nLog-rank test: χ²={result.test_statistic:.2f}, p={result.p_value:.4f}")
```

Worked Example

Example: Drug Trial

Comparing survival times between treatment and control groups:

| Time | At Risk (Control) | Events (Control) | At Risk (Treatment) | Events (Treatment) |
|------|-------------------|------------------|---------------------|--------------------|
| 3 | 100 | 5 | 100 | 2 |
| 6 | 94 | 8 | 97 | 3 |
| 9 | 85 | 6 | 93 | 4 |
| 12 | 78 | 4 | 88 | 3 |

Control: $\hat{S}(6) = (1 - 5/100)(1 - 8/94) = 0.95 \times 0.915 = 0.869$

Treatment: $\hat{S}(6) = (1 - 2/100)(1 - 3/97) = 0.98 \times 0.969 = 0.950$

Log-rank test: $\chi^2 = 6.82$, $p = 0.009$, so the treatment group has significantly better survival.
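The arithmetic above can be checked directly from the table. The sketch below recomputes $\hat{S}(t)$ for both arms at every listed time and converts the quoted $\chi^2 = 6.82$ into a p-value using the 1-degree-of-freedom chi-square tail, $\text{erfc}(\sqrt{\chi^2/2})$. The $\chi^2$ value itself is taken from the text rather than recomputed, since the table lists only four time points; the helper name `km_curve` is hypothetical.

```python
import math

# (time, number at risk, events) rows copied from the table
control   = [(3, 100, 5), (6, 94, 8), (9, 85, 6), (12, 78, 4)]
treatment = [(3, 100, 2), (6, 97, 3), (9, 93, 4), (12, 88, 3)]

def km_curve(rows):
    """Return {time: S_hat(time)} using the Kaplan-Meier product formula."""
    s, curve = 1.0, {}
    for t, n_i, d_i in rows:
        s *= 1 - d_i / n_i
        curve[t] = s
    return curve

s_control = km_curve(control)
s_treat = km_curve(treatment)
for t in (3, 6, 9, 12):
    print(f"t={t:>2}  control={s_control[t]:.3f}  treatment={s_treat[t]:.3f}")

# p-value implied by the quoted statistic (1 degree of freedom)
p_value = math.erfc(math.sqrt(6.82 / 2))
print(f"p = {p_value:.3f}")
```

The values at $t = 6$ reproduce the hand calculation, and the implied p-value rounds to the quoted 0.009.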

Figure: Kaplan-Meier survival curves for the drug trial.

Key Takeaways

Summary: Kaplan-Meier Estimator

  • Kaplan-Meier estimates the survival function from censored data

  • The estimator is a step function that drops at each observed event time

  • Greenwood's formula provides the standard error for confidence intervals

  • The log-rank test compares survival curves between groups

  • Median survival is the smallest time at which $\hat{S}(t) \leq 0.5$; it is "not reached" if the curve stays above 0.5

  • The method assumes that censoring is independent of the event time (non-informative censoring)

  • Always report confidence intervals alongside point estimates

