Applied Survival Analysis

Chapter 10 - Competing/Semi-Competing Risks

Author
Affiliations

Lu Mao

Department of Biostatistics & Medical Informatics

University of Wisconsin-Madison

Outline

  1. Cause-specific hazard and cumulative incidence

  2. Non- and semi-parametric methods

  3. Analysis of bone marrow transplantation study

  4. Semi-competing risks and examples

\[\newcommand{\d}{{\rm d}}\] \[\newcommand{\T}{{\rm T}}\] \[\newcommand{\dd}{{\rm d}}\] \[\newcommand{\cc}{{\rm c}}\] \[\newcommand{\pr}{{\rm pr}}\] \[\newcommand{\var}{{\rm var}}\] \[\newcommand{\se}{{\rm se}}\] \[\newcommand{\indep}{\perp \!\!\! \perp}\] \[\newcommand{\Pn}{n^{-1}\sum_{i=1}^n}\]

Basic Quantities

Definition and Examples

  • Competing risks
    • Definition: multiple types of failures, occurrence of one precludes others
      • Multiple latent risks competing with each other for first occurrence
    • Examples: different causes of death
  • Defining feature
    • One failure per subject
    • Less info than multivariate failure times
    • Implications for analysis and interpretation

Outcome Data & Identifiability

  • Target of inference: \((T, \Delta)\)
    • \(T\): time to failure
    • \(\Delta \in \{1, \ldots, K\}\): type/cause of failure (categorical)
  • Multivariate perspective
    • A conceptual framework \[\begin{equation}\label{eq:cmpr:mult} T=\min(T_1,\ldots, T_K) \hspace{1mm}\mbox{ and } \hspace{1mm}\Delta=\arg\min_{k=1,\ldots, K} T_k \end{equation}\]
      • \(T_k\): time to \(k\)th latent risk \((k=1,\ldots, K)\) in absence of other risks
      • Competing risks \(=\) partially observed multivariate failure times
    • Identifiability: net distribution of latent \(T_k\) not recognizable from \((T, \Delta)\)
      • unless under unrealistic assumption of mutual independence of the \(T_k\)

Two Approaches

  • Two ways to characterize \((T, \Delta)\)
    • Cause-specific hazard
    • Cumulative incidence (Sub-distribution)
  • Cause-specific hazard (CSH) \[\begin{equation}\label{eq:cmpr:cs_hazard} \dd\Lambda_k^\cc(t)=\pr(t\leq T<t+\dd t, \Delta=k\mid T\geq t) \end{equation}\]
    • Interpretation: instantaneous incidence of \(k\)th risk given overall “survival”
    • Example: \(\dd\Lambda_1^\cc(t)=\) incidence rate for CV death among survivors at \(t\)
      • \(k =1\): CV death; \(k =2\): other causes of death
    • Overall hazard: \(\dd\Lambda(t)=\pr(t\leq T<t+\dd t \mid T\geq t) = \sum_{k=1}^K \dd\Lambda_k^\cc(t)\)

Cause-Specific Hazard

  • Limitations
    • The cumulative CSH \(\Lambda_k^\cc(t)\) not a meaningful quantity
    • \(\exp\{-\Lambda_k^\cc(t)\}\) not a survival function of any kind
  • Correspondence with “net hazard”
    • Under (unrealistic) mutual independence of the latent \(T_k\)
    • Hazard of \(T_k\) identifiable and equal to \(\Lambda_k^\cc(t)\)
    • Not recommended

Cumulative Incidence

  • Cumulative incidence function (CIF) \[ F_k(t)=\pr(T\leq t, \Delta=k) \]
    • Interpretation: probability of failure from \(k\)th risk in presence of other risks
    • Marginal quantity, easy to interpret (a real-world probability)
    • \(F(t)=\pr(T\leq t) = \sum_{k=1}^K F_k(t)\) (sub-distribution functions)
  • Relationship
    • CSH in terms of CIF (for a particular risk, no one-to-one correspondence) \[\begin{equation}\label{eq:cmpr:correspond} \dd\Lambda_k^\cc(t)=\frac{\dd F_k(t)}{1-\sum_{l=1}^K F_l(t-)} \end{equation}\]

CSH vs CIF

  • Example: effect on CSH may not align with CIF
    • Treatment: \(\Lambda_1^\cc(t)=\Lambda_2^\cc(t)=3t\)
    • Control: \(\Lambda_1^\cc(t)=2t\) and \(\Lambda_2^\cc(t)=t\)

Comparison of Functions

  • CSH vs net hazard vs CIF

Methods for Competing Risks

Observed Data

  • A random \(n\)-sample \[(X_i, \delta_i, Z_i), \,\,\, i=1,\ldots, n\]
    • \(X=T\wedge C\)
    • \(\delta=\Delta I(T\leq C)\) (0: censored; \(k\): observed \(k\)th risk, \(k=1,\ldots, K\))
    • \(C\): independent censoring time
    • \(Z\): covariates
    • \(N_{ki}(t)=I(X_i\leq t, \delta_i=k)\): Observed counting process for \(k\)th risk

Cause-Specific Hazard Models (I)

  • Proportional cause-specific hazards \[\begin{equation}\label{eq:cmpr:ph_csh} \pr(t\leq T<t+\dd t, \Delta=k\mid T\geq t, Z)=\exp(\beta_k^\T Z)\dd\Lambda_{k0}(t) \end{equation}\]
    • A model on “survivors” (those who have not failed from any risk)
    • \(\beta_k\): log-hazard ratios for \(k\)th risk in survivors
  • Estimation
    • Partial likelihood score for \(k\)th risk \[ U_{nk}(\beta_k)=\Pn\int_0^\infty\left\{Z_i-\frac{\sum_{j=1}^n I(X_j\geq t)Z_j\exp(\beta_k^\T Z_j)}{\sum_{j=1}^n I(X_j\geq t)\exp(\beta_k^\T Z_j)}\right\}\dd N_{ki}(t) \]
      • Essentially treating non-\(k\) risks as censoring
      • Easy to implement in survival::coxph() (status == k)

Cause-Specific Hazard Models (II)

  • Under (unrealistic) independence of the latent \(T_k\)
    • Proportional cause-specific hazards \(\Longleftrightarrow\) proportional net hazards
    • Frailty to relax independence
    • Not recommended
      • Non-identifiability
      • Non-interpretability
  • Log-rank test
    • Risk-specific log-rank statisic with other risks treated as censoring
    • “Kaplan-Meier” doesn’t work \(\to\) estimand \(\exp\{-\Lambda_k^\cc(t)\}\) is non-quantity
    • Competing-risks equivalent of KM is Gray (1988) estimator of \(F_k(t)\)

CIF: Nonparametric Estimation

  • Decomposition of \(F_k(t)\) \[\begin{equation*} \dd F_k(t)=S(t-)\dd\Lambda_k^\cc(t) \end{equation*}\]

    • \(S(t-)=\pr(T\geq t)\): By KM estimator \(\hat S(t-)\) for overall failure

    • \(\dd\Lambda_k^\cc(t)\): By Nelsen-Aalen estimator (non-\(k\) risks as censoring) \[ \dd\hat\Lambda_k^\cc(t)=\frac{\sum_{i=1}^n\dd N_{ki}(t)}{\sum_{i=1}^n I(X_i\geq t)} \]

  • Gray estimator \[\begin{equation*} \hat F_k(t)=\int_0^t \hat S(u-)\dd\hat \Lambda_k^\cc(u) \end{equation*}\]

CIF: Sub-Distribution Hazard

  • Sub-distribution hazard (SDH) \[\begin{equation}\label{eq:cmpr:sdh} \dd\Lambda_k(t)=\pr(t\leq T_k^*<t+\dd t\mid T_k^*\geq t) \end{equation}\]
    • Time to \(k\)th risk in presence of other risks \[ T_k^*=T\cdot I(\Delta=k)+\infty\cdot I(\Delta\neq k) \]
      • \(F_k(t)=\pr(T_k^*\leq t)\)
    • Interpretation: incidence of \(k\)th risk given not failing from it
    • Easier to work with hazard-like functions (unbounded)
    • Correspondence with CIF \[\begin{equation}\label{eq:cmpr:subdist} F_k(t)=1-\exp\{-\Lambda_k(t)\} \end{equation}\]
    • Model/test on SDH \(\Longleftrightarrow\) Model/test on CIF

CIF: Gray’s Test

  • Discrete version (\(k\)th risk) \[ \dd\Lambda_k(t)=\frac{\dd F_k(t)}{1-F_k(t-)} \]
  • Log-rank-type test
    • Testing \(H_0: F_{k1}(t)=F_{k0}(t)\) \[\begin{equation}\label{eq:cmpr:gray_tests} \int_0^\infty W_k(t)\big\{\dd\hat\Lambda_{k1}(t)-\dd\hat\Lambda_{k0}(t)\big\} \end{equation}\]
    • \(\hat\Lambda_{ka}(t)\): Gray’s CIF plug-in estimator for SDH in group \(a\) \((a=1, 0)\)
    • \(W_k(t)\): Weight function
    • Extensions: multi-group, stratification, etc.

CIF: Fine-Gray Regression

  • Proportional sub-distribution hazards (Fine and Gray, 1999) \[\begin{equation}\label{eq:cmpr:fg} \pr(t\leq T_k^*<t+\dd t\mid T_k^*\geq t, Z)=\exp(\beta_k^\T Z)\dd\Lambda_{k0}(t) \end{equation}\]
    • \(\beta_k\): log-hazard ratios for \(k\)th risk in entire population (under other risks)
    • Estimation and inference: partial-likelihood score with IPCW to address dependent censoring by other risks
  • Target populations
    • Cause-specific hazard: survivors
    • Sub-distribution hazard: all, including those who have failed from other causes

Software: cmprk::cuminc()

  • Basic syntax for Gray’s estimator & test
obj <- cuminc(ftime, fstatus, group, strata, rho = 0)
  • Input
    • (ftime, fstatus): \((X, \delta)\)
    • group: group variable (optional); strata: strata variable (optional)
    • rho: \(\rho\) in weight \(W_k(t)=\{1 - \hat F_k(t)\}^\rho\) (HF \(G^\rho\) family)
  • Output: a list
    • obj$Tests: tests results on each risk
    • obj$"a k": CIF estimates for \(k\)th risk in group \(a\)
      • time: \(t\); est: \(\hat F_k(t)\); var: \(\hat\var\{\hat F_k(t)\}\)

Software: cmprk::crr() (I)

  • Basic syntax for Fine-Gray model
obj <- crr(ftime, fstatus, cov1, failcode = k)
  • Input
    • (ftime, fstatus): \((X, \delta)\); cov1: \(Z\)
    • failcode = k: models \(k\)th risk
  • Output: a list of class crr
    • obj$coef: \(\hat\beta_k\); obj$var: \(\hat\var(\hat\beta_k)\)
    • obj$uftime: \(t\); obj$bfitj: \(\dd\hat\Lambda_{k0}(t)\)

Software: cmprk::crr() (II)

  • Prediction of CIF by Fine-Gray model
    • obj: a crr object for fit model
    • z: new covariate data
# --- Method 1: use predict.crr() 
obj_pred <- predict(obj, z)

# --- Method 2: manual calculation
beta <- obj$coef
Lambda <- cumsum(obj$bfitj)
time <- obj$uftime
## Calculate CIF based on FG model
cif <- 1- exp(- exp(sum(beta * z)) * Lambda)
## Same as obj_pred from Method 1
obj_pred <- cbind(time, cif)

Software: tidycmprk Package (I)

  • Tidy competing risks analysis
    • Wraps around cmprk functions
    • Formula-based interface \(+\) Tidy output
    • Integrated with ggsurvfit and gtsummary
  • Main functions
    • Nonparametric analysis
      • tidycmprsk::cuminc(): model fitting
      • tidycmprsk::tbl_cuminc(): tabulation of CIF estimates
      • ggsurvfit::ggcuminc(): plotting of CIF estimates
    • Fine-Gray regression
      • tidycmprsk::crr(): model fitting
      • gtsummary::tbl_regression(): regression table

Software: tidycmprk Package (II)

  • Sample code
# Compute and test on cumulative incidence function (CIF) -------------------
obj_cif <- tidycmprsk::cuminc(Surv(time, status) ~ group, data = df)
# Tabulate CIF estimates with 95% CI at specific times
tbl_surv <- tidycmprsk::tbl_cuminc(
    x = obj_cif,                     # Provide fitted object
    times = seq(12, 120, by = 36),   # Time points for CIF
    outcomes = c("1", "2"),          # Specify risks to include
) 
# Create group-specific CIF plot
ggcuminc(obj_cif, outcome = c("1", "2"))  # Specify risks to include

# Fine-Gray regression for risk "k" -----------------------------------------
obj_fg <- tidycmprsk::crr(Surv(time, status) ~ covariates, 
                           data = df, failcode = "k")
# Tabulate regression results
tbl <- tbl_regression(obj_fg, exponentiate = TRUE)

Bone Marrow Transplant Study

Study Information

  • Population
    • 864 multiple-myeloma patients undergoing allo-HCT cell transplantation
  • Endpoints: time from surgery to
    • Treatment-related mortality (TRM; death in remission)
    • Relapse of leukemia
  • Risk factors
    • Cohort indicator (years 1995–2000 or 2001–2005)
    • Type of donor (unrelated or HLA-identical sibling)
    • History of a prior transplant
    • Time from diagnosis to transplantation (<24 months, or ≥ 24 months)

Study Data

  • Data format
head(cibmtr)
#>    time status cohort donor hist wait
#> 1 0.010      2      1     1    1    1
#> 2 0.066      2      1     1    1    0
#> 3 0.099      1      1     1    1    1
#> ...

CIF by Donor Type (I)

  • Estimation/testing by donor type
obj_cif <- tidycmprsk::cuminc(Surv(time, status) ~ donor, data = df)
obj_cif # Print results
#> ...
#> • Tests
#> outcome   statistic   df     p.value    
#> Relapse   17.3        1.00   <0.001     
#> TRM       12.3        1.00   <0.001

# Summarize CIF estimates at specific times 
tbl_surv <- tbl_cuminc(
    x = obj_cif,                     # Provide the fitted object
    times = seq(12, 120, by = 36),   # Time points for CIF
    outcomes = c("Relapse", "TRM"),  # Specify risks to include
    label_header = "{time} months"   # Column label format
) 
tbl_surv

CIF by Donor Type (II)

  • Table created by tbl_cuminc()

CIF by Donor Type (III)

  • Graphics
obj_cif |> ggcuminc(outcome = c("Relapse", "TRM")) # + ggplot2 formatting

Regression Analysis

  • Semiparametric regression
    • Proportional cause-specific hazard
    • Proportional sub-distribution hazard (FG)
# Change k
k <- 1
#--- proportional cause-specific hazards -----------------------------
obj.cs <- coxph(Surv(time, status == k) ~ cohort + donor + hist + wait,
                data = cibmtr)

#--- Fine and Gray --------------------------------------------------
obj.fg <- crr(cibmtr$time, cibmtr$status, cibmtr[, 3:6], failcode = k)

Regression Results

  • Differential effects of prior surgery
    • Relapse: increases risk by \(42\%\)
    • TRM: decreases risk by \(1-0.63 = 37\%\)

Semi-Competing Risks

Definition and Examples

  • Semi-competing risks
    • Terminal event (competing): death
    • Non-terminal events (non-competing): hospitalization, relapse, etc.
  • Examples
    • Death + relapse of cancer (German breast cancer study)
    • Death terminates nonfatal event but not vice versa
  • Methods
    • Marginal: cumulative incidence/frequency
    • Frailty (Ch. 8); Multistate (Ch. 12); Composite (Ch. 13)

Data and Identifiability

  • Target of inference: \((T, D)\)
    • \(T\): time to nonfatal event
    • \(D\): time to death
    • Joint distribution \[H(s, t)=\pr(D>s, T>t)\]
    • Identifiable region: \(\{(s, t):0\leq t\leq s<\infty\}\)
      • Possible nonfatal \(\to\) death, not vice versa

Bivariate Analysis

  • Marginal models
    • Death: univariate event (KM, log-rank, Cox)
    • Nonfatal event
      • Cause-specific risk (treating death as censoring)
      • cumulative incidence (Gray, FG)
  • German breast cancer study
    • Nonfatal event: relapse of cancer (status == 1)
    • Death: overall mortality (status == 2)
gbc <- read.table("Data//German Breast Cancer Study//gbc.txt") |>
   mutate(age40 = (age > 40) + 0) # create age40: I(age > 40)

Bivariate Analysis - GBC Example (I)

  • Data
gbc
#    id      time status hormone age meno size grade nodes prog estrg age40
# 1   1 43.836066      1       1  38    1   18     3     5 1.41  1.05     0
# 2   1 74.819672      0       1  38    1   18     3     5 1.41  1.05     0
# 5   3 41.934426      1       1  47    1   30     2     1 4.22  0.89     1
# 6   3 47.737705      2       1  47    1   30     2     1 4.22  0.89     1
  • Analysis
    • Death (subset to status != 1)
      • Standard Cox: coxph(Surv(time, status == 2) ~ covariates)
    • Relapse (subset to first row for each id)
      • Cause-specific hazard: coxph(Surv(time, status == 1) ~ covariates)
      • Sub-distribution hazard: tidycmprk::crr(Surv(time, status) ~ covariates, failcode = "1")

Bivariate Analysis - GBC Example (II)

  • Regression results
    • Relapse: cause-specific \(\approx\) sub-distribution—few deaths (21) before relapse
    • Age effect: driven by relapse

Recurrent Events

  • Outcome data: \(\{N^*(\cdot), D\}\)
    • \(N^*(t)\): number of recurrent events in presence of death
      • Repeated tumor occurrences/hospitalizations before death (\(\dd N^*(t)\equiv 0\) for \(t>D\))
    • Treating \(D\) as censoring (AG/LWYY) \(\to\) cause-specific event rate \[E\{\dd N^*(t)\mid D\geq t\}\]
      • Incidence rate in survivors
  • Cumulative frequency
    • Mean function in overall population, dead or alive \[ \mu(t)=E\{N^*(t)\} \]
    • Extension of cumulative incidence

Marginal Approaches

  • Methods for cumulative frequency
    • Gray-type nonparametric estimator/test (Ghosh and Lin, 2000)
    • Proportional CF model (Fine-Gray-type) (Ghosh and Lin, 2002) \[\begin{equation}\label{eq:cmpr:pcf} E\{N^*(t)\mid Z\}=\exp(\beta^\T Z)\mu_0(t) \end{equation}\]
    • Higher death rate can reduce CF
  • Joint analysis with mortality
    • Ghosh-Lin for recurrent events \(+\) Cox/log-rank for death
    • \(\chi^2\) test with 2 d.f.
  • Alternative: shared-frailty models

Software: rccf2::rccf()

  • Two-sample estimation/testing of CF
devtools::install_github("lmaowisc/rccf2")
library(rccf2) # Load package
obj <- rccf(id, time, status, trt) # Fit model
  • Input
    • id: subject ID; trt: binary treatment group
    • time: event time; status: event status (1: nonfatal; 2: death; 0: censoring)
  • Output
    • obj$u1, obj$u2: Estimates of group-specific \(\mu(t)\) as a function of \(t\) (obj$t).
    • obj$pLR, obj$pD, obj$pjnt: \(p\)-values for tests on \(\mu(t)\), death, and jointly

Example: Bladder Tumor Study (I)

  • Bladder cancer trial (Byar, 1980)
    • Treatment groups: thiotepa (\(n=38\)) vs placebo (\(n=48\))
    • Endpoints: tumor recurrences and death

Example: Bladder Tumor Study (II)

  • Survival and cumulative frequency
    • Joint \(\chi_2^2\) test: \(p\)-value 0.184

Conclusion

Notes

  • Fine–Gray regression

    • Model diagnostics: R package crskdiag
    • Proportional sub-distribution odds model (Eriksson et al., 2015)
    library(timereg)
    # Fit a proportional sub-distribution odds model for the kth risk
    obj_po <- prop.odds.subdist(Event(time, status) ˜ covariates, cause = k)
    • Interval-censored competing risks (Mao et al., 2018)
  • Semi-competing risks

    • Term coined by Fine, Jiang, and Chappell (2001)
    • Multivariate (here), multistate (Chapter 12), and composite (Chapter 13)
    • Nonfatal event as mediator of treatment effect on death (Huang, 2021, 2022)

Summary

  • Competing risks
    • Cause-specific hazard: conditional failure rate on survivors
      • coxph(Surv(time, status == k) ~ covariates)
    • Cumulative incidence: marginal failure probability under other risks
      • Gray’s estimator/test cmprk::cuminc(ftime, fstatus, group, strata, rho = 0)
      • Fine-Gray model cmprk::crr(ftime, fstatus, cov1, failcode = k)
      • Tidy tables and graphics: tidycmprk package
  • Recurrent events in presence of death
    • Ghosh-Lin methods
      • Nonparametric: rccf2::rccf(id, time, status, trt)
      • Proportional CF model: R packages mets and reReg