Chapter 10 - Competing and Semi-Competing Risks

Slides

Lecture slides here. (To convert html to pdf, press E \(\to\) Print \(\to\) Destination: Save to pdf)

Chapter Summary

When a subject can experience at most one event among multiple competing risks (e.g., distinct causes of death) or a non-terminal event followed by a terminal event (semi-competing risks), standard survival analysis techniques need adjustments to account for the terminality of all or a subset of the events. These adjustments typically involve defining, modeling, and interpreting the cause-specific hazard, cumulative incidence, and/or sub-distribution hazards functions for competing risks, along with adaptations to semi-competing scenarios.

Competing risks

In competing risks, each subject can fail at most once from a single cause. The outcome data thus consist of the failure time \(T\) along with failure cause \(\Delta=1,\ldots, K\).

  • Cause-specific hazard (CSH)
    \[ \mathrm{d}\Lambda_k^{\rm c}(t) \;=\; \Pr\bigl(t \le T < t+\mathrm{d}t,\; \Delta=k \,\mid\, T \ge t\bigr), \] i.e., the conditional incidence of cause \(k\) among survivors of any cause.

    • Estimation: Treat other causes as censoring and fit a standard Cox model.
    • Interpretation: Covariate effects reflect changes in risk conditional on no failure so far.
  • Cumulative Incidence Function (CIF)
    \[ F_k(t) \;=\; \Pr(T \le t,\;\Delta = k), \] the marginal probability of having cause \(k\) by time \(t\).

    • Nonparametric: Gray’s estimator integrates the cause-specific hazard with overall survival.
    • Testing: Gray’s log-rank–type tests compare \(F_k(t)\) across groups.
    • Sub-distribution hazard: A transformation of CIF: \(\Lambda_k(t)=-\log\{1 - F_k(t)\}\).

    The Fine–Gray model specifies a multiplicative structure on this sub-distribution hazard: \[ \mathrm{d}\Lambda_k(t\mid Z) \;=\; \exp(\beta_k^\mathrm{T} Z)\,\mathrm{d}\Lambda_{k0}(t), \]

    • \(\beta_k\): log-sub-distribution hazard ratios associated with unit increases in the covariates.
  • Relationship between CSH and CIF (continuous): \[\begin{align} \mathrm{d}\Lambda_k^{\rm c}(t) &\;=\; \frac{ \mathrm{d} F_k(t)}{1 - \sum_{k'=1}^K F_{k'}(t)};\\ \mathrm{d} F_k(t) &\;=\; \exp\left\{-\sum_{k'=1}^K \Lambda_{k'}^{\rm c}(t)\right\}\mathrm{d}\Lambda_k^{\rm c}(t). \end{align}\]

Semi-Competing Risks

Semi-competing risks arise when a non-terminal event (e.g., relapse) can only be observed if it occurs before the terminal event (e.g., death). Once the terminal event happens, the non-terminal event cannot.

  • Two endpoints \((T, D)\): \(T\) = non-terminal, \(D\) = terminal, partially ordered by \(T \le D\) if \(T\) occurs.
  • Recurrent events with death: \(N^*(t)\) = number of non-terminal events by time \(t\), with \(\mathrm{d} N^*(t)\equiv 0\) for \(t> D\) (no event after death).

Analyses can be:

  • Cause-specific: treat death as censoring for the non-terminal event.
  • Sub-distribution: treat death as a competing risk for the non-terminal event (acknowledging that there is no event after death).
  • Extended frameworks: (multistate or shared frailty) for more detailed joint modeling.

In particular, the Ghosh-Lin test is a two-sample test for comparing the cumulative frequency \(E\{N^*(t)\}\), accounting for the terminal event. The proportional cumulative frequency regression extends this to covariate-adjusted models: \[ E\{N^*(t)\mid Z\}\;=\;\exp(\beta^\mathrm{T} Z)\mu_0(t). \]

Example R code

Below are code snippets demonstrating nonparametric estimation and regression for competing (and semi-competing) risks. Assume a data frame df with:

  • time: observed time to event or censor,
  • status: integer-coded event indicator (0 = censor, 1, 2 for distinct causes),
  • covariates: x1, x2, etc.
########################################
# 1. Proportional cause-specific hazards
########################################
library(survival)
# cause 1 vs. not 1
fit_cs <- coxph(Surv(time, status == 1) ~ x1 + x2, data = df)
summary(fit_cs)

########################################
# 2. Nonparametric CIF (Gray’s method)
########################################
# Using 'cmprsk' to estimate and test cumulative incidence
library(cmprsk)
# Analyses all risks encoded in fstatus
obj_cif <- cuminc(ftime = df$time,
                  fstatus = df$status,
                  group   = df$group   # optional grouping
                  )
obj_cif          # prints CIF estimates & Gray's test p-values
plot(obj_cif)    # basic CIF plot

####################################################
# 3. Fine–Gray proportional sub-distribution hazards
####################################################
library(cmprsk)
# Covariate matrix
cov_mat <- model.matrix(~ x1 + x2, data = df)[, -1]
# cause of interest is 1
fit_fg <- crr(ftime = df$time,
              fstatus = df$status,
              cov1     = cov_mat,
              failcode = 1,
              cencode  = 0)
summary(fit_fg)

####################################################
# 4. Ghosh-Lin methods for cumulative frequency
####################################################
devtools::install_github("lmaowisc/rccf2")
library(rccf2)
# Fit the Ghosh-Lin test for recurrent events
# status = 1: recurrent event; 2: death; 0: censoring
obj_rccf <- rccf(id = df$id, # subject ID
                 time = df$time, # event time
                 status = df$status, # event status
                 trt = df$trt # treatment group
                 )

obj_rccf # print results

Conclusion

When analyzing competing risks, the cause-specific approach (treating other causes as censoring) targets the conditional hazard among survivors, while the cumulative incidence or sub-distribution hazard approach (Fine–Gray) provides a marginal interpretation of failure probability in the presence of competing events. For semi-competing risks—where a non-terminal event cannot follow a terminal event—researchers can treat the terminal event as a competing risk for the non-terminal one or use specialized multistate/frailty methods to capture joint behavior. Each method yields valid but different insights: cause-specific hazards focus on event rates among those alive, while cumulative incidence or sub-distribution models offer population-level probabilities.

R Code

Show the code
###############################################################################
# Chapter 10 R Code
#
# This script reproduces all major numerical results in Chapter 10, including:
#   1. Bone Marrow Transplantation Study (Competing risks)
#   2. Semi-competing risks in the German Breast Cancer (GBC) data
#   3. Bladder tumor study (recurrent events with death as a terminal event)
###############################################################################

#==============================================================================
# (A) Bone Marrow Transplantation Study
#==============================================================================
library(survival)     # For survival analysis (cause-specific hazards, etc.)
library(cmprsk)       # For Fine-Gray and Gray’s test
library(tidyverse)    # For data manipulation and plotting

#------------------------------------------------------------------------------
# 1. Read Data and Perform Gray's Test
#------------------------------------------------------------------------------
cibmtr <- read.table("Data//Bone Marrow Transplantation Study//cibmtr.txt")
head(cibmtr)

# Gray's unweighted log-rank-type test for group differences in 
# cumulative incidence. 'rho=0' => unweighted test.
obj <- cmprsk::cuminc(cibmtr$time, cibmtr$status, cibmtr$donor, rho = 0)

# Print test results (Gray’s test statistics and p-values)
obj$Tests

#------------------------------------------------------------------------------
# 2. Plot the Cumulative Incidence Functions (Figure 10.2)
#   - Relapse (k=1)
#   - Treatment-related Mortality (k=2)
#------------------------------------------------------------------------------
# Extract group-specific CIF objects for siblings (donor=0) vs. unrelated (donor=1)
# k=1: relapse, k=2: TRM
obj.rlp.sib    <- obj$`0 1`  # Relapse, siblings
obj.rlp.nonsib <- obj$`1 1`  # Relapse, unrelated
obj.trm.sib    <- obj$`0 2`  # TRM, siblings
obj.trm.nonsib <- obj$`1 2`  # TRM, unrelated

# Set up two-panel plot
par(mfrow = c(1, 2))

# Left panel: Relapse
plot(
  obj.rlp.sib$time, obj.rlp.sib$est,
  type      = "s",          # Step function
  frame.plot= FALSE,
  main      = "Relapse",
  xlim      = c(0, 120),
  ylim      = c(0, 0.6),
  xlab      = "Time (months)",
  ylab      = "Cumulative incidence",
  lwd       = 2
)
lines(obj.rlp.nonsib$time, obj.rlp.nonsib$est, lty = 3, lwd = 2)

# Right panel: TRM
plot(
  obj.trm.sib$time, obj.trm.sib$est,
  type      = "s",
  frame.plot= FALSE,
  main      = "Treatment-related mortality",
  xlim      = c(0, 120),
  ylim      = c(0, 0.6),
  xlab      = "Time (months)",
  ylab      = "Cumulative incidence",
  lwd       = 2
)
lines(obj.trm.nonsib$time, obj.trm.nonsib$est, lty = 3, lwd = 2)

#------------------------------------------------------------------------------
# 3. Fine-Gray vs. Cause-Specific Hazard Models 
#------------------------------------------------------------------------------
# We focus on cause k = 1 (Relapse)

# Fine-Gray model
obj.fg <- cmprsk::crr(cibmtr$time, cibmtr$status, cibmtr[, 3:6], failcode = 1)

# Extract estimates and standard errors
beta.fg <- obj.fg$coef
se.fg   <- sqrt(diag(obj.fg$var))

# Format hazard ratios (HR), confidence intervals, and p-values
c1 <- round(exp(beta.fg), 2)
c2 <- paste0("(", round(exp(beta.fg - 1.96 * se.fg), 2), "-", round(exp(beta.fg + 1.96 * se.fg), 2), ")")
c3 <- round(1 - pchisq((beta.fg / se.fg)^2, 1), 3)

# Cause-specific hazard model for k=1
obj.csh <- coxph(Surv(time, status == 1) ~ cohort + donor + hist + wait, data = cibmtr)

beta.csh <- obj.csh$coef
se.csh   <- sqrt(diag(obj.csh$var))

c4 <- round(exp(beta.csh), 2)
c5 <- paste0("(", round(exp(beta.csh - 1.96 * se.csh), 2), "-", round(exp(beta.csh + 1.96 * se.csh), 2), ")")
c6 <- round(1 - pchisq((beta.csh / se.csh)^2, 1), 3)

# Print combined Fine-Gray (columns 1-3) vs. cause-specific hazard (columns 4-6)
noquote(cbind(c1, c2, c3, c4, c5, c6))

#------------------------------------------------------------------------------
# 5. Covariate-Specific CIF Prediction
#------------------------------------------------------------------------------
# Example covariate vector: z = (1, 0, 1, 0)
z <- c(1, 0, 1, 0)

# Method 1: Using predict.crr()
obj_pred1 <- predict(obj.fg, z)

# Method 2: Manual calculation
beta   <- obj.fg$coef
Lambda <- cumsum(obj.fg$bfitj)  # Baseline integrated sub-distribution hazard
time   <- obj.fg$uftime
cif    <- 1 - exp(-exp(sum(beta * z)) * Lambda)
obj_pred2 <- cbind(time, cif)

# Compare both predictions (they should match closely)
cbind(obj_pred1, obj_pred2)


#==============================================================================
# (B) Tidy Competing Risks Analysis
#==============================================================================
library(tidycmprsk)  # For cuminc() wrappers & table building
library(ggsurvfit)   # For enhanced survival/cif plots

# Recode the data for use with tidycmprsk
df <- cibmtr %>%
  dplyr::mutate(
    status = factor(status, levels = c("0", "1", "2"), labels = c("Censored", "Relapse", "TRM")),
    donor  = factor(donor, levels = c("0", "1"), labels = c("Id. sibling", "Unrelated"))
  )

# Fit CIF by donor groups using tidycmprsk
obj_cif <- tidycmprsk::cuminc(Surv(time, status) ~ donor, data = df)
obj_cif

# Summaries of CIF at specific times (12, 48, 84, 120 months)
tbl_surv <- tbl_cuminc(
  x          = obj_cif,
  times      = seq(12, 120, by = 36),
  outcomes   = c("Relapse", "TRM"),
  label_header = "{time} months",
  label        = "Donor"
)
tbl_surv

# Base CIF plot for both outcomes (Relapse, TRM) in one panel
obj_cif %>%
  ggcuminc(outcome = c("Relapse", "TRM")) +
  scale_x_continuous("Time (months)", limits = c(0, 120), breaks = seq(0, 120, 24)) +
  add_risktable(risktable_stats = "n.risk") +
  theme_minimal()

#------------------------------------------------------------------------------
# 1. Separate Plots by Risk (Patchwork)
#------------------------------------------------------------------------------
library(patchwork)
library(ggsci)

# Function to plot a single outcome's CIF
plot_bm_cif <- function(obj_cif, outcome) {
  p <- ggcuminc(obj_cif, outcome, linetype_aes = TRUE, linewidth = 0.8) +
    add_risktable(
      risktable_stats = "n.risk",
      theme = list(theme_risktable_default())
    ) +
    ggtitle(outcome) +
    scale_y_continuous(
      "Cumulative incidence",
      limits = c(0, 0.6),
      breaks = seq(0, 1, by = 0.1),
      expand = expansion(c(0, 0.005))
    ) +
    scale_x_continuous("Time (months)", limits = c(0, 120), breaks = seq(0, 120, 24)) +
    scale_color_jama() +                # JAMA color palette
    scale_linetype_manual(values = c(2, 1)) +
    theme_classic() +
    theme(
      plot.margin        = margin(0, 0, 0, 0),
      legend.position    = "top",
      legend.key.width   = unit(1, "cm"),
      panel.grid.major.y = element_line(),
      legend.text        = element_text(size = 10),
      plot.caption       = element_text(size = 10),
      plot.title         = element_text(size = 12)
    )
  p
}

# Two outcome-specific panels
rel_fig <- plot_bm_cif(obj_cif, "Relapse")
trm_fig <- plot_bm_cif(obj_cif, "TRM")

# Combine into one figure
bm_fig <- wrap_plots(
  ggsurvfit_build(rel_fig),
  ggsurvfit_build(trm_fig),
  ncol = 2
)
bm_fig

# Save if needed:
# ggsave("images/cmpr_bm.png", bm_fig, width = 8, height = 4.5)
# ggsave("images/cmpr_bm.eps", bm_fig, width = 8, height = 4.5)

# Fit separate Fine-Gray models for relapse vs. TRM
obj_rel <- tidycmprsk::crr(Surv(time, status) ~ donor + cohort + hist + wait, data = df, failcode = "Relapse")
obj_rel
obj_trm <- tidycmprsk::crr(Surv(time, status) ~ donor + cohort + hist + wait, data = df, failcode = "TRM")
obj_trm

# Example: subdistribution odds via timereg::prop.odds.subdist
library(timereg)
obj_po1 <- prop.odds.subdist(
  Event(time, status) ~ donor + cohort + hist + wait,
  data  = df,
  cause = 1
)
obj_po1$gamma
obj_po1$robvar.gamma


#==============================================================================
# (C) Semi-Competing Risks in the German Breast Cancer Data
#==============================================================================
library(gtsummary)
library(labelled)

#------------------------------------------------------------------------------
# 1. Read and Recoding GBC Data
#------------------------------------------------------------------------------
gbc <- read.table("Data//German Breast Cancer Study//gbc.txt") %>%
  mutate(
    age40 = (age > 40) + 0,        # Binary for age>40
    grade = factor(grade),
    prog  = prog / 100,           # Rescale progesterone
    estrg = estrg / 100           # Rescale estrogen
  )

# Assign human-friendly variable labels for gtsummary
var_label(gbc) <- list(
  hormone = "Hormone",
  meno    = "Menopause",
  age40   = "Age 40+",
  grade   = "Tumor grade",
  size    = "Tumor size (mm)",
  prog    = "Progesterone (100 fmol/mg)",
  estrg   = "Estrogen (100 fmol/mg)"
)

#------------------------------------------------------------------------------
# 2. Cox Model for Overall Survival (Death)
#------------------------------------------------------------------------------
# Exclude those who relapsed only (status=1), keep death (status=2) and censoring (status=0)
gbc_death <- gbc %>%
  filter(status != 1) %>%
  mutate(status = (status > 0))  # status=1 if death, 0 if censor

obj_death <- coxph(
  Surv(time, status) ~ hormone + meno + age40 + grade + size + prog + estrg,
  data = gbc_death
)
death_tbl <- tbl_regression(obj_death, exponentiate = TRUE)

#------------------------------------------------------------------------------
# 3. Cause-Specific Hazards for Relapse
#------------------------------------------------------------------------------
# Keep only first event for each subject
# If status=1 => relapse, else censor
# (Ignoring death, which is status=2)
gbc_relc <- gbc %>%
  group_by(id) %>%
  slice_min(time) %>%
  mutate(status = if_else(status == 2, 0, status)) %>%
  slice_max(status) %>%
  ungroup()

obj_relc <- coxph(
  Surv(time, status) ~ hormone + meno + age40 + grade + size + prog + estrg,
  data = gbc_relc
)
relc_tbl <- tbl_regression(obj_relc, exponentiate = TRUE)

#------------------------------------------------------------------------------
# 4. Subdistribution Hazards for Relapse (Fine-Gray)
#------------------------------------------------------------------------------
# Keep only the first event, re-encode status as factor
gbc_tfe <- gbc %>%
  group_by(id) %>%
  slice_min(time) %>%
  slice_max(status) %>%
  mutate(status = factor(status, levels = c("0", "1", "2"))) %>%
  ungroup()

obj_rel_sub <- tidycmprsk::crr(
  Surv(time, status) ~ hormone + meno + age40 + grade + size + prog + estrg,
  data     = gbc_tfe,
  failcode = "1"
)
rel_sub_tbl <- tbl_regression(obj_rel_sub, exponentiate = TRUE)

# Merge three tables: Death, Relapse (Cause-Specific), Relapse (Subdistribution)
tbl <- tbl_merge(
  tbls = list(death_tbl, relc_tbl, rel_sub_tbl),
  tab_spanner = c("**Death**", "**Relapse (cause-specific)**", "**Relapse (subdistribution)**")
)
tbl
# as_kable_extra(tbl, format = "latex")


#==============================================================================
# (D) Bladder Tumor Study (Recurrent Events with Death as Terminal Event)
#==============================================================================
devtools::install_github("lmaowisc/rccf2")
library(rccf2)
data(bladder)

#------------------------------------------------------------------------------
# 1. Descriptive Analysis
#------------------------------------------------------------------------------
# Count how many rows belong to each status
bladder %>%
  count(status)

# Prepare data for plotting
df <- bladder %>%
  left_join(
    bladder %>% group_by(id) %>% summarize(max_fu = max(time)),
    by = "id"
  ) %>%
  mutate(
    id     = factor(id),
    status = factor(status, levels = c("0", "1", "2")),  # 0=censored,1=recur,2=death
    trt    = if_else(trt == 1, "Thiotepa", "Placebo")
  )

dfmax <- df %>%
  group_by(id, trt) %>%
  summarize(max_fu = max(time), .groups = "drop")

# Follow-up plot showing recurrences/death
df %>%
  ggplot(aes(y = reorder(id, max_fu))) +
  geom_linerange(data = dfmax, aes(xmin = 0, xmax = max_fu)) +
  geom_point(aes(x = time, shape = status), size = 2, fill = "white") +
  geom_vline(xintercept = 0, linewidth = 1) +
  scale_shape_manual(values = c(23, 15, 19), labels = c("Censoring", "Tumor recurrence", "Death")) +
  scale_x_continuous(
    "Time (months)",
    limits = c(0, 65),
    breaks = seq(0, 60, by = 12),
    expand = c(0, 0.5)
  ) +
  scale_y_discrete("Patients") +
  facet_wrap(~ trt, scales = "free") +
  theme_minimal() +
  theme(
    legend.position      = "top",
    legend.title         = element_blank(),
    panel.grid.minor.x   = element_blank(),
    axis.text.y          = element_blank(),
    axis.ticks.y         = element_blank(),
    panel.grid.major.y   = element_blank()
  )

# ggsave("cmpr_blad_fu.pdf", width = 8, height = 8)
# ggsave("cmpr_blad_fu.eps", width = 8, height = 8)

#------------------------------------------------------------------------------
# 2. Ghosh-Lin Two-Sample Analysis
#------------------------------------------------------------------------------
id     <- bladder$id
time   <- bladder$time
status <- bladder$status
trt    <- bladder$trt

# Fit the Ghosh-Lin test for recurrent events
obj_rccf <- rccf(id, time, status, trt)
stat  <- obj_rccf$stat  # test statistics
S     <- obj_rccf$S     # covariance matrix
u1    <- obj_rccf$u1    # mean function for group 1
u2    <- obj_rccf$u2    # mean function for group 2
t_rccf<- obj_rccf$t     # time points used

# Derive partial tests
QLR <- stat[1] / sqrt(S[1, 1])   # Q-value for recurrence
pLR <- 1 - pchisq(QLR^2, 1)

QD <- stat[2] / sqrt(S[2, 2])    # Q-value for death
pD <- 1 - pchisq(QD^2, 1)

Q  <- t(stat) %*% solve(S) %*% stat  # joint test
p  <- 1 - pchisq(Q, 2)

#------------------------------------------------------------------------------
# 3. Plot Estimated Cumulative Frequency (CF) (Figure 10.x)
#------------------------------------------------------------------------------
par(mfrow = c(1, 2))

# Left panel: Survival from death (0=alive, >0=dead)
fatal <- bladder[status != 1, ]
obj_sf <- survfit(Surv(time, status > 0) ~ trt, data = fatal)

plot(
  obj_sf,
  lty      = c(1, 3),
  frame    = FALSE,
  xlim     = c(0, 50),
  lwd      = 2,
  cex.lab  = 1.5,
  cex.axis = 1.5,
  xlab     = "Time (months)",
  ylab     = "Survival probabilities"
)
text(35, 0.14, paste0("Log-rank test: p=", round(pD, 3)), cex = 1.2)

# Right panel: Mean functions for tumor recurrences
stepmu1 <- stepfun(t_rccf, c(0, u1))
stepmu2 <- stepfun(t_rccf, c(0, u2))

plot(
  stepmu2,
  do.points= FALSE,
  xlab     = "Time (months)",
  ylab     = "Cumulative tumor frequency",
  main = "",
  xlim     = c(0, 60),
  ylim     = c(0, 3),
  frame    = FALSE,
  lwd      = 2,
  cex.lab  = 1.5,
  cex.axis = 1.5,
  lty      = 3
)
plot(
  stepmu1,
  add       = TRUE,
  lty       = 1,
  do.points = FALSE,
  lwd       = 2
)
text(40, 0.4, paste0("Ghosh-Lin test: p=", round(pLR, 3)), cex = 1.2)

#------------------------------------------------------------------------------
# 4. Ghosh-Lin Proportional CF Regression (mets::recreg)
#------------------------------------------------------------------------------
library(mets)
data(bladder)

# Create start-stop format for each ID
df_gl <- bladder %>%
  group_by(id) %>%
  mutate(
    start = lag(time, default = 0),
    stop  = if_else(time == 0, 1e-4, time)
  ) %>%
  select(id, start, stop, status, trt) %>%
  ungroup() %>%
  as.data.frame()

# Fit the proportional cumulative frequency regression
obj_reg <- recreg(
  Event(start, stop, status) ~ trt + cluster(id),
  cause      = 1,    # Recurrent event
  death.code = 2,    # Terminal event
  data       = df_gl
)
summary(obj_reg)