########################################
# 1. Proportional cause-specific hazards
########################################
library(survival)
# cause 1 vs. not 1
<- coxph(Surv(time, status == 1) ~ x1 + x2, data = df)
fit_cs summary(fit_cs)
########################################
# 2. Nonparametric CIF (Gray’s method)
########################################
# Using 'cmprsk' to estimate and test cumulative incidence
library(cmprsk)
# Analyses all risks encoded in fstatus
<- cuminc(ftime = df$time,
obj_cif fstatus = df$status,
group = df$group # optional grouping
)# prints CIF estimates & Gray's test p-values
obj_cif plot(obj_cif) # basic CIF plot
####################################################
# 3. Fine–Gray proportional sub-distribution hazards
####################################################
library(cmprsk)
# Covariate matrix
<- model.matrix(~ x1 + x2, data = df)[, -1]
cov_mat # cause of interest is 1
<- crr(ftime = df$time,
fit_fg fstatus = df$status,
cov1 = cov_mat,
failcode = 1,
cencode = 0)
summary(fit_fg)
####################################################
# 4. Ghosh-Lin methods for cumulative frequency
####################################################
::install_github("lmaowisc/rccf2")
devtoolslibrary(rccf2)
# Fit the Ghosh-Lin test for recurrent events
# status = 1: recurrent event; 2: death; 0: censoring
<- rccf(id = df$id, # subject ID
obj_rccf time = df$time, # event time
status = df$status, # event status
trt = df$trt # treatment group
)
# print results obj_rccf
Chapter 10 - Competing and Semi-Competing Risks
Slides
Lecture slides here. (To convert html to pdf, press E \(\to\) Print \(\to\) Destination: Save to pdf)
Chapter Summary
When a subject can experience at most one event among multiple competing risks (e.g., distinct causes of death) or a non-terminal event followed by a terminal event (semi-competing risks), standard survival analysis techniques need adjustments to account for the terminality of all or a subset of the events. These adjustments typically involve defining, modeling, and interpreting the cause-specific hazard, cumulative incidence, and/or sub-distribution hazards functions for competing risks, along with adaptations to semi-competing scenarios.
Competing risks
In competing risks, each subject can fail at most once from a single cause. The outcome data thus consist of the failure time \(T\) along with failure cause \(\Delta=1,\ldots, K\).
Cause-specific hazard (CSH)
\[ \mathrm{d}\Lambda_k^{\rm c}(t) \;=\; \Pr\bigl(t \le T < t+\mathrm{d}t,\; \Delta=k \,\mid\, T \ge t\bigr), \] i.e., the conditional incidence of cause \(k\) among survivors of any cause.- Estimation: Treat other causes as censoring and fit a standard Cox model.
- Interpretation: Covariate effects reflect changes in risk conditional on no failure so far.
- Estimation: Treat other causes as censoring and fit a standard Cox model.
Cumulative Incidence Function (CIF)
\[ F_k(t) \;=\; \Pr(T \le t,\;\Delta = k), \] the marginal probability of having cause \(k\) by time \(t\).- Nonparametric: Gray’s estimator integrates the cause-specific hazard with overall survival.
- Testing: Gray’s log-rank–type tests compare \(F_k(t)\) across groups.
- Sub-distribution hazard: A transformation of CIF: \(\Lambda_k(t)=-\log\{1 - F_k(t)\}\).
The Fine–Gray model specifies a multiplicative structure on this sub-distribution hazard: \[ \mathrm{d}\Lambda_k(t\mid Z) \;=\; \exp(\beta_k^\mathrm{T} Z)\,\mathrm{d}\Lambda_{k0}(t), \]
- \(\beta_k\): log-sub-distribution hazard ratios associated with unit increases in the covariates.
- Nonparametric: Gray’s estimator integrates the cause-specific hazard with overall survival.
Relationship between CSH and CIF (continuous): \[\begin{align} \mathrm{d}\Lambda_k^{\rm c}(t) &\;=\; \frac{ \mathrm{d} F_k(t)}{1 - \sum_{k'=1}^K F_{k'}(t)};\\ \mathrm{d} F_k(t) &\;=\; \exp\left\{-\sum_{k'=1}^K \Lambda_{k'}^{\rm c}(t)\right\}\mathrm{d}\Lambda_k^{\rm c}(t). \end{align}\]
Semi-Competing Risks
Semi-competing risks arise when a non-terminal event (e.g., relapse) can only be observed if it occurs before the terminal event (e.g., death). Once the terminal event happens, the non-terminal event cannot.
- Two endpoints \((T, D)\): \(T\) = non-terminal, \(D\) = terminal, partially ordered by \(T \le D\) if \(T\) occurs.
- Recurrent events with death: \(N^*(t)\) = number of non-terminal events by time \(t\), with \(\mathrm{d} N^*(t)\equiv 0\) for \(t> D\) (no event after death).
Analyses can be:
- Cause-specific: treat death as censoring for the non-terminal event.
- Sub-distribution: treat death as a competing risk for the non-terminal event (acknowledging that there is no event after death).
- Extended frameworks: (multistate or shared frailty) for more detailed joint modeling.
In particular, the Ghosh-Lin test is a two-sample test for comparing the cumulative frequency \(E\{N^*(t)\}\), accounting for the terminal event. The proportional cumulative frequency regression extends this to covariate-adjusted models: \[ E\{N^*(t)\mid Z\}\;=\;\exp(\beta^\mathrm{T} Z)\mu_0(t). \]
Example R code
Below are code snippets demonstrating nonparametric estimation and regression for competing (and semi-competing) risks. Assume a data frame df
with:
time
: observed time to event or censor,
status
: integer-coded event indicator (0 = censor, 1, 2 for distinct causes),
- covariates:
x1
,x2
, etc.
Conclusion
When analyzing competing risks, the cause-specific approach (treating other causes as censoring) targets the conditional hazard among survivors, while the cumulative incidence or sub-distribution hazard approach (Fine–Gray) provides a marginal interpretation of failure probability in the presence of competing events. For semi-competing risks—where a non-terminal event cannot follow a terminal event—researchers can treat the terminal event as a competing risk for the non-terminal one or use specialized multistate/frailty methods to capture joint behavior. Each method yields valid but different insights: cause-specific hazards focus on event rates among those alive, while cumulative incidence or sub-distribution models offer population-level probabilities.
R Code
Show the code
###############################################################################
# Chapter 10 R Code
#
# This script reproduces all major numerical results in Chapter 10, including:
# 1. Bone Marrow Transplantation Study (Competing risks)
# 2. Semi-competing risks in the German Breast Cancer (GBC) data
# 3. Bladder tumor study (recurrent events with death as a terminal event)
###############################################################################
#==============================================================================
# (A) Bone Marrow Transplantation Study
#==============================================================================
library(survival) # For survival analysis (cause-specific hazards, etc.)
library(cmprsk) # For Fine-Gray and Gray’s test
library(tidyverse) # For data manipulation and plotting
#------------------------------------------------------------------------------
# 1. Read Data and Perform Gray's Test
#------------------------------------------------------------------------------
<- read.table("Data//Bone Marrow Transplantation Study//cibmtr.txt")
cibmtr head(cibmtr)
# Gray's unweighted log-rank-type test for group differences in
# cumulative incidence. 'rho=0' => unweighted test.
<- cmprsk::cuminc(cibmtr$time, cibmtr$status, cibmtr$donor, rho = 0)
obj
# Print test results (Gray’s test statistics and p-values)
$Tests
obj
#------------------------------------------------------------------------------
# 2. Plot the Cumulative Incidence Functions (Figure 10.2)
# - Relapse (k=1)
# - Treatment-related Mortality (k=2)
#------------------------------------------------------------------------------
# Extract group-specific CIF objects for siblings (donor=0) vs. unrelated (donor=1)
# k=1: relapse, k=2: TRM
<- obj$`0 1` # Relapse, siblings
obj.rlp.sib <- obj$`1 1` # Relapse, unrelated
obj.rlp.nonsib <- obj$`0 2` # TRM, siblings
obj.trm.sib <- obj$`1 2` # TRM, unrelated
obj.trm.nonsib
# Set up two-panel plot
par(mfrow = c(1, 2))
# Left panel: Relapse
plot(
$time, obj.rlp.sib$est,
obj.rlp.sibtype = "s", # Step function
frame.plot= FALSE,
main = "Relapse",
xlim = c(0, 120),
ylim = c(0, 0.6),
xlab = "Time (months)",
ylab = "Cumulative incidence",
lwd = 2
)lines(obj.rlp.nonsib$time, obj.rlp.nonsib$est, lty = 3, lwd = 2)
# Right panel: TRM
plot(
$time, obj.trm.sib$est,
obj.trm.sibtype = "s",
frame.plot= FALSE,
main = "Treatment-related mortality",
xlim = c(0, 120),
ylim = c(0, 0.6),
xlab = "Time (months)",
ylab = "Cumulative incidence",
lwd = 2
)lines(obj.trm.nonsib$time, obj.trm.nonsib$est, lty = 3, lwd = 2)
#------------------------------------------------------------------------------
# 3. Fine-Gray vs. Cause-Specific Hazard Models
#------------------------------------------------------------------------------
# We focus on cause k = 1 (Relapse)
# Fine-Gray model
<- cmprsk::crr(cibmtr$time, cibmtr$status, cibmtr[, 3:6], failcode = 1)
obj.fg
# Extract estimates and standard errors
<- obj.fg$coef
beta.fg <- sqrt(diag(obj.fg$var))
se.fg
# Format hazard ratios (HR), confidence intervals, and p-values
<- round(exp(beta.fg), 2)
c1 <- paste0("(", round(exp(beta.fg - 1.96 * se.fg), 2), "-", round(exp(beta.fg + 1.96 * se.fg), 2), ")")
c2 <- round(1 - pchisq((beta.fg / se.fg)^2, 1), 3)
c3
# Cause-specific hazard model for k=1
<- coxph(Surv(time, status == 1) ~ cohort + donor + hist + wait, data = cibmtr)
obj.csh
<- obj.csh$coef
beta.csh <- sqrt(diag(obj.csh$var))
se.csh
<- round(exp(beta.csh), 2)
c4 <- paste0("(", round(exp(beta.csh - 1.96 * se.csh), 2), "-", round(exp(beta.csh + 1.96 * se.csh), 2), ")")
c5 <- round(1 - pchisq((beta.csh / se.csh)^2, 1), 3)
c6
# Print combined Fine-Gray (columns 1-3) vs. cause-specific hazard (columns 4-6)
noquote(cbind(c1, c2, c3, c4, c5, c6))
#------------------------------------------------------------------------------
# 5. Covariate-Specific CIF Prediction
#------------------------------------------------------------------------------
# Example covariate vector: z = (1, 0, 1, 0)
<- c(1, 0, 1, 0)
z
# Method 1: Using predict.crr()
<- predict(obj.fg, z)
obj_pred1
# Method 2: Manual calculation
<- obj.fg$coef
beta <- cumsum(obj.fg$bfitj) # Baseline integrated sub-distribution hazard
Lambda <- obj.fg$uftime
time <- 1 - exp(-exp(sum(beta * z)) * Lambda)
cif <- cbind(time, cif)
obj_pred2
# Compare both predictions (they should match closely)
cbind(obj_pred1, obj_pred2)
#==============================================================================
# (B) Tidy Competing Risks Analysis
#==============================================================================
library(tidycmprsk) # For cuminc() wrappers & table building
library(ggsurvfit) # For enhanced survival/cif plots
# Recode the data for use with tidycmprsk
<- cibmtr %>%
df ::mutate(
dplyrstatus = factor(status, levels = c("0", "1", "2"), labels = c("Censored", "Relapse", "TRM")),
donor = factor(donor, levels = c("0", "1"), labels = c("Id. sibling", "Unrelated"))
)
# Fit CIF by donor groups using tidycmprsk
<- tidycmprsk::cuminc(Surv(time, status) ~ donor, data = df)
obj_cif
obj_cif
# Summaries of CIF at specific times (12, 48, 84, 120 months)
<- tbl_cuminc(
tbl_surv x = obj_cif,
times = seq(12, 120, by = 36),
outcomes = c("Relapse", "TRM"),
label_header = "{time} months",
label = "Donor"
)
tbl_surv
# Base CIF plot for both outcomes (Relapse, TRM) in one panel
%>%
obj_cif ggcuminc(outcome = c("Relapse", "TRM")) +
scale_x_continuous("Time (months)", limits = c(0, 120), breaks = seq(0, 120, 24)) +
add_risktable(risktable_stats = "n.risk") +
theme_minimal()
#------------------------------------------------------------------------------
# 1. Separate Plots by Risk (Patchwork)
#------------------------------------------------------------------------------
library(patchwork)
library(ggsci)
# Function to plot a single outcome's CIF
<- function(obj_cif, outcome) {
plot_bm_cif <- ggcuminc(obj_cif, outcome, linetype_aes = TRUE, linewidth = 0.8) +
p add_risktable(
risktable_stats = "n.risk",
theme = list(theme_risktable_default())
+
) ggtitle(outcome) +
scale_y_continuous(
"Cumulative incidence",
limits = c(0, 0.6),
breaks = seq(0, 1, by = 0.1),
expand = expansion(c(0, 0.005))
+
) scale_x_continuous("Time (months)", limits = c(0, 120), breaks = seq(0, 120, 24)) +
scale_color_jama() + # JAMA color palette
scale_linetype_manual(values = c(2, 1)) +
theme_classic() +
theme(
plot.margin = margin(0, 0, 0, 0),
legend.position = "top",
legend.key.width = unit(1, "cm"),
panel.grid.major.y = element_line(),
legend.text = element_text(size = 10),
plot.caption = element_text(size = 10),
plot.title = element_text(size = 12)
)
p
}
# Two outcome-specific panels
<- plot_bm_cif(obj_cif, "Relapse")
rel_fig <- plot_bm_cif(obj_cif, "TRM")
trm_fig
# Combine into one figure
<- wrap_plots(
bm_fig ggsurvfit_build(rel_fig),
ggsurvfit_build(trm_fig),
ncol = 2
)
bm_fig
# Save if needed:
# ggsave("images/cmpr_bm.png", bm_fig, width = 8, height = 4.5)
# ggsave("images/cmpr_bm.eps", bm_fig, width = 8, height = 4.5)
# Fit separate Fine-Gray models for relapse vs. TRM
<- tidycmprsk::crr(Surv(time, status) ~ donor + cohort + hist + wait, data = df, failcode = "Relapse")
obj_rel
obj_rel<- tidycmprsk::crr(Surv(time, status) ~ donor + cohort + hist + wait, data = df, failcode = "TRM")
obj_trm
obj_trm
# Example: subdistribution odds via timereg::prop.odds.subdist
library(timereg)
<- prop.odds.subdist(
obj_po1 Event(time, status) ~ donor + cohort + hist + wait,
data = df,
cause = 1
)$gamma
obj_po1$robvar.gamma
obj_po1
#==============================================================================
# (C) Semi-Competing Risks in the German Breast Cancer Data
#==============================================================================
library(gtsummary)
library(labelled)
#------------------------------------------------------------------------------
# 1. Read and Recoding GBC Data
#------------------------------------------------------------------------------
<- read.table("Data//German Breast Cancer Study//gbc.txt") %>%
gbc mutate(
age40 = (age > 40) + 0, # Binary for age>40
grade = factor(grade),
prog = prog / 100, # Rescale progesterone
estrg = estrg / 100 # Rescale estrogen
)
# Assign human-friendly variable labels for gtsummary
var_label(gbc) <- list(
hormone = "Hormone",
meno = "Menopause",
age40 = "Age 40+",
grade = "Tumor grade",
size = "Tumor size (mm)",
prog = "Progesterone (100 fmol/mg)",
estrg = "Estrogen (100 fmol/mg)"
)
#------------------------------------------------------------------------------
# 2. Cox Model for Overall Survival (Death)
#------------------------------------------------------------------------------
# Exclude those who relapsed only (status=1), keep death (status=2) and censoring (status=0)
<- gbc %>%
gbc_death filter(status != 1) %>%
mutate(status = (status > 0)) # status=1 if death, 0 if censor
<- coxph(
obj_death Surv(time, status) ~ hormone + meno + age40 + grade + size + prog + estrg,
data = gbc_death
)<- tbl_regression(obj_death, exponentiate = TRUE)
death_tbl
#------------------------------------------------------------------------------
# 3. Cause-Specific Hazards for Relapse
#------------------------------------------------------------------------------
# Keep only first event for each subject
# If status=1 => relapse, else censor
# (Ignoring death, which is status=2)
<- gbc %>%
gbc_relc group_by(id) %>%
slice_min(time) %>%
mutate(status = if_else(status == 2, 0, status)) %>%
slice_max(status) %>%
ungroup()
<- coxph(
obj_relc Surv(time, status) ~ hormone + meno + age40 + grade + size + prog + estrg,
data = gbc_relc
)<- tbl_regression(obj_relc, exponentiate = TRUE)
relc_tbl
#------------------------------------------------------------------------------
# 4. Subdistribution Hazards for Relapse (Fine-Gray)
#------------------------------------------------------------------------------
# Keep only the first event, re-encode status as factor
<- gbc %>%
gbc_tfe group_by(id) %>%
slice_min(time) %>%
slice_max(status) %>%
mutate(status = factor(status, levels = c("0", "1", "2"))) %>%
ungroup()
<- tidycmprsk::crr(
obj_rel_sub Surv(time, status) ~ hormone + meno + age40 + grade + size + prog + estrg,
data = gbc_tfe,
failcode = "1"
)<- tbl_regression(obj_rel_sub, exponentiate = TRUE)
rel_sub_tbl
# Merge three tables: Death, Relapse (Cause-Specific), Relapse (Subdistribution)
<- tbl_merge(
tbl tbls = list(death_tbl, relc_tbl, rel_sub_tbl),
tab_spanner = c("**Death**", "**Relapse (cause-specific)**", "**Relapse (subdistribution)**")
)
tbl# as_kable_extra(tbl, format = "latex")
#==============================================================================
# (D) Bladder Tumor Study (Recurrent Events with Death as Terminal Event)
#==============================================================================
::install_github("lmaowisc/rccf2")
devtoolslibrary(rccf2)
data(bladder)
#------------------------------------------------------------------------------
# 1. Descriptive Analysis
#------------------------------------------------------------------------------
# Count how many rows belong to each status
%>%
bladder count(status)
# Prepare data for plotting
<- bladder %>%
df left_join(
%>% group_by(id) %>% summarize(max_fu = max(time)),
bladder by = "id"
%>%
) mutate(
id = factor(id),
status = factor(status, levels = c("0", "1", "2")), # 0=censored,1=recur,2=death
trt = if_else(trt == 1, "Thiotepa", "Placebo")
)
<- df %>%
dfmax group_by(id, trt) %>%
summarize(max_fu = max(time), .groups = "drop")
# Follow-up plot showing recurrences/death
%>%
df ggplot(aes(y = reorder(id, max_fu))) +
geom_linerange(data = dfmax, aes(xmin = 0, xmax = max_fu)) +
geom_point(aes(x = time, shape = status), size = 2, fill = "white") +
geom_vline(xintercept = 0, linewidth = 1) +
scale_shape_manual(values = c(23, 15, 19), labels = c("Censoring", "Tumor recurrence", "Death")) +
scale_x_continuous(
"Time (months)",
limits = c(0, 65),
breaks = seq(0, 60, by = 12),
expand = c(0, 0.5)
+
) scale_y_discrete("Patients") +
facet_wrap(~ trt, scales = "free") +
theme_minimal() +
theme(
legend.position = "top",
legend.title = element_blank(),
panel.grid.minor.x = element_blank(),
axis.text.y = element_blank(),
axis.ticks.y = element_blank(),
panel.grid.major.y = element_blank()
)
# ggsave("cmpr_blad_fu.pdf", width = 8, height = 8)
# ggsave("cmpr_blad_fu.eps", width = 8, height = 8)
#------------------------------------------------------------------------------
# 2. Ghosh-Lin Two-Sample Analysis
#------------------------------------------------------------------------------
<- bladder$id
id <- bladder$time
time <- bladder$status
status <- bladder$trt
trt
# Fit the Ghosh-Lin test for recurrent events
<- rccf(id, time, status, trt)
obj_rccf <- obj_rccf$stat # test statistics
stat <- obj_rccf$S # covariance matrix
S <- obj_rccf$u1 # mean function for group 1
u1 <- obj_rccf$u2 # mean function for group 2
u2 <- obj_rccf$t # time points used
t_rccf
# Derive partial tests
<- stat[1] / sqrt(S[1, 1]) # Q-value for recurrence
QLR <- 1 - pchisq(QLR^2, 1)
pLR
<- stat[2] / sqrt(S[2, 2]) # Q-value for death
QD <- 1 - pchisq(QD^2, 1)
pD
<- t(stat) %*% solve(S) %*% stat # joint test
Q <- 1 - pchisq(Q, 2)
p
#------------------------------------------------------------------------------
# 3. Plot Estimated Cumulative Frequency (CF) (Figure 10.x)
#------------------------------------------------------------------------------
par(mfrow = c(1, 2))
# Left panel: Survival from death (0=alive, >0=dead)
<- bladder[status != 1, ]
fatal <- survfit(Surv(time, status > 0) ~ trt, data = fatal)
obj_sf
plot(
obj_sf,lty = c(1, 3),
frame = FALSE,
xlim = c(0, 50),
lwd = 2,
cex.lab = 1.5,
cex.axis = 1.5,
xlab = "Time (months)",
ylab = "Survival probabilities"
)text(35, 0.14, paste0("Log-rank test: p=", round(pD, 3)), cex = 1.2)
# Right panel: Mean functions for tumor recurrences
<- stepfun(t_rccf, c(0, u1))
stepmu1 <- stepfun(t_rccf, c(0, u2))
stepmu2
plot(
stepmu2,do.points= FALSE,
xlab = "Time (months)",
ylab = "Cumulative tumor frequency",
main = "",
xlim = c(0, 60),
ylim = c(0, 3),
frame = FALSE,
lwd = 2,
cex.lab = 1.5,
cex.axis = 1.5,
lty = 3
)plot(
stepmu1,add = TRUE,
lty = 1,
do.points = FALSE,
lwd = 2
)text(40, 0.4, paste0("Ghosh-Lin test: p=", round(pLR, 3)), cex = 1.2)
#------------------------------------------------------------------------------
# 4. Ghosh-Lin Proportional CF Regression (mets::recreg)
#------------------------------------------------------------------------------
library(mets)
data(bladder)
# Create start-stop format for each ID
<- bladder %>%
df_gl group_by(id) %>%
mutate(
start = lag(time, default = 0),
stop = if_else(time == 0, 1e-4, time)
%>%
) select(id, start, stop, status, trt) %>%
ungroup() %>%
as.data.frame()
# Fit the proportional cumulative frequency regression
<- recreg(
obj_reg Event(start, stop, status) ~ trt + cluster(id),
cause = 1, # Recurrent event
death.code = 2, # Terminal event
data = df_gl
)summary(obj_reg)