# Scenario 1: Single regression vector, single baseline hazard
<- coxph(
fit_sf_scenario1 Surv(time, status) ~ x1 + x2 + frailty(id, distribution = "gamma"),
data = df
)# Scenario 2: One covariate effect vector, separate baseline hazards
<- coxph(
fit_sf_scenario2 Surv(time, status) ~ x1 + x2 + strata(enum)
+ frailty(id, distribution = "gamma",
data = df
)# Scenario 3: Fully distinct covariate vectors and baseline hazards
<- coxph(
fit_sf_scenario3 Surv(time, status) ~ (x1 + x2) * strata(enum)
+ frailty(id, distribution = "gamma",
data = df
)
Chapter 8 - Multivariate Failure Times
Slides
Lecture slides here. (To convert html to pdf, press E \(\to\) Print \(\to\) Destination: Save to pdf)
Chapter Summary
When analyzing multiple correlated event times—arising from either multiple failure types per subject (e.g., different clinical outcomes within the same patient) or clusters of subjects (e.g., families, institutions)—standard univariate approaches must be adapted to account for within-unit dependence. To do so, two broad modeling frameworks are useful:
- Shared-frailty (conditional) models, which introduce a unit-level random effect (frailty) that influences all failure times in that unit, and
- Marginal models, which specify each event’s hazard at the population level and account for correlation via robust (sandwich) variance estimation.
Both frameworks extend Cox-type survival methods to multivariate data but differ in interpretation (conditional vs. population-averaged effects), assumptions about the joint distribution, and how correlation parameters are handled.
Marginal models
A marginal Cox model specifies each event’s hazard averaged over the distribution of unobserved frailties or random effects:
\[ \Lambda_k(t \mid Z) \;=\; \exp\bigl(\beta_k^\mathsf{T}Z\bigr)\,\Lambda_{k0}(t). \]
The within-unit dependence is handled by a robust (sandwich) variance estimator, rather than an explicit random-effects structure.
- Interpretation: \(\beta_k\) gives a population-level hazard ratio, in contrast to the conditional hazard ratio from shared-frailty models.
- Estimation: The partial-likelihood score is constructed as if events were independent, but the covariance of \(\hat\beta\) is corrected using a cluster-based sandwich estimator (e.g.,
cluster(id)
in R).
Three scenarios
When multiple event types exist (e.g., heart failure vs. stroke) or multiple members in a cluster, one can combine or separate the baselines and covariate effects:
- Same \(\beta\) and same baseline: All events share one regression vector and one baseline hazard.
- Same \(\beta\) but different baselines: Impose a single set of covariate effects while allowing event types (or clusters) different baseline hazards.
- Different \(\beta\) and different baselines: Each event type (or group) has its own regression parameter vector and baseline hazard.
These choices apply to both shared-frailty and marginal modeling frameworks, depending on scientific questions and study design.
Example R commands
Below are illustration snippets in R using the survival
package, demonstrating how to fit both shared-frailty (conditional) and marginal Cox models under the three scenario types. We assume a data frame df
with columns:
time
(event or censor time),
status
(0/1 indicator),
id
(cluster or subject identifier),
enum
(if multiple event types),
- covariates (e.g.,
x1
,x2
).
Marginal (population-averaged) models
# Scenario 1: Single beta, single baseline across events
<- coxph(
fit_marg_scenario1 Surv(time, status) ~ x1 + x2 + cluster(id),
data = df
)# Scenario 2: Shared covariate effects, separate baseline hazards
<- coxph(
fit_marg_scenario2 Surv(time, status) ~ x1 + x2 + strata(enum) + cluster(id),
data = df
)# Scenario 3: Event-type-specific beta vectors and baselines
<- coxph(
fit_marg_scenario3 Surv(time, status) ~ (x1 + x2) * strata(enum) + cluster(id),
data = df
)
Conclusion
Modeling multiple correlated time-to-event outcomes requires specialized approaches:
Shared-frailty (conditional) models:
- Introduce a random effect \(\xi\) common to all events in a cluster.
- Covariate effects are interpreted within-unit.
- Provide direct correlation estimates via frailty variance but need stronger assumptions and more computation.
Marginal models:
- Focus on population-level hazard functions.
- Covariate effects are population-averaged.
- Dependence is handled by robust (sandwich) variances, requiring fewer assumptions but not yielding explicit correlation measures.
The choice between these approaches depends on whether unit-specific or population-averaged interpretations are of interest, along with practical considerations of model complexity and assumptions.
R Code
Show the code
###############################################################################
# Chapter 8 R Code
#
# This script reproduces all major numerical results in Chapter 8, including:
# 1. Clustered data analysis (NCCTG lung cancer study)
# 2. Bivariate marginal Cox model (Diabetic Retinopathy Study)
# 3. Multistate endpoints (TOPCAT trial)
###############################################################################
library(survival)
library(tidyverse)
library(patchwork)
#==============================================================================
# (A) NCCTG Lung Cancer Study (Clustered Data)
#==============================================================================
#------------------------------------------------------------------------------
# 1. Read the NCCTG lung cancer data
# (clustered by institution)
#------------------------------------------------------------------------------
<- read.table("Data//NCCTG//lung.txt")
df head(df)
#------------------------------------------------------------------------------
# 2. Plot Follow-Up by Institution and Sex
#------------------------------------------------------------------------------
<- function(df) {
inst_by_sex_fu_plot %>%
df ggplot(aes(y = reorder(id, time), x = time, color = factor(2 - sex))) +
geom_linerange(aes(xmin = 0, xmax = time)) +
geom_point(aes(shape = factor(status)), size = 2, fill = "white") +
geom_vline(xintercept = 0, linewidth = 1) +
facet_grid(inst ~ ., scales = "free", space = "free", switch = "y") +
theme_minimal() +
scale_x_continuous(
"Time (months)",
limits = c(0, 36),
breaks = seq(0, 36, by = 12),
expand = c(0, 0.25)
+
) scale_y_discrete("Patients (by institution)") +
scale_shape_manual(
values = c(23, 19),
labels = c("Censoring", "Death")
+
) scale_color_brewer(
palette = "Set1",
labels = c("Female", "Male")
+
) theme(
strip.background = element_rect(fill = "gray90", color = "gray90"),
axis.text.y = element_blank(),
axis.ticks.y = element_blank(),
axis.line.y = element_blank(),
panel.grid.major.y = element_blank(),
legend.title = element_blank()
)
}
<- inst_by_sex_fu_plot(df |> filter(inst <= 11))
p1 <- inst_by_sex_fu_plot(df |> filter(inst > 11))
p2
<- p1 + p2 +
mul_lung_fu plot_layout(ncol = 2, guides = "collect") &
theme(legend.position = "top")
# ggsave("mul_lung_fu.pdf", mul_lung_fu, width = 8, height = 10)
# ggsave("mul_lung_fu.eps", mul_lung_fu, width = 8, height = 10)
#------------------------------------------------------------------------------
# 3. Cox Model with Institution-Specific Frailty
#------------------------------------------------------------------------------
$sex <- factor(df$sex)
df
<- coxph(
obj Surv(time, status) ~ age + sex + phec + phkn + ptkn + wl +
frailty(inst, distribution = "gamma"),
data = df
)# Alternatively:
# obj <- coxph(
# Surv(time, status) ~ age + sex + phec + phkn + ptkn + wl +
# frailty(inst, distribution = "gaussian"),
# data = df
# )
summary(obj) # model summary
$coefficients # extract beta
obj$frail # extract log-frailty
obj
#------------------------------------------------------------------------------
# 4. Naive Cox Model (No Frailty)
#------------------------------------------------------------------------------
<- coxph(
obj_naive Surv(time, status) ~ age + sex + phec + phkn + ptkn + wl,
data = df
)summary(obj_naive)
#------------------------------------------------------------------------------
# 5. Subject-Specific Survival Curves (Figure 8.2)
#------------------------------------------------------------------------------
# Typical patient with median values
<- median(df$age, na.rm = TRUE)
med_age <- median(df$phkn, na.rm = TRUE)
med_phkn <- median(df$ptkn, na.rm = TRUE)
med_ptkn <- median(df$wl, na.rm = TRUE)
med_wl
<- obj$coefficients
beta <- basehaz(obj, centered = FALSE)
base_obj <- base_obj$hazard
eta <- base_obj$time
t
# Covariate profiles
# ECOG: 0, 1, 2, 3
<- c(med_age, 1, 0, med_phkn, med_ptkn, med_wl)
zf0 <- c(med_age, 1, 1, med_phkn, med_ptkn, med_wl)
zf1 <- c(med_age, 1, 2, med_phkn, med_ptkn, med_wl)
zf2 <- c(med_age, 1, 3, med_phkn, med_ptkn, med_wl)
zf3
<- c(med_age, 0, 0, med_phkn, med_ptkn, med_wl)
zm0 <- c(med_age, 0, 1, med_phkn, med_ptkn, med_wl)
zm1 <- c(med_age, 0, 2, med_phkn, med_ptkn, med_wl)
zm2 <- c(med_age, 0, 3, med_phkn, med_ptkn, med_wl)
zm3
# Base R plots for female
par(mfrow = c(1, 2))
plot(
t,exp(-exp(sum(beta * zf0)) * eta),
type = "s",
xlim = c(0, 35),
ylim = c(0, 1),
frame = FALSE,
lty = 1,
main = "Female",
xlab = "Time (months)",
ylab = "Survival probabilities",
lwd = 2,
cex.lab = 1.3,
cex.axis= 1.3,
cex.main= 1.3
)lines(t, exp(-exp(sum(beta * zf1)) * eta), lty = 2, lwd = 2)
lines(t, exp(-exp(sum(beta * zf2)) * eta), lty = 3, lwd = 2)
lines(t, exp(-exp(sum(beta * zf3)) * eta), lty = 4, lwd = 2)
legend("topright", lty = 1:4, lwd = 2, cex = 1.2, paste("ECOG", 0:3))
# Base R plots for male
plot(
t,exp(-exp(sum(beta * zm0)) * eta),
type = "s",
xlim = c(0, 35),
ylim = c(0, 1),
frame = FALSE,
lty = 1,
main = "Male",
xlab = "Time (months)",
ylab = "Survival probabilities",
lwd = 2,
cex.lab = 1.3,
cex.axis= 1.3,
cex.main= 1.3
)lines(t, exp(-exp(sum(beta * zm1)) * eta), lty = 2, lwd = 2)
lines(t, exp(-exp(sum(beta * zm2)) * eta), lty = 3, lwd = 2)
lines(t, exp(-exp(sum(beta * zm3)) * eta), lty = 4, lwd = 2)
legend("topright", lty = 1:4, lwd = 2, cex = 1.2, paste("ECOG", 0:3))
#==============================================================================
# (B) Diabetic Retinopathy Study (Bivariate Marginal Cox Model)
#==============================================================================
#------------------------------------------------------------------------------
# 1. Read the Data
#------------------------------------------------------------------------------
<- read.table("Data//Diabetic Retinopathy Study//drs.txt")
df_drs head(df_drs)
#------------------------------------------------------------------------------
# 2. Fit Bivariate Marginal Cox Model
#------------------------------------------------------------------------------
<- coxph(
obj_drs Surv(time, status) ~ trt + type + trt:type + risk + cluster(id),
data = df_drs
)summary(obj_drs)
#------------------------------------------------------------------------------
# 3. Table 8.1: Marginal Cox Model Analysis (Estimates, SE, p-values)
#------------------------------------------------------------------------------
<- summary(obj_drs)$coefficients
coeff <- coeff[, 1] # beta estimate
c1 <- coeff[, 4] # robust SE
c2 <- coeff[, 6] # robust p-value
c3 <- coeff[, 3] # naive SE
c4 <- 1 - pchisq((c1 / c4)^2, 1) # naive p-value
c5
# Output table
noquote(round(cbind(c1, c2, c3, c4, c5), 3))
#------------------------------------------------------------------------------
# 4. Prediction of Vision-Retention Probabilities (Figure 8.4)
#------------------------------------------------------------------------------
<- basehaz(obj_drs, centered = FALSE)
Lt <- Lt$time
t <- Lt$hazard
L <- coeff[, 1]
beta
# Covariate profiles: adult vs. juvenile, control vs. treatment
<- exp(-exp(sum(beta * c(0, 0, 10, 0))) * L)
adult_contr <- exp(-exp(sum(beta * c(1, 0, 10, 0))) * L)
adult_trt <- exp(-exp(sum(beta * c(0, 1, 10, 0))) * L)
juv_contr <- exp(-exp(sum(beta * c(1, 1, 10, 1))) * L)
juv_trt
par(mfrow = c(1, 2))
# Adult
plot(
t, adult_contr,type = "s",
xlim = c(0, 80),
ylim = c(0, 1),
frame.plot = FALSE,
lty = 3,
main = "Adult",
xlab = "Time (months)",
ylab = "Vision-retention probabilities",
lwd = 2,
cex.lab = 1.2,
cex.axis= 1.2,
cex.main= 1.2
)lines(t, adult_trt, lty = 1, lwd = 2)
# Juvenile
plot(
t, juv_contr,type = "s",
xlim = c(0, 80),
ylim = c(0, 1),
frame = FALSE,
lty = 3,
main = "Juvenile",
xlab = "Time (months)",
ylab = "Vision-retention probabilities",
lwd = 2,
cex.lab = 1.2,
cex.axis= 1.2,
cex.main= 1.2
)lines(t, juv_trt, lty = 1, lwd = 2)
#==============================================================================
# (C) TOPCAT Trial (Multiple Endpoints)
#==============================================================================
#------------------------------------------------------------------------------
# 1. Read and Inspect the Data
#------------------------------------------------------------------------------
<- read.table("Data//TOPCAT//topcat.txt")
topcat
# Median follow-up
%>%
topcat group_by(id) %>%
slice_max(time) %>%
slice_head() %>%
ungroup() %>%
summarize(median(time))
#------------------------------------------------------------------------------
# 2. Descriptive Analysis
#------------------------------------------------------------------------------
<- topcat %>%
tmp mutate(
drug = if_else(drug == "Spiro", "Spironolactone", "Placebo"),
gender = if_else(gender == "1:Male", "Male", "Female")
)
# De-duplicate
<- tmp %>%
df_topcat pivot_wider(
id_cols = id,
names_from = endpoint,
values_from = c(time, status)
%>%
) left_join(
%>% filter(endpoint == "HF"),
tmp join_by(id)
)
# Helper: median (IQR) function
<- function(x, r = 1) {
med_iqr <- quantile(x, na.rm = TRUE)
qt paste0(
round(qt[3], r), " (",
round(qt[2], r), ", ",
round(qt[4], r), ")"
)
}
# Quantitative variables table
<- df_topcat %>%
tab_quant filter(endpoint == "HF") %>%
group_by(drug) %>%
summarize(across(c(age, bmi, hr), med_iqr)) %>%
pivot_longer(!drug, values_to = "value", names_to = "name") %>%
pivot_wider(values_from = value, names_from = drug) %>%
mutate(
name = case_when(
== "age" ~ "Age (years)",
name == "bmi" ~ "BMI (kg/m^2)",
name == "hr" ~ "Heart rate (per min)"
name
)
)
# Helper function for frequency tables
<- function(df, group, var, r = 1) {
freq_pct <- df %>%
var_counts group_by({{ group }}, {{ var }}) %>%
summarize(n = n(), .groups = "drop")
%>%
var_counts left_join(
%>% group_by({{ group }}) %>% summarize(N = sum(n)),
var_counts by = join_by({{ group }})
%>%
) mutate(
value = paste0(n, " (", round(100 * n / N, r), "%)")
%>%
) select(-c(n, N)) %>%
pivot_wider(names_from = {{ group }}, values_from = value) %>%
rename(name = {{ var }})
}
# Categorical variables
<- df_topcat %>%
gender freq_pct(drug, gender) %>%
mutate(name = paste0("Gender - ", name))
<- df_topcat %>%
race freq_pct(drug, race) %>%
mutate(name = paste0("Race - ", name))
<- df_topcat %>%
nyha freq_pct(drug, nyha) %>%
mutate(name = paste0("NYHA - ", name)) %>%
filter(!is.na(name))
# Helper for binary condition
<- function(condition, r = 1) {
bin_pct <- sum(condition, na.rm = TRUE)
n <- length(condition)
N paste0(n, " (", round(100 * n / N, r), "%)")
}
<- df_topcat %>%
tabin group_by(drug) %>%
summarize(
N = n(),
across(c(smoke:cabg, status_HF, status_MI, status_Stroke), bin_pct)
%>%
) select(!N) %>%
pivot_longer(!drug, values_to = "value", names_to = "name") %>%
pivot_wider(values_from = value, names_from = drug) %>%
mutate(
name = case_when(
== "smoke" ~ "Smoker",
name == "chf_hosp" ~ "CHF",
name == "copd" ~ "COPD",
name == "asthma" ~ "Asthma",
name == "dm" ~ "Diabetes",
name == "htn" ~ "Hypertension",
name == "cabg" ~ "Coronary surgery",
name == "status_HF" ~ "HF",
name == "status_MI" ~ "MI",
name == "status_Stroke" ~ "Stroke"
name
)
)
# Event rates (per person-year)
<- df_topcat %>%
event_rates group_by(drug) %>%
summarize(
`HF rate (per person-year)` = sum(status_HF) / sum(time_HF),
`MI rate (per person-year)` = sum(status_MI) / sum(time_MI),
`Stroke rate (per person-year)` = sum(status_Stroke) / sum(time_Stroke)
%>%
) pivot_longer(!drug, values_to = "value", names_to = "name") %>%
pivot_wider(values_from = value, names_from = drug) %>%
mutate(
Placebo = as.character(round(Placebo, 4)),
Spironolactone = as.character(round(Spironolactone, 4))
)
# Combine tables
<- bind_rows(
tabone 1, ],
tab_quant[
gender,
race,
nyha,-1, ],
tab_quant[
tabin,
event_rates
)
# Add N to group names
colnames(tabone) <- c(
" ",
paste0(colnames(tabone)[2:3], " (N=", table(df_topcat$drug), ")")
)
# Print table (example with knitr::kable)
::kable(tabone)
knitr
#------------------------------------------------------------------------------
# 3. Cox Model for Multiple Endpoints (Scenario 3)
#------------------------------------------------------------------------------
<- coxph(
obj_topcat Surv(time, status) ~
+ age + gender + nyha + bmi + smoke + chf_hosp +
(drug + asthma + dm + htn + cabg) *
copd strata(endpoint) + cluster(id),
data = topcat
)summary(obj_topcat)
$coefficients
obj_topcat
#------------------------------------------------------------------------------
# 4. Component-Specific HRs, 95% CIs, p-values
#------------------------------------------------------------------------------
<- 12 # number of covariates
p <- obj_topcat$coefficients
beta <- obj_topcat$var
Sigma
<- numeric(p)
betaHF <- numeric(p)
varHF <- numeric(p)
betaMI <- numeric(p)
varMI <- numeric(p)
betaStr <- numeric(p)
varStr
# Fill in HF, MI, Stroke
for (j in seq_len(p)) {
# HF
<- beta[j]
betaHF[j] <- Sigma[j, j]
varHF[j]
# MI (add interaction terms)
<- beta[j] + beta[p + 2 * j - 1]
betaMI[j] <- Sigma[j, j] +
varMI[j] + 2 * j - 1, p + 2 * j - 1] +
Sigma[p 2 * Sigma[j, p + 2 * j - 1]
# Stroke
<- beta[j] + beta[p + 2 * j]
betaStr[j] <- Sigma[j, j] +
varStr[j] + 2 * j, p + 2 * j] +
Sigma[p 2 * Sigma[j, p + 2 * j]
}
<- qnorm(0.975)
za
<- function(b, v, r = 2) {
hr_ci <- sqrt(v)
se paste0(
round(exp(b), r), " (",
round(exp(b - za * se), r), ", ",
round(exp(b + za * se), r), ")"
)
}
<- function(b, v) {
hr_p <- sqrt(v)
se round(1 - pchisq((b / se)^2, df = 1), 3)
}
<- tibble(
df_res variable = c(
"Spironolactone", "Age (years)", "Female", "NYHA 3-4", "BMI",
"Smoke", "CHF", "COPD", "Asthma", "Diabetes", "Hypertension", "Coronary surgery"
),HF = hr_ci(betaHF, varHF),
HF_p = hr_p(betaHF, varHF),
MI = hr_ci(betaMI, varMI),
MI_p = hr_p(betaMI, varMI),
Stroke = hr_ci(betaStr, varStr),
Stroke_p = hr_p(betaStr, varStr)
)
# Print out table
df_res
#------------------------------------------------------------------------------
# 5. Joint Test of Spironolactone Effect on HF & Stroke
#------------------------------------------------------------------------------
<- beta[c(1, 14)]
beta_q <- Sigma[c(1, 14), c(1, 14)]
Sigma_q
<- t(beta_q) %*% solve(Sigma_q) %*% beta_q
chisq2 <- 1 - pchisq(chisq2, 2)
pval
# pval is the p-value for the joint test