# Scenario 1: Single regression vector, single baseline hazard
fit_sf_scenario1 <- coxph(
Surv(time, status) ~ x1 + x2 + frailty(id, distribution = "gamma"),
data = df
)
# Scenario 2: One covariate effect vector, separate baseline hazards
fit_sf_scenario2 <- coxph(
Surv(time, status) ~ x1 + x2 + strata(enum)
+ frailty(id, distribution = "gamma",
data = df
)
# Scenario 3: Fully distinct covariate vectors and baseline hazards
fit_sf_scenario3 <- coxph(
Surv(time, status) ~ (x1 + x2) * strata(enum)
+ frailty(id, distribution = "gamma",
data = df
)Chapter 8 - Multivariate Failure Times
Slides
Lecture slides here. (To convert html to pdf, press E \(\to\) Print \(\to\) Destination: Save to pdf)
Chapter Summary
When analyzing multiple correlated event times—arising from either multiple failure types per subject (e.g., different clinical outcomes within the same patient) or clusters of subjects (e.g., families, institutions)—standard univariate approaches must be adapted to account for within-unit dependence. To do so, two broad modeling frameworks are useful:
- Shared-frailty (conditional) models, which introduce a unit-level random effect (frailty) that influences all failure times in that unit, and
- Marginal models, which specify each event’s hazard at the population level and account for correlation via robust (sandwich) variance estimation.
Both frameworks extend Cox-type survival methods to multivariate data but differ in interpretation (conditional vs. population-averaged effects), assumptions about the joint distribution, and how correlation parameters are handled.
Marginal models
A marginal Cox model specifies each event’s hazard averaged over the distribution of unobserved frailties or random effects:
\[ \Lambda_k(t \mid Z) \;=\; \exp\bigl(\beta_k^\mathsf{T}Z\bigr)\,\Lambda_{k0}(t). \]
The within-unit dependence is handled by a robust (sandwich) variance estimator, rather than an explicit random-effects structure.
- Interpretation: \(\beta_k\) gives a population-level hazard ratio, in contrast to the conditional hazard ratio from shared-frailty models.
- Estimation: The partial-likelihood score is constructed as if events were independent, but the covariance of \(\hat\beta\) is corrected using a cluster-based sandwich estimator (e.g.,
cluster(id)in R).
Three scenarios
When multiple event types exist (e.g., heart failure vs. stroke) or multiple members in a cluster, one can combine or separate the baselines and covariate effects:
- Same \(\beta\) and same baseline: All events share one regression vector and one baseline hazard.
- Same \(\beta\) but different baselines: Impose a single set of covariate effects while allowing event types (or clusters) different baseline hazards.
- Different \(\beta\) and different baselines: Each event type (or group) has its own regression parameter vector and baseline hazard.
These choices apply to both shared-frailty and marginal modeling frameworks, depending on scientific questions and study design.
Example R commands
Below are illustration snippets in R using the survival package, demonstrating how to fit both shared-frailty (conditional) and marginal Cox models under the three scenario types. We assume a data frame df with columns:
time(event or censor time),
status(0/1 indicator),
id(cluster or subject identifier),
enum(if multiple event types),
- covariates (e.g.,
x1,x2).
Marginal (population-averaged) models
# Scenario 1: Single beta, single baseline across events
fit_marg_scenario1 <- coxph(
Surv(time, status) ~ x1 + x2 + cluster(id),
data = df
)
# Scenario 2: Shared covariate effects, separate baseline hazards
fit_marg_scenario2 <- coxph(
Surv(time, status) ~ x1 + x2 + strata(enum) + cluster(id),
data = df
)
# Scenario 3: Event-type-specific beta vectors and baselines
fit_marg_scenario3 <- coxph(
Surv(time, status) ~ (x1 + x2) * strata(enum) + cluster(id),
data = df
)Conclusion
Modeling multiple correlated time-to-event outcomes requires specialized approaches:
Shared-frailty (conditional) models:
- Introduce a random effect \(\xi\) common to all events in a cluster.
- Covariate effects are interpreted within-unit.
- Provide direct correlation estimates via frailty variance but need stronger assumptions and more computation.
Marginal models:
- Focus on population-level hazard functions.
- Covariate effects are population-averaged.
- Dependence is handled by robust (sandwich) variances, requiring fewer assumptions but not yielding explicit correlation measures.
The choice between these approaches depends on whether unit-specific or population-averaged interpretations are of interest, along with practical considerations of model complexity and assumptions.
R Code
Show the code
###############################################################################
# Chapter 8 R Code
#
# This script reproduces all major numerical results in Chapter 8, including:
# 1. Clustered data analysis (NCCTG lung cancer study)
# 2. Bivariate marginal Cox model (Diabetic Retinopathy Study)
# 3. Multistate endpoints (TOPCAT trial)
###############################################################################
library(survival)
library(tidyverse)
library(patchwork)
#==============================================================================
# (A) NCCTG Lung Cancer Study (Clustered Data)
#==============================================================================
#------------------------------------------------------------------------------
# 1. Read the NCCTG lung cancer data
# (clustered by institution)
#------------------------------------------------------------------------------
df <- read.table("Data//NCCTG//lung.txt")
head(df)
#------------------------------------------------------------------------------
# 2. Plot Follow-Up by Institution and Sex
#------------------------------------------------------------------------------
inst_by_sex_fu_plot <- function(df) {
df %>%
ggplot(aes(y = reorder(id, time), x = time, color = factor(2 - sex))) +
geom_linerange(aes(xmin = 0, xmax = time)) +
geom_point(aes(shape = factor(status)), size = 2, fill = "white") +
geom_vline(xintercept = 0, linewidth = 1) +
facet_grid(inst ~ ., scales = "free", space = "free", switch = "y") +
theme_minimal() +
scale_x_continuous(
"Time (months)",
limits = c(0, 36),
breaks = seq(0, 36, by = 12),
expand = c(0, 0.25)
) +
scale_y_discrete("Patients (by institution)") +
scale_shape_manual(
values = c(23, 19),
labels = c("Censoring", "Death")
) +
scale_color_brewer(
palette = "Set1",
labels = c("Female", "Male")
) +
theme(
strip.background = element_rect(fill = "gray90", color = "gray90"),
axis.text.y = element_blank(),
axis.ticks.y = element_blank(),
axis.line.y = element_blank(),
panel.grid.major.y = element_blank(),
legend.title = element_blank()
)
}
p1 <- inst_by_sex_fu_plot(df |> filter(inst <= 11))
p2 <- inst_by_sex_fu_plot(df |> filter(inst > 11))
mul_lung_fu <- p1 + p2 +
plot_layout(ncol = 2, guides = "collect") &
theme(legend.position = "top")
# ggsave("mul_lung_fu.pdf", mul_lung_fu, width = 8, height = 10)
# ggsave("mul_lung_fu.eps", mul_lung_fu, width = 8, height = 10)
#------------------------------------------------------------------------------
# 3. Cox Model with Institution-Specific Frailty
#------------------------------------------------------------------------------
df$sex <- factor(df$sex)
obj <- coxph(
Surv(time, status) ~ age + sex + phec + phkn + ptkn + wl +
frailty(inst, distribution = "gamma"),
data = df
)
# Alternatively:
# obj <- coxph(
# Surv(time, status) ~ age + sex + phec + phkn + ptkn + wl +
# frailty(inst, distribution = "gaussian"),
# data = df
# )
summary(obj) # model summary
obj$coefficients # extract beta
obj$frail # extract log-frailty
#------------------------------------------------------------------------------
# 4. Naive Cox Model (No Frailty)
#------------------------------------------------------------------------------
obj_naive <- coxph(
Surv(time, status) ~ age + sex + phec + phkn + ptkn + wl,
data = df
)
summary(obj_naive)
#------------------------------------------------------------------------------
# 5. Subject-Specific Survival Curves (Figure 8.2)
#------------------------------------------------------------------------------
# Typical patient with median values
med_age <- median(df$age, na.rm = TRUE)
med_phkn <- median(df$phkn, na.rm = TRUE)
med_ptkn <- median(df$ptkn, na.rm = TRUE)
med_wl <- median(df$wl, na.rm = TRUE)
beta <- obj$coefficients
base_obj <- basehaz(obj, centered = FALSE)
eta <- base_obj$hazard
t <- base_obj$time
# Covariate profiles
# ECOG: 0, 1, 2, 3
zf0 <- c(med_age, 1, 0, med_phkn, med_ptkn, med_wl)
zf1 <- c(med_age, 1, 1, med_phkn, med_ptkn, med_wl)
zf2 <- c(med_age, 1, 2, med_phkn, med_ptkn, med_wl)
zf3 <- c(med_age, 1, 3, med_phkn, med_ptkn, med_wl)
zm0 <- c(med_age, 0, 0, med_phkn, med_ptkn, med_wl)
zm1 <- c(med_age, 0, 1, med_phkn, med_ptkn, med_wl)
zm2 <- c(med_age, 0, 2, med_phkn, med_ptkn, med_wl)
zm3 <- c(med_age, 0, 3, med_phkn, med_ptkn, med_wl)
# Base R plots for female
par(mfrow = c(1, 2))
plot(
t,
exp(-exp(sum(beta * zf0)) * eta),
type = "s",
xlim = c(0, 35),
ylim = c(0, 1),
frame = FALSE,
lty = 1,
main = "Female",
xlab = "Time (months)",
ylab = "Survival probabilities",
lwd = 2,
cex.lab = 1.3,
cex.axis= 1.3,
cex.main= 1.3
)
lines(t, exp(-exp(sum(beta * zf1)) * eta), lty = 2, lwd = 2)
lines(t, exp(-exp(sum(beta * zf2)) * eta), lty = 3, lwd = 2)
lines(t, exp(-exp(sum(beta * zf3)) * eta), lty = 4, lwd = 2)
legend("topright", lty = 1:4, lwd = 2, cex = 1.2, paste("ECOG", 0:3))
# Base R plots for male
plot(
t,
exp(-exp(sum(beta * zm0)) * eta),
type = "s",
xlim = c(0, 35),
ylim = c(0, 1),
frame = FALSE,
lty = 1,
main = "Male",
xlab = "Time (months)",
ylab = "Survival probabilities",
lwd = 2,
cex.lab = 1.3,
cex.axis= 1.3,
cex.main= 1.3
)
lines(t, exp(-exp(sum(beta * zm1)) * eta), lty = 2, lwd = 2)
lines(t, exp(-exp(sum(beta * zm2)) * eta), lty = 3, lwd = 2)
lines(t, exp(-exp(sum(beta * zm3)) * eta), lty = 4, lwd = 2)
legend("topright", lty = 1:4, lwd = 2, cex = 1.2, paste("ECOG", 0:3))
#==============================================================================
# (B) Diabetic Retinopathy Study (Bivariate Marginal Cox Model)
#==============================================================================
#------------------------------------------------------------------------------
# 1. Read the Data
#------------------------------------------------------------------------------
df_drs <- read.table("Data//Diabetic Retinopathy Study//drs.txt")
head(df_drs)
#------------------------------------------------------------------------------
# 2. Fit Bivariate Marginal Cox Model
#------------------------------------------------------------------------------
obj_drs <- coxph(
Surv(time, status) ~ trt + type + trt:type + risk + cluster(id),
data = df_drs
)
summary(obj_drs)
#------------------------------------------------------------------------------
# 3. Table 8.1: Marginal Cox Model Analysis (Estimates, SE, p-values)
#------------------------------------------------------------------------------
coeff <- summary(obj_drs)$coefficients
c1 <- coeff[, 1] # beta estimate
c2 <- coeff[, 4] # robust SE
c3 <- coeff[, 6] # robust p-value
c4 <- coeff[, 3] # naive SE
c5 <- 1 - pchisq((c1 / c4)^2, 1) # naive p-value
# Output table
noquote(round(cbind(c1, c2, c3, c4, c5), 3))
#------------------------------------------------------------------------------
# 4. Prediction of Vision-Retention Probabilities (Figure 8.4)
#------------------------------------------------------------------------------
Lt <- basehaz(obj_drs, centered = FALSE)
t <- Lt$time
L <- Lt$hazard
beta <- coeff[, 1]
# Covariate profiles: adult vs. juvenile, control vs. treatment
adult_contr <- exp(-exp(sum(beta * c(0, 0, 10, 0))) * L)
adult_trt <- exp(-exp(sum(beta * c(1, 0, 10, 0))) * L)
juv_contr <- exp(-exp(sum(beta * c(0, 1, 10, 0))) * L)
juv_trt <- exp(-exp(sum(beta * c(1, 1, 10, 1))) * L)
par(mfrow = c(1, 2))
# Adult
plot(
t, adult_contr,
type = "s",
xlim = c(0, 80),
ylim = c(0, 1),
frame.plot = FALSE,
lty = 3,
main = "Adult",
xlab = "Time (months)",
ylab = "Vision-retention probabilities",
lwd = 2,
cex.lab = 1.2,
cex.axis= 1.2,
cex.main= 1.2
)
lines(t, adult_trt, lty = 1, lwd = 2)
# Juvenile
plot(
t, juv_contr,
type = "s",
xlim = c(0, 80),
ylim = c(0, 1),
frame = FALSE,
lty = 3,
main = "Juvenile",
xlab = "Time (months)",
ylab = "Vision-retention probabilities",
lwd = 2,
cex.lab = 1.2,
cex.axis= 1.2,
cex.main= 1.2
)
lines(t, juv_trt, lty = 1, lwd = 2)
#==============================================================================
# (C) TOPCAT Trial (Multiple Endpoints)
#==============================================================================
#------------------------------------------------------------------------------
# 1. Read and Inspect the Data
#------------------------------------------------------------------------------
topcat <- read.table("Data//TOPCAT//topcat.txt")
# Median follow-up
topcat %>%
group_by(id) %>%
slice_max(time) %>%
slice_head() %>%
ungroup() %>%
summarize(median(time))
#------------------------------------------------------------------------------
# 2. Descriptive Analysis
#------------------------------------------------------------------------------
tmp <- topcat %>%
mutate(
drug = if_else(drug == "Spiro", "Spironolactone", "Placebo"),
gender = if_else(gender == "1:Male", "Male", "Female")
)
# De-duplicate
df_topcat <- tmp %>%
pivot_wider(
id_cols = id,
names_from = endpoint,
values_from = c(time, status)
) %>%
left_join(
tmp %>% filter(endpoint == "HF"),
join_by(id)
)
# Helper: median (IQR) function
med_iqr <- function(x, r = 1) {
qt <- quantile(x, na.rm = TRUE)
paste0(
round(qt[3], r), " (",
round(qt[2], r), ", ",
round(qt[4], r), ")"
)
}
# Quantitative variables table
tab_quant <- df_topcat %>%
filter(endpoint == "HF") %>%
group_by(drug) %>%
summarize(across(c(age, bmi, hr), med_iqr)) %>%
pivot_longer(!drug, values_to = "value", names_to = "name") %>%
pivot_wider(values_from = value, names_from = drug) %>%
mutate(
name = case_when(
name == "age" ~ "Age (years)",
name == "bmi" ~ "BMI (kg/m^2)",
name == "hr" ~ "Heart rate (per min)"
)
)
# Helper function for frequency tables
freq_pct <- function(df, group, var, r = 1) {
var_counts <- df %>%
group_by({{ group }}, {{ var }}) %>%
summarize(n = n(), .groups = "drop")
var_counts %>%
left_join(
var_counts %>% group_by({{ group }}) %>% summarize(N = sum(n)),
by = join_by({{ group }})
) %>%
mutate(
value = paste0(n, " (", round(100 * n / N, r), "%)")
) %>%
select(-c(n, N)) %>%
pivot_wider(names_from = {{ group }}, values_from = value) %>%
rename(name = {{ var }})
}
# Categorical variables
gender <- df_topcat %>%
freq_pct(drug, gender) %>%
mutate(name = paste0("Gender - ", name))
race <- df_topcat %>%
freq_pct(drug, race) %>%
mutate(name = paste0("Race - ", name))
nyha <- df_topcat %>%
freq_pct(drug, nyha) %>%
mutate(name = paste0("NYHA - ", name)) %>%
filter(!is.na(name))
# Helper for binary condition
bin_pct <- function(condition, r = 1) {
n <- sum(condition, na.rm = TRUE)
N <- length(condition)
paste0(n, " (", round(100 * n / N, r), "%)")
}
tabin <- df_topcat %>%
group_by(drug) %>%
summarize(
N = n(),
across(c(smoke:cabg, status_HF, status_MI, status_Stroke), bin_pct)
) %>%
select(!N) %>%
pivot_longer(!drug, values_to = "value", names_to = "name") %>%
pivot_wider(values_from = value, names_from = drug) %>%
mutate(
name = case_when(
name == "smoke" ~ "Smoker",
name == "chf_hosp" ~ "CHF",
name == "copd" ~ "COPD",
name == "asthma" ~ "Asthma",
name == "dm" ~ "Diabetes",
name == "htn" ~ "Hypertension",
name == "cabg" ~ "Coronary surgery",
name == "status_HF" ~ "HF",
name == "status_MI" ~ "MI",
name == "status_Stroke" ~ "Stroke"
)
)
# Event rates (per person-year)
event_rates <- df_topcat %>%
group_by(drug) %>%
summarize(
`HF rate (per person-year)` = sum(status_HF) / sum(time_HF),
`MI rate (per person-year)` = sum(status_MI) / sum(time_MI),
`Stroke rate (per person-year)` = sum(status_Stroke) / sum(time_Stroke)
) %>%
pivot_longer(!drug, values_to = "value", names_to = "name") %>%
pivot_wider(values_from = value, names_from = drug) %>%
mutate(
Placebo = as.character(round(Placebo, 4)),
Spironolactone = as.character(round(Spironolactone, 4))
)
# Combine tables
tabone <- bind_rows(
tab_quant[1, ],
gender,
race,
nyha,
tab_quant[-1, ],
tabin,
event_rates
)
# Add N to group names
colnames(tabone) <- c(
" ",
paste0(colnames(tabone)[2:3], " (N=", table(df_topcat$drug), ")")
)
# Print table (example with knitr::kable)
knitr::kable(tabone)
#------------------------------------------------------------------------------
# 3. Cox Model for Multiple Endpoints (Scenario 3)
#------------------------------------------------------------------------------
obj_topcat <- coxph(
Surv(time, status) ~
(drug + age + gender + nyha + bmi + smoke + chf_hosp +
copd + asthma + dm + htn + cabg) *
strata(endpoint) + cluster(id),
data = topcat
)
summary(obj_topcat)
obj_topcat$coefficients
#------------------------------------------------------------------------------
# 4. Component-Specific HRs, 95% CIs, p-values
#------------------------------------------------------------------------------
p <- 12 # number of covariates
beta <- obj_topcat$coefficients
Sigma <- obj_topcat$var
betaHF <- numeric(p)
varHF <- numeric(p)
betaMI <- numeric(p)
varMI <- numeric(p)
betaStr <- numeric(p)
varStr <- numeric(p)
# Fill in HF, MI, Stroke
for (j in seq_len(p)) {
# HF
betaHF[j] <- beta[j]
varHF[j] <- Sigma[j, j]
# MI (add interaction terms)
betaMI[j] <- beta[j] + beta[p + 2 * j - 1]
varMI[j] <- Sigma[j, j] +
Sigma[p + 2 * j - 1, p + 2 * j - 1] +
2 * Sigma[j, p + 2 * j - 1]
# Stroke
betaStr[j] <- beta[j] + beta[p + 2 * j]
varStr[j] <- Sigma[j, j] +
Sigma[p + 2 * j, p + 2 * j] +
2 * Sigma[j, p + 2 * j]
}
za <- qnorm(0.975)
hr_ci <- function(b, v, r = 2) {
se <- sqrt(v)
paste0(
round(exp(b), r), " (",
round(exp(b - za * se), r), ", ",
round(exp(b + za * se), r), ")"
)
}
hr_p <- function(b, v) {
se <- sqrt(v)
round(1 - pchisq((b / se)^2, df = 1), 3)
}
df_res <- tibble(
variable = c(
"Spironolactone", "Age (years)", "Female", "NYHA 3-4", "BMI",
"Smoke", "CHF", "COPD", "Asthma", "Diabetes", "Hypertension", "Coronary surgery"
),
HF = hr_ci(betaHF, varHF),
HF_p = hr_p(betaHF, varHF),
MI = hr_ci(betaMI, varMI),
MI_p = hr_p(betaMI, varMI),
Stroke = hr_ci(betaStr, varStr),
Stroke_p = hr_p(betaStr, varStr)
)
# Print out table
df_res
#------------------------------------------------------------------------------
# 5. Joint Test of Spironolactone Effect on HF & Stroke
#------------------------------------------------------------------------------
beta_q <- beta[c(1, 14)]
Sigma_q <- Sigma[c(1, 14), c(1, 14)]
chisq2 <- t(beta_q) %*% solve(Sigma_q) %*% beta_q
pval <- 1 - pchisq(chisq2, 2)
# pval is the p-value for the joint test