Chapter 8 - Multivariate Failure Times

Slides

Lecture slides here. (To convert html to pdf, press E \(\to\) Print \(\to\) Destination: Save to pdf)

Chapter Summary

When analyzing multiple correlated event times—arising from either multiple failure types per subject (e.g., different clinical outcomes within the same patient) or clusters of subjects (e.g., families, institutions)—standard univariate approaches must be adapted to account for within-unit dependence. To do so, two broad modeling frameworks are useful:

  1. Shared-frailty (conditional) models, which introduce a unit-level random effect (frailty) that influences all failure times in that unit, and
  2. Marginal models, which specify each event’s hazard at the population level and account for correlation via robust (sandwich) variance estimation.

Both frameworks extend Cox-type survival methods to multivariate data but differ in interpretation (conditional vs. population-averaged effects), assumptions about the joint distribution, and how correlation parameters are handled.

Shared-frailty models

In a shared-frailty model, each unit (subject or cluster) possesses a nonnegative frailty \(\xi\). Conditioned on \(\xi\), the component event times in that unit behave independently; marginally, they are correlated via the distribution of \(\xi\). A typical conditional Cox specification is:

\[ \lambda_k(t \mid Z, \xi) \;=\; \xi\,\exp\bigl(\beta_k^\mathsf{T} Z\bigr)\,\eta_{k0}'(t), \]

where \(\beta_k\) is the (conditional) log-hazard ratio, and \(\eta_{k0}'(t)\) is the baseline hazard for the \(k\)th event. Common frailty distributions include Gamma, positive stable, inverse Gaussian, and log-normal, each implying a distinct copula form.

  • Interpretation: The coefficient \(\beta_k\) reflects a within-unit comparison. For instance, comparing treated vs. untreated in the same frailty stratum yields the hazard ratio \(\exp(\beta_k)\).
  • Estimation: Often via an EM algorithm that imputes each unit’s frailty \(\xi_i\). Alternatively, software like R’s survival::coxph() can fit a Gamma-frailty Cox model directly with frailty(id, distribution="gamma"). This yields both regression parameters and an estimate of the frailty variance, capturing the correlation strength.

Marginal models

A marginal Cox model specifies each event’s hazard averaged over the distribution of unobserved frailties or random effects:

\[ \Lambda_k(t \mid Z) \;=\; \exp\bigl(\beta_k^\mathsf{T}Z\bigr)\,\Lambda_{k0}(t). \]

The within-unit dependence is handled by a robust (sandwich) variance estimator, rather than an explicit random-effects structure.

  • Interpretation: \(\beta_k\) gives a population-level hazard ratio, in contrast to the conditional hazard ratio from shared-frailty models.
  • Estimation: The partial-likelihood score is constructed as if events were independent, but the covariance of \(\hat\beta\) is corrected using a cluster-based sandwich estimator (e.g., cluster(id) in R).

Three scenarios

When multiple event types exist (e.g., heart failure vs. stroke) or multiple members in a cluster, one can combine or separate the baselines and covariate effects:

  1. Same \(\beta\) and same baseline: All events share one regression vector and one baseline hazard.
  2. Same \(\beta\) but different baselines: Impose a single set of covariate effects while allowing event types (or clusters) different baseline hazards.
  3. Different \(\beta\) and different baselines: Each event type (or group) has its own regression parameter vector and baseline hazard.

These choices apply to both shared-frailty and marginal modeling frameworks, depending on scientific questions and study design.

Example R commands

Below are illustration snippets in R using the survival package, demonstrating how to fit both shared-frailty (conditional) and marginal Cox models under the three scenario types. We assume a data frame df with columns:

  • time (event or censor time),
  • status (0/1 indicator),
  • id (cluster or subject identifier),
  • enum (if multiple event types),
  • covariates (e.g., x1, x2).

Shared-frailty (conditional) models

# Scenario 1: Single regression vector, single baseline hazard
fit_sf_scenario1 <- coxph(
  Surv(time, status) ~ x1 + x2 + frailty(id, distribution = "gamma"),
  data = df
)
# Scenario 2: One covariate effect vector, separate baseline hazards
fit_sf_scenario2 <- coxph(
  Surv(time, status) ~ x1 + x2 + strata(enum)
    + frailty(id, distribution = "gamma",
  data = df
)
# Scenario 3: Fully distinct covariate vectors and baseline hazards
fit_sf_scenario3 <- coxph(
  Surv(time, status) ~ (x1 + x2) * strata(enum)
    + frailty(id, distribution = "gamma",
  data = df
)

Marginal (population-averaged) models

# Scenario 1: Single beta, single baseline across events
fit_marg_scenario1 <- coxph(
  Surv(time, status) ~ x1 + x2 + cluster(id),
  data = df
)
# Scenario 2: Shared covariate effects, separate baseline hazards
fit_marg_scenario2 <- coxph(
  Surv(time, status) ~ x1 + x2 + strata(enum) + cluster(id),
  data = df
)
# Scenario 3: Event-type-specific beta vectors and baselines
fit_marg_scenario3 <- coxph(
  Surv(time, status) ~ (x1 + x2) * strata(enum) + cluster(id),
  data = df
)

Conclusion

Modeling multiple correlated time-to-event outcomes requires specialized approaches:

Shared-frailty (conditional) models:

  • Introduce a random effect \(\xi\) common to all events in a cluster.
  • Covariate effects are interpreted within-unit.
  • Provide direct correlation estimates via frailty variance but need stronger assumptions and more computation.

Marginal models:

  • Focus on population-level hazard functions.
  • Covariate effects are population-averaged.
  • Dependence is handled by robust (sandwich) variances, requiring fewer assumptions but not yielding explicit correlation measures.

The choice between these approaches depends on whether unit-specific or population-averaged interpretations are of interest, along with practical considerations of model complexity and assumptions.

R Code

Show the code
###############################################################################
# Chapter 8 R Code
#
# This script reproduces all major numerical results in Chapter 8, including:
#   1. Clustered data analysis (NCCTG lung cancer study)
#   2. Bivariate marginal Cox model (Diabetic Retinopathy Study)
#   3. Multistate endpoints (TOPCAT trial)
###############################################################################

library(survival)
library(tidyverse)
library(patchwork)

#==============================================================================
# (A) NCCTG Lung Cancer Study (Clustered Data)
#==============================================================================
#------------------------------------------------------------------------------
# 1. Read the NCCTG lung cancer data
#    (clustered by institution)
#------------------------------------------------------------------------------
df <- read.table("Data//NCCTG//lung.txt")
head(df)

#------------------------------------------------------------------------------
# 2. Plot Follow-Up by Institution and Sex
#------------------------------------------------------------------------------
inst_by_sex_fu_plot <- function(df) {
  df %>%
    ggplot(aes(y = reorder(id, time), x = time, color = factor(2 - sex))) +
    geom_linerange(aes(xmin = 0, xmax = time)) +
    geom_point(aes(shape = factor(status)), size = 2, fill = "white") +
    geom_vline(xintercept = 0, linewidth = 1) +
    facet_grid(inst ~ ., scales = "free", space = "free", switch = "y") +
    theme_minimal() +
    scale_x_continuous(
      "Time (months)",
      limits  = c(0, 36),
      breaks  = seq(0, 36, by = 12),
      expand  = c(0, 0.25)
    ) +
    scale_y_discrete("Patients (by institution)") +
    scale_shape_manual(
      values = c(23, 19),
      labels = c("Censoring", "Death")
    ) +
    scale_color_brewer(
      palette = "Set1",
      labels  = c("Female", "Male")
    ) +
    theme(
      strip.background = element_rect(fill = "gray90", color = "gray90"),
      axis.text.y      = element_blank(),
      axis.ticks.y     = element_blank(),
      axis.line.y      = element_blank(),
      panel.grid.major.y = element_blank(),
      legend.title     = element_blank()
    )
}

p1 <- inst_by_sex_fu_plot(df |> filter(inst <= 11))
p2 <- inst_by_sex_fu_plot(df |> filter(inst > 11))

mul_lung_fu <- p1 + p2 +
  plot_layout(ncol = 2, guides = "collect") &
  theme(legend.position = "top")

# ggsave("mul_lung_fu.pdf", mul_lung_fu, width = 8, height = 10)
# ggsave("mul_lung_fu.eps", mul_lung_fu, width = 8, height = 10)

#------------------------------------------------------------------------------
# 3. Cox Model with Institution-Specific Frailty
#------------------------------------------------------------------------------
df$sex <- factor(df$sex)

obj <- coxph(
  Surv(time, status) ~ age + sex + phec + phkn + ptkn + wl +
    frailty(inst, distribution = "gamma"),
  data = df
)
# Alternatively:
# obj <- coxph(
#   Surv(time, status) ~ age + sex + phec + phkn + ptkn + wl +
#     frailty(inst, distribution = "gaussian"),
#   data = df
# )

summary(obj) # model summary
obj$coefficients # extract beta
obj$frail # extract log-frailty

#------------------------------------------------------------------------------
# 4. Naive Cox Model (No Frailty)
#------------------------------------------------------------------------------
obj_naive <- coxph(
  Surv(time, status) ~ age + sex + phec + phkn + ptkn + wl,
  data = df
)
summary(obj_naive)

#------------------------------------------------------------------------------
# 5. Subject-Specific Survival Curves (Figure 8.2)
#------------------------------------------------------------------------------
# Typical patient with median values
med_age  <- median(df$age, na.rm = TRUE)
med_phkn <- median(df$phkn, na.rm = TRUE)
med_ptkn <- median(df$ptkn, na.rm = TRUE)
med_wl   <- median(df$wl, na.rm = TRUE)

beta     <- obj$coefficients
base_obj <- basehaz(obj, centered = FALSE)
eta      <- base_obj$hazard
t        <- base_obj$time

# Covariate profiles
# ECOG: 0, 1, 2, 3
zf0 <- c(med_age, 1, 0, med_phkn, med_ptkn, med_wl)
zf1 <- c(med_age, 1, 1, med_phkn, med_ptkn, med_wl)
zf2 <- c(med_age, 1, 2, med_phkn, med_ptkn, med_wl)
zf3 <- c(med_age, 1, 3, med_phkn, med_ptkn, med_wl)

zm0 <- c(med_age, 0, 0, med_phkn, med_ptkn, med_wl)
zm1 <- c(med_age, 0, 1, med_phkn, med_ptkn, med_wl)
zm2 <- c(med_age, 0, 2, med_phkn, med_ptkn, med_wl)
zm3 <- c(med_age, 0, 3, med_phkn, med_ptkn, med_wl)

# Base R plots for female
par(mfrow = c(1, 2))
plot(
  t,
  exp(-exp(sum(beta * zf0)) * eta),
  type = "s",
  xlim = c(0, 35),
  ylim = c(0, 1),
  frame = FALSE,
  lty   = 1,
  main  = "Female",
  xlab  = "Time (months)",
  ylab  = "Survival probabilities",
  lwd   = 2,
  cex.lab = 1.3,
  cex.axis= 1.3,
  cex.main= 1.3
)
lines(t, exp(-exp(sum(beta * zf1)) * eta), lty = 2, lwd = 2)
lines(t, exp(-exp(sum(beta * zf2)) * eta), lty = 3, lwd = 2)
lines(t, exp(-exp(sum(beta * zf3)) * eta), lty = 4, lwd = 2)
legend("topright", lty = 1:4, lwd = 2, cex = 1.2, paste("ECOG", 0:3))

# Base R plots for male
plot(
  t,
  exp(-exp(sum(beta * zm0)) * eta),
  type  = "s",
  xlim  = c(0, 35),
  ylim  = c(0, 1),
  frame = FALSE,
  lty   = 1,
  main  = "Male",
  xlab  = "Time (months)",
  ylab  = "Survival probabilities",
  lwd   = 2,
  cex.lab = 1.3,
  cex.axis= 1.3,
  cex.main= 1.3
)
lines(t, exp(-exp(sum(beta * zm1)) * eta), lty = 2, lwd = 2)
lines(t, exp(-exp(sum(beta * zm2)) * eta), lty = 3, lwd = 2)
lines(t, exp(-exp(sum(beta * zm3)) * eta), lty = 4, lwd = 2)
legend("topright", lty = 1:4, lwd = 2, cex = 1.2, paste("ECOG", 0:3))


#==============================================================================
# (B) Diabetic Retinopathy Study (Bivariate Marginal Cox Model)
#==============================================================================
#------------------------------------------------------------------------------
# 1. Read the Data
#------------------------------------------------------------------------------
df_drs <- read.table("Data//Diabetic Retinopathy Study//drs.txt")
head(df_drs)

#------------------------------------------------------------------------------
# 2. Fit Bivariate Marginal Cox Model
#------------------------------------------------------------------------------
obj_drs <- coxph(
  Surv(time, status) ~ trt + type + trt:type + risk + cluster(id),
  data = df_drs
)
summary(obj_drs)

#------------------------------------------------------------------------------
# 3. Table 8.1: Marginal Cox Model Analysis (Estimates, SE, p-values)
#------------------------------------------------------------------------------
coeff  <- summary(obj_drs)$coefficients
c1     <- coeff[, 1]  # beta estimate
c2     <- coeff[, 4]  # robust SE
c3     <- coeff[, 6]  # robust p-value
c4     <- coeff[, 3]  # naive SE
c5     <- 1 - pchisq((c1 / c4)^2, 1)  # naive p-value

# Output table
noquote(round(cbind(c1, c2, c3, c4, c5), 3))

#------------------------------------------------------------------------------
# 4. Prediction of Vision-Retention Probabilities (Figure 8.4)
#------------------------------------------------------------------------------
Lt <- basehaz(obj_drs, centered = FALSE)
t  <- Lt$time
L  <- Lt$hazard
beta <- coeff[, 1]

# Covariate profiles: adult vs. juvenile, control vs. treatment
adult_contr <- exp(-exp(sum(beta * c(0, 0, 10, 0))) * L)
adult_trt   <- exp(-exp(sum(beta * c(1, 0, 10, 0))) * L)
juv_contr   <- exp(-exp(sum(beta * c(0, 1, 10, 0))) * L)
juv_trt     <- exp(-exp(sum(beta * c(1, 1, 10, 1))) * L)

par(mfrow = c(1, 2))
# Adult
plot(
  t, adult_contr,
  type  = "s",
  xlim  = c(0, 80),
  ylim  = c(0, 1),
  frame.plot = FALSE,
  lty   = 3,
  main  = "Adult",
  xlab  = "Time (months)",
  ylab  = "Vision-retention probabilities",
  lwd   = 2,
  cex.lab = 1.2,
  cex.axis= 1.2,
  cex.main= 1.2
)
lines(t, adult_trt, lty = 1, lwd = 2)

# Juvenile
plot(
  t, juv_contr,
  type  = "s",
  xlim  = c(0, 80),
  ylim  = c(0, 1),
  frame = FALSE,
  lty   = 3,
  main  = "Juvenile",
  xlab  = "Time (months)",
  ylab  = "Vision-retention probabilities",
  lwd   = 2,
  cex.lab = 1.2,
  cex.axis= 1.2,
  cex.main= 1.2
)
lines(t, juv_trt, lty = 1, lwd = 2)


#==============================================================================
# (C) TOPCAT Trial (Multiple Endpoints)
#==============================================================================
#------------------------------------------------------------------------------
# 1. Read and Inspect the Data
#------------------------------------------------------------------------------
topcat <- read.table("Data//TOPCAT//topcat.txt")

# Median follow-up
topcat %>%
  group_by(id) %>%
  slice_max(time) %>%
  slice_head() %>%
  ungroup() %>%
  summarize(median(time))

#------------------------------------------------------------------------------
# 2. Descriptive Analysis
#------------------------------------------------------------------------------
tmp <- topcat %>%
  mutate(
    drug   = if_else(drug == "Spiro", "Spironolactone", "Placebo"),
    gender = if_else(gender == "1:Male", "Male", "Female")
  )

# De-duplicate
df_topcat <- tmp %>%
  pivot_wider(
    id_cols      = id,
    names_from   = endpoint,
    values_from  = c(time, status)
  ) %>%
  left_join(
    tmp %>% filter(endpoint == "HF"),
    join_by(id)
  )

# Helper: median (IQR) function
med_iqr <- function(x, r = 1) {
  qt <- quantile(x, na.rm = TRUE)
  paste0(
    round(qt[3], r), " (",
    round(qt[2], r), ", ",
    round(qt[4], r), ")"
  )
}

# Quantitative variables table
tab_quant <- df_topcat %>%
  filter(endpoint == "HF") %>%
  group_by(drug) %>%
  summarize(across(c(age, bmi, hr), med_iqr)) %>%
  pivot_longer(!drug, values_to = "value", names_to = "name") %>%
  pivot_wider(values_from = value, names_from = drug) %>%
  mutate(
    name = case_when(
      name == "age" ~ "Age (years)",
      name == "bmi" ~ "BMI (kg/m^2)",
      name == "hr"  ~ "Heart rate (per min)"
    )
  )

# Helper function for frequency tables
freq_pct <- function(df, group, var, r = 1) {
  var_counts <- df %>%
    group_by({{ group }}, {{ var }}) %>%
    summarize(n = n(), .groups = "drop")

  var_counts %>%
    left_join(
      var_counts %>% group_by({{ group }}) %>% summarize(N = sum(n)),
      by = join_by({{ group }})
    ) %>%
    mutate(
      value = paste0(n, " (", round(100 * n / N, r), "%)")
    ) %>%
    select(-c(n, N)) %>%
    pivot_wider(names_from = {{ group }}, values_from = value) %>%
    rename(name = {{ var }})
}

# Categorical variables
gender <- df_topcat %>%
  freq_pct(drug, gender) %>%
  mutate(name = paste0("Gender - ", name))

race <- df_topcat %>%
  freq_pct(drug, race) %>%
  mutate(name = paste0("Race - ", name))

nyha <- df_topcat %>%
  freq_pct(drug, nyha) %>%
  mutate(name = paste0("NYHA - ", name)) %>%
  filter(!is.na(name))

# Helper for binary condition
bin_pct <- function(condition, r = 1) {
  n <- sum(condition, na.rm = TRUE)
  N <- length(condition)
  paste0(n, " (", round(100 * n / N, r), "%)")
}

tabin <- df_topcat %>%
  group_by(drug) %>%
  summarize(
    N = n(),
    across(c(smoke:cabg, status_HF, status_MI, status_Stroke), bin_pct)
  ) %>%
  select(!N) %>%
  pivot_longer(!drug, values_to = "value", names_to = "name") %>%
  pivot_wider(values_from = value, names_from = drug) %>%
  mutate(
    name = case_when(
      name == "smoke"       ~ "Smoker",
      name == "chf_hosp"    ~ "CHF",
      name == "copd"        ~ "COPD",
      name == "asthma"      ~ "Asthma",
      name == "dm"          ~ "Diabetes",
      name == "htn"         ~ "Hypertension",
      name == "cabg"        ~ "Coronary surgery",
      name == "status_HF"   ~ "HF",
      name == "status_MI"   ~ "MI",
      name == "status_Stroke" ~ "Stroke"
    )
  )

# Event rates (per person-year)
event_rates <- df_topcat %>%
  group_by(drug) %>%
  summarize(
    `HF rate (per person-year)`     = sum(status_HF) / sum(time_HF),
    `MI rate (per person-year)`     = sum(status_MI) / sum(time_MI),
    `Stroke rate (per person-year)` = sum(status_Stroke) / sum(time_Stroke)
  ) %>%
  pivot_longer(!drug, values_to = "value", names_to = "name") %>%
  pivot_wider(values_from = value, names_from = drug) %>%
  mutate(
    Placebo         = as.character(round(Placebo, 4)),
    Spironolactone  = as.character(round(Spironolactone, 4))
  )

# Combine tables
tabone <- bind_rows(
  tab_quant[1, ],
  gender,
  race,
  nyha,
  tab_quant[-1, ],
  tabin,
  event_rates
)

# Add N to group names
colnames(tabone) <- c(
  " ",
  paste0(colnames(tabone)[2:3], " (N=", table(df_topcat$drug), ")")
)

# Print table (example with knitr::kable)
knitr::kable(tabone)

#------------------------------------------------------------------------------
# 3. Cox Model for Multiple Endpoints (Scenario 3)
#------------------------------------------------------------------------------
obj_topcat <- coxph(
  Surv(time, status) ~
    (drug + age + gender + nyha + bmi + smoke + chf_hosp +
       copd + asthma + dm + htn + cabg) *
    strata(endpoint) + cluster(id),
  data = topcat
)
summary(obj_topcat)
obj_topcat$coefficients

#------------------------------------------------------------------------------
# 4. Component-Specific HRs, 95% CIs, p-values
#------------------------------------------------------------------------------
p     <- 12  # number of covariates
beta  <- obj_topcat$coefficients
Sigma <- obj_topcat$var

betaHF  <- numeric(p)
varHF   <- numeric(p)
betaMI  <- numeric(p)
varMI   <- numeric(p)
betaStr <- numeric(p)
varStr  <- numeric(p)

# Fill in HF, MI, Stroke
for (j in seq_len(p)) {
  # HF
  betaHF[j] <- beta[j]
  varHF[j]  <- Sigma[j, j]

  # MI (add interaction terms)
  betaMI[j] <- beta[j] + beta[p + 2 * j - 1]
  varMI[j]  <- Sigma[j, j] +
               Sigma[p + 2 * j - 1, p + 2 * j - 1] +
               2 * Sigma[j, p + 2 * j - 1]

  # Stroke
  betaStr[j] <- beta[j] + beta[p + 2 * j]
  varStr[j]  <- Sigma[j, j] +
               Sigma[p + 2 * j, p + 2 * j] +
               2 * Sigma[j, p + 2 * j]
}

za <- qnorm(0.975)

hr_ci <- function(b, v, r = 2) {
  se <- sqrt(v)
  paste0(
    round(exp(b), r), " (",
    round(exp(b - za * se), r), ", ",
    round(exp(b + za * se), r), ")"
  )
}

hr_p <- function(b, v) {
  se <- sqrt(v)
  round(1 - pchisq((b / se)^2, df = 1), 3)
}

df_res <- tibble(
  variable = c(
    "Spironolactone", "Age (years)", "Female", "NYHA 3-4", "BMI",
    "Smoke", "CHF", "COPD", "Asthma", "Diabetes", "Hypertension", "Coronary surgery"
  ),
  HF       = hr_ci(betaHF, varHF),
  HF_p     = hr_p(betaHF, varHF),
  MI       = hr_ci(betaMI, varMI),
  MI_p     = hr_p(betaMI, varMI),
  Stroke   = hr_ci(betaStr, varStr),
  Stroke_p = hr_p(betaStr, varStr)
)

# Print out table
df_res

#------------------------------------------------------------------------------
# 5. Joint Test of Spironolactone Effect on HF & Stroke
#------------------------------------------------------------------------------
beta_q  <- beta[c(1, 14)]
Sigma_q <- Sigma[c(1, 14), c(1, 14)]

chisq2  <- t(beta_q) %*% solve(Sigma_q) %*% beta_q
pval    <- 1 - pchisq(chisq2, 2)

# pval is the p-value for the joint test