Chapter 8 - Multivariate Failure Times

Slides

Lecture slides here. (To convert html to pdf, press E \(\to\) Print \(\to\) Destination: Save to pdf)

Chapter Summary

When analyzing multiple correlated event times—arising from either multiple failure types per subject (e.g., different clinical outcomes within the same patient) or clusters of subjects (e.g., families, institutions)—standard univariate approaches must be adapted to account for within-unit dependence. To do so, two broad modeling frameworks are useful:

  1. Shared-frailty (conditional) models, which introduce a unit-level random effect (frailty) that influences all failure times in that unit, and
  2. Marginal models, which specify each event’s hazard at the population level and account for correlation via robust (sandwich) variance estimation.

Both frameworks extend Cox-type survival methods to multivariate data but differ in interpretation (conditional vs. population-averaged effects), assumptions about the joint distribution, and how correlation parameters are handled.

Shared-frailty models

In a shared-frailty model, each unit (subject or cluster) possesses a nonnegative frailty \(\xi\). Conditioned on \(\xi\), the component event times in that unit behave independently; marginally, they are correlated via the distribution of \(\xi\). A typical conditional Cox specification is:

\[ \lambda_k(t \mid Z, \xi) \;=\; \xi\,\exp\bigl(\beta_k^\mathsf{T} Z\bigr)\,\eta_{k0}'(t), \]

where \(\beta_k\) is the (conditional) log-hazard ratio, and \(\eta_{k0}'(t)\) is the baseline hazard for the \(k\)th event. Common frailty distributions include Gamma, positive stable, inverse Gaussian, and log-normal, each implying a distinct copula form.

  • Interpretation: The coefficient \(\beta_k\) reflects a within-unit comparison. For instance, comparing treated vs. untreated in the same frailty stratum yields the hazard ratio \(\exp(\beta_k)\).
  • Estimation: Often via an EM algorithm that imputes each unit’s frailty \(\xi_i\). Alternatively, software like R’s survival::coxph() can fit a Gamma-frailty Cox model directly with frailty(id, distribution="gamma"). This yields both regression parameters and an estimate of the frailty variance, capturing the correlation strength.

Marginal models

A marginal Cox model specifies each event’s hazard averaged over the distribution of unobserved frailties or random effects:

\[ \Lambda_k(t \mid Z) \;=\; \exp\bigl(\beta_k^\mathsf{T}Z\bigr)\,\Lambda_{k0}(t). \]

The within-unit dependence is handled by a robust (sandwich) variance estimator, rather than an explicit random-effects structure.

  • Interpretation: \(\beta_k\) gives a population-level hazard ratio, in contrast to the conditional hazard ratio from shared-frailty models.
  • Estimation: The partial-likelihood score is constructed as if events were independent, but the covariance of \(\hat\beta\) is corrected using a cluster-based sandwich estimator (e.g., cluster(id) in R).

Three scenarios

When multiple event types exist (e.g., heart failure vs. stroke) or multiple members in a cluster, one can combine or separate the baselines and covariate effects:

  1. Same \(\beta\) and same baseline: All events share one regression vector and one baseline hazard.
  2. Same \(\beta\) but different baselines: Impose a single set of covariate effects while allowing event types (or clusters) different baseline hazards.
  3. Different \(\beta\) and different baselines: Each event type (or group) has its own regression parameter vector and baseline hazard.

These choices apply to both shared-frailty and marginal modeling frameworks, depending on scientific questions and study design.

Example R commands

Below are illustration snippets in R using the survival package, demonstrating how to fit both shared-frailty (conditional) and marginal Cox models under the three scenario types. We assume a data frame df with columns:

  • time (event or censor time),
  • status (0/1 indicator),
  • id (cluster or subject identifier),
  • enum (if multiple event types),
  • covariates (e.g., x1, x2).

Shared-frailty (conditional) models

# Scenario 1: Single regression vector, single baseline hazard
fit_sf_scenario1 <- coxph(
  Surv(time, status) ~ x1 + x2 + frailty(id, distribution = "gamma"),
  data = df
)
# Scenario 2: One covariate effect vector, separate baseline hazards
fit_sf_scenario2 <- coxph(
  Surv(time, status) ~ x1 + x2 + strata(enum)
    + frailty(id, distribution = "gamma",
  data = df
)
# Scenario 3: Fully distinct covariate vectors and baseline hazards
fit_sf_scenario3 <- coxph(
  Surv(time, status) ~ (x1 + x2) * strata(enum)
    + frailty(id, distribution = "gamma",
  data = df
)

Marginal (population-averaged) models

# Scenario 1: Single beta, single baseline across events
fit_marg_scenario1 <- coxph(
  Surv(time, status) ~ x1 + x2 + cluster(id),
  data = df
)
# Scenario 2: Shared covariate effects, separate baseline hazards
fit_marg_scenario2 <- coxph(
  Surv(time, status) ~ x1 + x2 + strata(enum) + cluster(id),
  data = df
)
# Scenario 3: Event-type-specific beta vectors and baselines
fit_marg_scenario3 <- coxph(
  Surv(time, status) ~ (x1 + x2) * strata(enum) + cluster(id),
  data = df
)

Conclusion

Modeling multiple correlated time-to-event outcomes requires specialized approaches:

Shared-frailty (conditional) models:

  • Introduce a random effect \(\xi\) common to all events in a cluster.
  • Covariate effects are interpreted within-unit.
  • Provide direct correlation estimates via frailty variance but need stronger assumptions and more computation.

Marginal models:

  • Focus on population-level hazard functions.
  • Covariate effects are population-averaged.
  • Dependence is handled by robust (sandwich) variances, requiring fewer assumptions but not yielding explicit correlation measures.

The choice between these approaches depends on whether unit-specific or population-averaged interpretations are of interest, along with practical considerations of model complexity and assumptions.

R Code

Show the code
###############################################################################
# Chapter 8 R Code
#
# This script reproduces major numerical results in Chapter 8, including
#   1. Clustered data analysis (NCCTG lung cancer study)
#   2. Bivariate marginal Cox model (Diabetic Retinopathy Study)
#   3. Multiple-endpoint analysis (TOPCAT trial)
###############################################################################

library(survival)   # Cox regression, frailty models, baseline hazards
library(tidyverse)  # Data manipulation and ggplot2 graphics
library(patchwork)  # Multi-panel plot assembly

# =============================================================================
# (A) NCCTG Lung Cancer Study (Clustered Data)
# =============================================================================

# -----------------------------------------------------------------------------
# 1. Read the NCCTG lung cancer data
#    Patients are clustered by treating institution (inst)
# -----------------------------------------------------------------------------
df <- read.table("Data//NCCTG//lung.txt")
head(df)

# -----------------------------------------------------------------------------
# 2. Plot follow-up by institution and sex
# -----------------------------------------------------------------------------
# Each subject is shown by a horizontal follow-up line from 0 to observed time.
# Point shape distinguishes censoring versus death; color distinguishes sex.
inst_by_sex_fu_plot <- function(df) {
  df %>%
    ggplot(aes(y = reorder(id, time), x = time, color = factor(2 - sex))) +
    geom_linerange(aes(xmin = 0, xmax = time)) +
    geom_point(aes(shape = factor(status)), size = 2, fill = "white") +
    geom_vline(xintercept = 0, linewidth = 1) +
    facet_grid(inst ~ ., scales = "free", space = "free", switch = "y") +
    theme_minimal() +
    scale_x_continuous(
      "Time (months)",
      limits = c(0, 36),
      breaks = seq(0, 36, by = 12),
      expand = c(0, 0.25)
    ) +
    scale_y_discrete("Patients (by institution)") +
    scale_shape_manual(
      values = c(23, 19),
      labels = c("Censoring", "Death")
    ) +
    scale_color_brewer(
      palette = "Set1",
      labels = c("Female", "Male")
    ) +
    theme(
      strip.background   = element_rect(fill = "gray90", color = "gray90"),
      axis.text.y        = element_blank(),
      axis.ticks.y       = element_blank(),
      axis.line.y        = element_blank(),
      panel.grid.major.y = element_blank(),
      legend.title       = element_blank()
    )
}

p1 <- inst_by_sex_fu_plot(df |> filter(inst <= 11)) +
  theme(legend.position = "top")

p2 <- inst_by_sex_fu_plot(df |> filter(inst > 11)) +
  theme(legend.position = "top")

mul_lung_fu <- p1 + p2 + plot_layout(ncol = 2)

mul_lung_fu

# Optional saving
# ggsave("mul_lung_fu.pdf", mul_lung_fu, width = 8, height = 10)
# ggsave("mul_lung_fu.eps", mul_lung_fu, width = 8, height = 10)

# -----------------------------------------------------------------------------
# 3. Cox model with institution-specific frailty
# -----------------------------------------------------------------------------
# Treat sex as categorical
df$sex <- factor(df$sex)

# Gamma shared-frailty model:
# subjects within the same institution share an unobserved frailty term
obj <- coxph(
  Surv(time, status) ~ age + sex + phec + phkn + ptkn + wl +
    frailty(inst, distribution = "gamma"),
  data = df
)

# Alternative:
# obj <- coxph(
#   Surv(time, status) ~ age + sex + phec + phkn + ptkn + wl +
#     frailty(inst, distribution = "gaussian"),
#   data = df
# )

summary(obj)

# Estimated regression coefficients beta
obj$coefficients

# Estimated institution-specific log-frailties
obj$frail

# -----------------------------------------------------------------------------
# 4. Naive Cox model without frailty
# -----------------------------------------------------------------------------
# This ignores within-institution dependence
obj_naive <- coxph(
  Surv(time, status) ~ age + sex + phec + phkn + ptkn + wl,
  data = df
)
summary(obj_naive)

# -----------------------------------------------------------------------------
# 5. Subject-specific survival curves for representative profiles
# -----------------------------------------------------------------------------
# Construct survival curves for typical female and male patients
# across ECOG values 0, 1, 2, 3

med_age  <- median(df$age, na.rm = TRUE)
med_phkn <- median(df$phkn, na.rm = TRUE)
med_ptkn <- median(df$ptkn, na.rm = TRUE)
med_wl   <- median(df$wl, na.rm = TRUE)

beta     <- obj$coefficients
base_obj <- basehaz(obj, centered = FALSE)
eta      <- base_obj$hazard
t        <- base_obj$time

# Covariate profiles
# sex coding in the original data: 1 = male, 2 = female
# factor(sex) creates a treatment contrast relative to the reference level
# The vectors below match the order:
# age, sex, phec, phkn, ptkn, wl
zf0 <- c(med_age, 1, 0, med_phkn, med_ptkn, med_wl)
zf1 <- c(med_age, 1, 1, med_phkn, med_ptkn, med_wl)
zf2 <- c(med_age, 1, 2, med_phkn, med_ptkn, med_wl)
zf3 <- c(med_age, 1, 3, med_phkn, med_ptkn, med_wl)

zm0 <- c(med_age, 0, 0, med_phkn, med_ptkn, med_wl)
zm1 <- c(med_age, 0, 1, med_phkn, med_ptkn, med_wl)
zm2 <- c(med_age, 0, 2, med_phkn, med_ptkn, med_wl)
zm3 <- c(med_age, 0, 3, med_phkn, med_ptkn, med_wl)

par(mfrow = c(1, 2))

# Female
plot(
  t,
  exp(-exp(sum(beta * zf0)) * eta),
  type     = "s",
  xlim     = c(0, 35),
  ylim     = c(0, 1),
  frame    = FALSE,
  lty      = 1,
  main     = "Female",
  xlab     = "Time (months)",
  ylab     = "Survival probabilities",
  lwd      = 2,
  cex.lab  = 1.3,
  cex.axis = 1.3,
  cex.main = 1.3
)
lines(t, exp(-exp(sum(beta * zf1)) * eta), lty = 2, lwd = 2)
lines(t, exp(-exp(sum(beta * zf2)) * eta), lty = 3, lwd = 2)
lines(t, exp(-exp(sum(beta * zf3)) * eta), lty = 4, lwd = 2)
legend("topright", lty = 1:4, lwd = 2, cex = 1.2, legend = paste("ECOG", 0:3))

# Male
plot(
  t,
  exp(-exp(sum(beta * zm0)) * eta),
  type     = "s",
  xlim     = c(0, 35),
  ylim     = c(0, 1),
  frame    = FALSE,
  lty      = 1,
  main     = "Male",
  xlab     = "Time (months)",
  ylab     = "Survival probabilities",
  lwd      = 2,
  cex.lab  = 1.3,
  cex.axis = 1.3,
  cex.main = 1.3
)
lines(t, exp(-exp(sum(beta * zm1)) * eta), lty = 2, lwd = 2)
lines(t, exp(-exp(sum(beta * zm2)) * eta), lty = 3, lwd = 2)
lines(t, exp(-exp(sum(beta * zm3)) * eta), lty = 4, lwd = 2)
legend("topright", lty = 1:4, lwd = 2, cex = 1.2, legend = paste("ECOG", 0:3))

# =============================================================================
# (B) Diabetic Retinopathy Study (Bivariate Marginal Cox Model)
# =============================================================================

# -----------------------------------------------------------------------------
# 1. Read the data
# -----------------------------------------------------------------------------
df_drs <- read.table("Data//Diabetic Retinopathy Study//drs.txt")
head(df_drs)

# -----------------------------------------------------------------------------
# 2. Fit a bivariate marginal Cox model
# -----------------------------------------------------------------------------
# cluster(id) gives robust variance estimation to account for correlation
# between two eyes from the same subject
obj_drs <- coxph(
  Surv(time, status) ~ trt + type + trt:type + risk + cluster(id),
  data = df_drs
)
summary(obj_drs)

# -----------------------------------------------------------------------------
# 3. Table 8.1: estimates, SEs, and p-values
# -----------------------------------------------------------------------------
coeff <- summary(obj_drs)$coefficients

# Regression coefficient estimates
c1 <- coeff[, 1]

# Robust SEs and p-values (accounting for within-subject correlation)
c2 <- coeff[, 4]
c3 <- coeff[, 6]

# Naive SEs and corresponding p-values
c4 <- coeff[, 3]
c5 <- 1 - pchisq((c1 / c4)^2, 1)

noquote(round(cbind(c1, c2, c3, c4, c5), 3))

# -----------------------------------------------------------------------------
# 4. Predicted vision-retention probabilities
# -----------------------------------------------------------------------------
# Use Breslow baseline cumulative hazard and plug in selected covariate profiles
Lt   <- basehaz(obj_drs, centered = FALSE)
t    <- Lt$time
L    <- Lt$hazard
beta <- coeff[, 1]

# Profiles:
# trt, type, risk, trt:type
adult_contr <- exp(-exp(sum(beta * c(0, 0, 10, 0))) * L)
adult_trt   <- exp(-exp(sum(beta * c(1, 0, 10, 0))) * L)
juv_contr   <- exp(-exp(sum(beta * c(0, 1, 10, 0))) * L)
juv_trt     <- exp(-exp(sum(beta * c(1, 1, 10, 1))) * L)

par(mfrow = c(1, 2))

# Adult-onset diabetes
plot(
  t, adult_contr,
  type       = "s",
  xlim       = c(0, 80),
  ylim       = c(0, 1),
  frame.plot = FALSE,
  lty        = 3,
  main       = "Adult",
  xlab       = "Time (months)",
  ylab       = "Vision-retention probabilities",
  lwd        = 2,
  cex.lab    = 1.2,
  cex.axis   = 1.2,
  cex.main   = 1.2
)
lines(t, adult_trt, lty = 1, lwd = 2)

# Juvenile-onset diabetes
plot(
  t, juv_contr,
  type     = "s",
  xlim     = c(0, 80),
  ylim     = c(0, 1),
  frame    = FALSE,
  lty      = 3,
  main     = "Juvenile",
  xlab     = "Time (months)",
  ylab     = "Vision-retention probabilities",
  lwd      = 2,
  cex.lab  = 1.2,
  cex.axis = 1.2,
  cex.main = 1.2
)
lines(t, juv_trt, lty = 1, lwd = 2)

# =============================================================================
# (C) TOPCAT Trial (Multiple Endpoints)
# =============================================================================

# -----------------------------------------------------------------------------
# 1. Read and inspect the data
# -----------------------------------------------------------------------------
topcat <- read.table("Data//TOPCAT//topcat.txt")

# Median follow-up across subjects
topcat %>%
  group_by(id) %>%
  slice_max(time) %>%
  slice_head() %>%
  ungroup() %>%
  summarize(median(time))

# -----------------------------------------------------------------------------
# 2. Descriptive analysis
# -----------------------------------------------------------------------------
tmp <- topcat %>%
  mutate(
    drug   = if_else(drug == "Spiro", "Spironolactone", "Placebo"),
    gender = if_else(gender == "1:Male", "Male", "Female")
  )

# Convert to one row per subject for descriptive summaries
df_topcat <- tmp %>%
  pivot_wider(
    id_cols    = id,
    names_from = endpoint,
    values_from = c(time, status)
  ) %>%
  left_join(
    tmp %>% filter(endpoint == "HF"),
    join_by(id)
  )

# Median (IQR) formatter for quantitative variables
med_iqr <- function(x, r = 1) {
  qt <- quantile(x, na.rm = TRUE)
  paste0(
    round(qt[3], r), " (",
    round(qt[2], r), ", ",
    round(qt[4], r), ")"
  )
}

# Quantitative variables by treatment group
tab_quant <- df_topcat %>%
  filter(endpoint == "HF") %>%
  group_by(drug) %>%
  summarize(across(c(age, bmi, hr), med_iqr)) %>%
  pivot_longer(!drug, values_to = "value", names_to = "name") %>%
  pivot_wider(values_from = value, names_from = drug) %>%
  mutate(
    name = case_when(
      name == "age" ~ "Age (years)",
      name == "bmi" ~ "BMI (kg/m^2)",
      name == "hr"  ~ "Heart rate (per min)"
    )
  )

# Frequency and percentage table for a categorical variable within treatment groups
freq_pct <- function(df, group, var, r = 1) {
  var_counts <- df %>%
    group_by({{ group }}, {{ var }}) %>%
    summarize(n = n(), .groups = "drop")

  var_counts %>%
    left_join(
      var_counts %>%
        group_by({{ group }}) %>%
        summarize(N = sum(n)),
      by = join_by({{ group }})
    ) %>%
    mutate(
      value = paste0(n, " (", round(100 * n / N, r), "%)")
    ) %>%
    select(-c(n, N)) %>%
    pivot_wider(names_from = {{ group }}, values_from = value) %>%
    rename(name = {{ var }})
}

# Categorical summaries
gender <- df_topcat %>%
  freq_pct(drug, gender) %>%
  mutate(name = paste0("Gender - ", name))

race <- df_topcat %>%
  freq_pct(drug, race) %>%
  mutate(name = paste0("Race - ", name))

nyha <- df_topcat %>%
  freq_pct(drug, nyha) %>%
  mutate(name = paste0("NYHA - ", name)) %>%
  filter(!is.na(name))

# Percentage formatter for binary indicators
bin_pct <- function(condition, r = 1) {
  n <- sum(condition, na.rm = TRUE)
  N <- length(condition)
  paste0(n, " (", round(100 * n / N, r), "%)")
}

# Binary covariates and endpoint indicators
tabin <- df_topcat %>%
  group_by(drug) %>%
  summarize(
    N = n(),
    across(c(smoke:cabg, status_HF, status_MI, status_Stroke), bin_pct)
  ) %>%
  select(!N) %>%
  pivot_longer(!drug, values_to = "value", names_to = "name") %>%
  pivot_wider(values_from = value, names_from = drug) %>%
  mutate(
    name = case_when(
      name == "smoke"         ~ "Smoker",
      name == "chf_hosp"      ~ "CHF",
      name == "copd"          ~ "COPD",
      name == "asthma"        ~ "Asthma",
      name == "dm"            ~ "Diabetes",
      name == "htn"           ~ "Hypertension",
      name == "cabg"          ~ "Coronary surgery",
      name == "status_HF"     ~ "HF",
      name == "status_MI"     ~ "MI",
      name == "status_Stroke" ~ "Stroke"
    )
  )

# Event rates per person-year by endpoint and treatment group
event_rates <- df_topcat %>%
  group_by(drug) %>%
  summarize(
    `HF rate (per person-year)`     = sum(status_HF) / sum(time_HF),
    `MI rate (per person-year)`     = sum(status_MI) / sum(time_MI),
    `Stroke rate (per person-year)` = sum(status_Stroke) / sum(time_Stroke)
  ) %>%
  pivot_longer(!drug, values_to = "value", names_to = "name") %>%
  pivot_wider(values_from = value, names_from = drug) %>%
  mutate(
    Placebo        = as.character(round(Placebo, 4)),
    Spironolactone = as.character(round(Spironolactone, 4))
  )

# Combine into a Table 1-style summary
tabone <- bind_rows(
  tab_quant[1, ],
  gender,
  race,
  nyha,
  tab_quant[-1, ],
  tabin,
  event_rates
)

# Add group sizes to column labels
colnames(tabone) <- c(
  " ",
  paste0(colnames(tabone)[2:3], " (N=", table(df_topcat$drug), ")")
)

# Display table
knitr::kable(tabone)

# -----------------------------------------------------------------------------
# 3. Cox model for multiple endpoints (Scenario 3)
# -----------------------------------------------------------------------------
# endpoint-specific baselines are handled via strata(endpoint)
# cluster(id) gives robust variance for within-subject correlation
obj_topcat <- coxph(
  Surv(time, status) ~
    (drug + age + gender + nyha + bmi + smoke + chf_hosp +
       copd + asthma + dm + htn + cabg) *
    strata(endpoint) + cluster(id),
  data = topcat
)

summary(obj_topcat)
obj_topcat$coefficients

# -----------------------------------------------------------------------------
# 4. Component-specific hazard ratios, confidence intervals, and p-values
# -----------------------------------------------------------------------------
# There are p = 12 baseline covariate effects, followed by endpoint interactions
p     <- 12
beta  <- obj_topcat$coefficients
Sigma <- obj_topcat$var

betaHF  <- numeric(p)
varHF   <- numeric(p)
betaMI  <- numeric(p)
varMI   <- numeric(p)
betaStr <- numeric(p)
varStr  <- numeric(p)

for (j in seq_len(p)) {
  # HF: reference endpoint
  betaHF[j] <- beta[j]
  varHF[j]  <- Sigma[j, j]

  # MI: main effect + MI interaction
  betaMI[j] <- beta[j] + beta[p + 2 * j - 1]
  varMI[j]  <- Sigma[j, j] +
    Sigma[p + 2 * j - 1, p + 2 * j - 1] +
    2 * Sigma[j, p + 2 * j - 1]

  # Stroke: main effect + Stroke interaction
  betaStr[j] <- beta[j] + beta[p + 2 * j]
  varStr[j]  <- Sigma[j, j] +
    Sigma[p + 2 * j, p + 2 * j] +
    2 * Sigma[j, p + 2 * j]
}

za <- qnorm(0.975)

hr_ci <- function(b, v, r = 2) {
  se <- sqrt(v)
  paste0(
    round(exp(b), r), " (",
    round(exp(b - za * se), r), ", ",
    round(exp(b + za * se), r), ")"
  )
}

hr_p <- function(b, v) {
  se <- sqrt(v)
  round(1 - pchisq((b / se)^2, df = 1), 3)
}

df_res <- tibble(
  variable = c(
    "Spironolactone", "Age (years)", "Female", "NYHA 3-4", "BMI",
    "Smoke", "CHF", "COPD", "Asthma", "Diabetes", "Hypertension",
    "Coronary surgery"
  ),
  HF       = hr_ci(betaHF, varHF),
  HF_p     = hr_p(betaHF, varHF),
  MI       = hr_ci(betaMI, varMI),
  MI_p     = hr_p(betaMI, varMI),
  Stroke   = hr_ci(betaStr, varStr),
  Stroke_p = hr_p(betaStr, varStr)
)

df_res

# -----------------------------------------------------------------------------
# 5. Joint test of spironolactone effect on HF and Stroke
# -----------------------------------------------------------------------------
# beta[1] is the HF main effect for drug
# beta[14] is the Stroke interaction for drug
beta_q  <- beta[c(1, 14)]
Sigma_q <- Sigma[c(1, 14), c(1, 14)]

chisq2 <- t(beta_q) %*% solve(Sigma_q) %*% beta_q
pval   <- 1 - pchisq(chisq2, 2)

pval