Chapter 9 - Recurrent Events

Slides

Lecture slides here. (To convert html to pdf, press E \(\to\) Print \(\to\) Destination: Save to pdf)

Chapter Summary

When a subject (or system) can experience recurrent events (e.g., repeated hospitalizations, infections, or mechanical failures), standard univariate approaches (like focusing only on the first event) can be inefficient and miss valuable information. Instead, we can model these repeated occurrences via extended Cox-type methods.

Intensity vs. rate functions

  • Intensity function \(\ell\{t \mid \overline{N}^*(t-)\}\): The instantaneous incidence of a new event at time \(t\) conditional on the entire past. Generalizes the hazard function from univariate survival.
  • Rate function \(r(t)\): The marginal incidence rate across the population. Bypasses specifying how past events affect current risk but does not fully specify the likelihood. Often paired with robust variance methods.

Poisson processes are the simplest example (memoryless arrivals), but in practice, recurrent events often cluster, so more realistic models are needed.

Multiplicative intensity/rate models

Andersen–Gill model

A direct extension of the Cox model that posits: \[ \ell\{t \mid \overline{N}^*(t-), \overline{Z}(t)\} \;=\; \exp\bigl\{\beta^\mathsf{T}Z(t)\bigr\}\,\mathrm{d}\mu_0(t). \]

  • Interpretation: The conditional risk of a new event depends only on the current covariates \(Z(t)\), not the event history (unless included in \(Z(t)\)).
  • Estimation: Uses partial-likelihood scores analogous to Cox, with counting-process style risk sets.
  • Limitation: If only baseline covariates are included, ignoring event history can be unrealistic; including post-baseline events or other time-varying covariates can confound inference about baseline effects.

Multiplicative intensity frailty models

Incorporate a random effect (frailty) \(\xi\) to induce within-subject dependence: \[ \ell\{t \mid \xi, \overline{N}^*(t-), \overline{Z}(t)\} \;=\; \xi \,\exp\bigl\{\beta^\mathsf{T} Z(t)\bigr\}\,\mathrm{d}\mu_0(t). \]

  • Example: Gamma frailty, letting \(\xi\) follow a Gamma distribution with mean 1, variance \(1/\gamma\).
  • Interpretation: Covariate effects are subject-specific; \(\beta\) reflects hazard ratios within each random-effect stratum.
  • Estimation: Typically via EM algorithm.

Proportional rates/means (LWYY) model

Targets the marginal rate: \[ r\{t \mid \overline{Z}(t)\} \;=\; \exp\bigl\{\beta^\mathsf{T} Z(t)\bigr\}\,\mathrm{d}\mu_0(t). \]

  • Estimation: Identical point estimates as Andersen–Gill, but with robust (sandwich) standard errors to account for unmodeled within-subject correlation.
  • Interpretation: \(\beta\) is the population-level effect on the rate (or mean) of recurrent events, simpler to interpret when focusing on an overall treatment effect.

Multivariate approaches to total or gap times

Rather than modeling the entire counting process \(N^*(t)\) directly, one can treat the times \(T_1< T_2< \cdots\) (or gap times \(U_k=T_k - T_{k-1}\)) as a multivariate survival problem:

  • WLW model: Marginal hazard for each event order (1st, 2nd, etc.), ignoring the conditioning on previous events.
  • PWP-TT: Condition on having experienced the \((k-1)\)th event, analyzing total times but only for those who reached event \(k-1\).
  • PWP-GT: Condition similarly, but use gap times from one event to the next, resetting the time clock after each event.

Differing risk-set definitions imply differences in interpretation for later events. WLW models the overall population-level hazard for the \(k\)th event, while PWP approaches focus on conditional hazards for those who already had \((k-1)\) events.

Example R code

Below are code snippets using R’s survival::coxph() to demonstrate fitting the three main classes of Cox-type recurrent event models. Suppose our data frame df is in a “counting process” format with columns:

  • start, stop: time interval boundaries for each event segment,
  • status: event indicator (1 if the event occurred in that interval, 0 if censored),
  • id: subject identifier,
  • additional covariates (e.g., x1, x2).
library(survival)
########################
# 1. Andersen–Gill model
########################
# Assumes any correlation among events is captured
# entirely by the covariates in the model.
fit_AG <- coxph(
  Surv(start, stop, status) ~ x1 + x2, 
  data = df
)

####################################
# 2. Frailty model (Gamma frailty)
####################################
# Introduces a subject-specific random effect
fit_frail <- coxph(
  Surv(start, stop, status) ~ x1 + x2
    + frailty(id, distribution = "gamma"),
  data = df
)

############################################
# 3. Proportional rates/means (LWYY model)
############################################
# Uses robust variance to account for correlation;
# point estimates match Andersen–Gill, but standard errors differ.
fit_LWYY <- coxph(
  Surv(start, stop, status) ~ x1 + x2
    + cluster(id),
  data = df
)

For multivariate approaches, we need to preprocess the data to include event order or gap times:

  • order: event order (1st, 2nd, etc.),
  • gtime: gap time from the previous event.
##################################
# WLW model
##################################
# 'stop' is the time to the kth event (or censor),
# ignoring whether the (k-1)th event occurred.
fit_wlw <- coxph(
  Surv(stop, status) ~ x1 + x2 * strata(order)
    + cluster(id),
  data = df
)

##################################
# PWP-TT model
##################################
# 'start' is the time of (k-1)th event; 'stop' is kth event time.
fit_pwp_tt <- coxph(
  Surv(start, stop, status) ~ x1 + x2 * strata(order)
    + cluster(id),
  data = df
)

##################################
# PWP-GT model
##################################
# 'gtime' is the gap time from (k-1)th event to kth event.
# Otherwise similar to WLW in usage.
fit_pwp_gt <- coxph(
  Surv(gtime, status) ~ x1 + x2 * strata(order)
    + cluster(id),
  data = df
)

Conclusion

Recurrent event data demand methods that move beyond standard univariate survival analysis:

  • Andersen–Gill and frailty models specify conditional intensities, capturing event dependence through time-varying covariates or latent frailties.
  • Proportional rates/means approaches target marginal rates, using robust variances to adjust for within-subject correlations.
  • Multivariate total/gap-time approaches view each recurrence as a separate component, applying extensions of the Cox model (WLW, PWP-TT, PWP-GT), each with different conditioning and interpretational nuances.

The choice among these frameworks depends on scientific aims (e.g., dynamic prediction vs. inference on treatment effect at baseline) and how one wishes to handle correlations induced by repeated occurrences. All, however, build on the Cox model’s core multiplicative form, ensuring familiar regression interpretations and flexible software implementations.

R Code

Show the code
###############################################################################
# Chapter 9 R Code
#
# This script reproduces major numerical results in Chapter 9, including
#   1. Andersen--Gill model for recurrent events (CGD study)
#   2. Frailty model vs proportional means model (CGD study)
#   3. Multivariate approaches (WLW, PWP models)
#   4. Poisson, negative binomial, and zero-inflated Poisson regressions
###############################################################################

library(survival)   # Cox-type recurrent-event models and diagnostics
library(tidyverse)  # Data manipulation and ggplot2-based graphics
library(gtsummary)  # Formatted regression tables
library(knitr)      # Table printing utilities
library(patchwork)  # Plot combination
library(pscl)       # Zero-inflated Poisson regression

# =============================================================================
# (A) Andersen--Gill / Frailty / LWYY Models for Recurrent Infections
# =============================================================================

# -----------------------------------------------------------------------------
# 1. Read the Chronic Granulomatous Disease (CGD) data in counting-process form
# -----------------------------------------------------------------------------
cgd <- read.table("Data\\Chronic Granulomatous Disease Study\\cgd_counting.txt")
head(cgd)

# -----------------------------------------------------------------------------
# 2. Andersen--Gill model
# -----------------------------------------------------------------------------
# Surv(tstart, tstop, status) specifies recurrent-event counting-process data
obj_AG <- coxph(
  Surv(tstart, tstop, status) ~ treat + sex + age + inherit + steroids + propylac,
  data = cgd
)

summary(obj_AG)

# Proportional hazards check and subject-level martingale residuals
cox.zph(obj_AG)
residuals(obj_AG, type = "martingale", collapse = cgd$id)

# -----------------------------------------------------------------------------
# 3. Shared-frailty model
# -----------------------------------------------------------------------------
# frailty(id, distribution = "gamma") introduces a subject-specific random effect
obj_frail <- coxph(
  Surv(tstart, tstop, status) ~ treat + sex + age + inherit + steroids + propylac +
    frailty(id, distribution = "gamma"),
  data = cgd
)

summary(obj_frail)

# -----------------------------------------------------------------------------
# 4. Proportional means model (LWYY)
# -----------------------------------------------------------------------------
# cluster(id) keeps the Cox mean structure but uses robust variance
obj_pm <- coxph(
  Surv(tstart, tstop, status) ~ treat + sex + age + inherit + steroids + propylac +
    cluster(id),
  data = cgd
)

summary(obj_pm)

# -----------------------------------------------------------------------------
# 5. Extract coefficients, SEs, and p-values across models
# -----------------------------------------------------------------------------
coeff_AG    <- summary(obj_AG)$coeff
coeff_frail <- summary(obj_frail)$coeff
coeff_pm    <- summary(obj_pm)$coeff

# Andersen--Gill model: beta, model-based SE, p-value
c1 <- coeff_AG[, 1]
c2 <- coeff_AG[, 3]
c3 <- coeff_AG[, 5]

# Frailty model: first 6 rows correspond to regression coefficients
c4 <- coeff_frail[1:6, 1]
c5 <- coeff_frail[1:6, 2]
c6 <- coeff_frail[1:6, 6]

# Proportional means model: beta, robust SE, robust p-value
c7 <- coeff_pm[1:6, 1]
c8 <- coeff_pm[1:6, 4]
c9 <- coeff_pm[1:6, 6]

# Side-by-side comparison across the three models
noquote(round(cbind(c1, c2, c3, c4, c5, c6, c7, c8, c9), 3))

# =============================================================================
# (B) Predicted Mean Functions by Treatment
# =============================================================================

# -----------------------------------------------------------------------------
# 1. Extract beta and baseline mean function from the proportional means model
# -----------------------------------------------------------------------------
beta <- obj_pm$coefficients
Lt   <- basehaz(obj_pm, centered = FALSE)
t    <- Lt$time
mu0  <- Lt$hazard  # baseline mean function mu0(t)

# -----------------------------------------------------------------------------
# 2. Define covariate profiles
# -----------------------------------------------------------------------------
# Profiles use the covariates after treatment in the model:
# sex, age, inherit, steroids, propylac

# Female profile
zf <- c(0, 12, 1, 1, 1)

# Male profile
zm <- c(1, 12, 1, 1, 1)

# -----------------------------------------------------------------------------
# 3. Construct fitted mean functions for treatment and control
# -----------------------------------------------------------------------------
# Female: treatment vs control
mu.f.trt   <- exp(sum(c(1, zf) * beta)) * mu0
mu.f.contr <- exp(sum(c(0, zf) * beta)) * mu0

# Male: treatment vs control
mu.m.trt   <- exp(sum(c(1, zm) * beta)) * mu0
mu.m.contr <- exp(sum(c(0, zm) * beta)) * mu0

# -----------------------------------------------------------------------------
# 4. Plot fitted mean number of infections
# -----------------------------------------------------------------------------
par(mfrow = c(1, 2))

# Female
plot(
  t / 30.5, mu.f.trt,
  type       = "s",
  xlim       = c(0, 12),
  ylim       = c(0, 6),
  frame.plot = FALSE,
  lty        = 1,
  main       = "Female",
  xlab       = "Time (months)",
  ylab       = "Mean number of infections",
  lwd        = 2
)
lines(t / 30.5, mu.f.contr, lty = 3, lwd = 2)

# Male
plot(
  t / 30.5, mu.m.trt,
  type       = "s",
  xlim       = c(0, 12),
  ylim       = c(0, 6),
  frame.plot = FALSE,
  lty        = 1,
  main       = "Male",
  xlab       = "Time (months)",
  ylab       = "Mean number of infections",
  lwd        = 2
)
lines(t / 30.5, mu.m.contr, lty = 3, lwd = 2)

# =============================================================================
# (C) Multivariate Approaches (WLW, PWP Models)
# =============================================================================

# -----------------------------------------------------------------------------
# 1. Create event-order and gap-time variables
# -----------------------------------------------------------------------------
cgd_mul <- cgd %>%
  arrange(id, tstart) %>%
  group_by(id) %>%
  mutate(
    order = row_number(),     # event order within subject
    gtime = tstop - tstart,   # gap time since previous event or study entry
    .after = tstop
  ) %>%
  ungroup()

# -----------------------------------------------------------------------------
# 2. WLW model
# -----------------------------------------------------------------------------
# Uses total time and treats repeated event orders as separate margins
wlw_obj <- coxph(
  Surv(tstop, status) ~ (treat + sex + age) * strata(order),
  data = cgd_mul %>% filter(order <= 3)
)

# Reduced model with common effects across orders
wlw_obj0 <- coxph(
  Surv(tstop, status) ~ (treat + sex + age) + strata(order),
  data = cgd_mul
)

# -----------------------------------------------------------------------------
# 3. PWP-TT model
# -----------------------------------------------------------------------------
# Total-time version: risk set for the jth event includes only subjects
# who have experienced the first j - 1 events
pwp_tt_obj <- coxph(
  Surv(tstart, tstop, status) ~ (treat + sex + age) * strata(order),
  data = cgd_mul %>% filter(order <= 3)
)

# Reduced model with common effects across event order
pwp_tt_obj0 <- coxph(
  Surv(tstart, tstop, status) ~ (treat + sex + age) + strata(order),
  data = cgd_mul
)

# -----------------------------------------------------------------------------
# 4. PWP-GT model
# -----------------------------------------------------------------------------
# Gap-time version uses time since the previous event
pwp_gt_obj <- coxph(
  Surv(gtime, status) ~ (treat + sex + age) * strata(order),
  data = cgd_mul %>% filter(order <= 3)
)

# Reduced model with common effects across event order
pwp_gt_obj0 <- coxph(
  Surv(gtime, status) ~ (treat + sex + age) + strata(order),
  data = cgd_mul
)

# -----------------------------------------------------------------------------
# 5. Extract treatment HRs, 95% CIs, and p-values by event order
# -----------------------------------------------------------------------------
trt_hr_mul <- function(obj, obj0, p, K, r = 2) {
  # p = number of baseline covariates including treatment
  # K = number of event strata displayed

  beta <- obj$coefficients
  var  <- obj$var

  beta_trt <- numeric(p + 1)
  var_trt  <- numeric(p + 1)

  # First event: treatment main effect
  beta_trt[1] <- beta[1]
  var_trt[1]  <- var[1, 1]

  # Events 2,...,K: main effect + interaction with order-specific stratum
  for (j in 2:K) {
    beta_trt[j] <- beta[1] + beta[p + j - 1]
    var_trt[j]  <- var[1, 1] +
      var[p + j - 1, p + j - 1] +
      2 * var[1, p + j - 1]
  }

  # Overall treatment effect from the reduced model
  beta_trt[p + 1] <- obj0$coefficients[1]
  var_trt[p + 1]  <- obj0$var[1, 1]

  se_trt <- sqrt(var_trt)
  za     <- qnorm(0.975)

  tibble(
    HR = str_c(
      round(exp(beta_trt), r), " (",
      round(exp(beta_trt - za * se_trt), r), ", ",
      round(exp(beta_trt + za * se_trt), r), ")"
    ),
    P = round(1 - pchisq((beta_trt / se_trt)^2, df = 1), 3)
  )
}

# -----------------------------------------------------------------------------
# 6. Build summary table across WLW, PWP-TT, and PWP-GT
# -----------------------------------------------------------------------------
# Group sizes by treatment arm
N <- cgd_mul %>%
  filter(order == 1) %>%
  count(treat) %>%
  pull(n)

# Event counts by order and treatment arm
row_header <- cgd_mul %>%
  filter(status == 1) %>%
  count(treat, order) %>%
  filter(order <= 3) %>%
  pivot_wider(values_from = n, names_from = treat) %>%
  mutate(order = as.character(order)) %>%
  add_row(
    order    = "All",
    placebo  = cgd_mul %>% filter(treat == "placebo", status == 1) %>% count() %>% pull(),
    `rIFN-g` = cgd_mul %>% filter(treat == "rIFN-g", status == 1) %>% count() %>% pull()
  ) %>%
  mutate(
    placebo  = str_c(placebo, " (", round(100 * placebo / N[1], 1), "%)"),
    `rIFN-g` = str_c(`rIFN-g`, " (", round(100 * `rIFN-g` / N[2], 1), "%)")
  ) %>%
  select(order, `rIFN-g`, placebo)

# Treatment HR columns for the three models
wlw_HR   <- trt_hr_mul(wlw_obj,    wlw_obj0,    3, 3)$HR
pwpTT_HR <- trt_hr_mul(pwp_tt_obj, pwp_tt_obj0, 3, 3)$HR
pwpGT_HR <- trt_hr_mul(pwp_gt_obj, pwp_gt_obj0, 3, 3)$HR

# Combine into a summary table
cgd_tab <- row_header %>%
  add_column(
    WLW      = wlw_HR,
    `PWP-TT` = pwpTT_HR,
    `PWP-GT` = pwpGT_HR
  )

cgd_tab

# Optional LaTeX table export
# cgd_tab %>%
#   kable("latex")

# -----------------------------------------------------------------------------
# 7. Forest plot of treatment hazard ratios
# -----------------------------------------------------------------------------
# Parse strings of the form "x (xmin, xmax)"
hr_ci_parse <- function(x) {
  separate_wider_regex(
    x,
    patterns = c(x = ".+", " \\(", xmin = ".+", ", ", xmax = ".+", "\\)")
  )
}

# Separate HR strings for each model into point estimate and CI bounds
cgd_models <- cgd_tab %>%
  separate_wider_regex(
    WLW:`PWP-GT`,
    patterns  = c(x = ".+", " \\(", xmin = ".+", ", ", xmax = ".+", "\\)"),
    names_sep = "_"
  ) %>%
  mutate(across(!c(order:placebo), as.numeric)) %>%
  pivot_longer(
    !c(order:placebo),
    names_to = c("Model", ".value"),
    names_sep = "_"
  )

# Plot event-specific and overall treatment HRs
cgd_models %>%
  ggplot(aes(x = x, y = order)) +
  geom_point(aes(size = (order == "All"))) +
  geom_linerange(aes(xmin = xmin, xmax = xmax, linewidth = (order == "All"))) +
  geom_vline(xintercept = 1, linetype = 2) +
  facet_wrap(~ fct(Model), ncol = 3) +
  scale_y_discrete(
    "Infection",
    limits = rev(c("1", "2", "3", "All")),
    labels = rev(c("1st", "2nd", "3rd", "All"))
  ) +
  scale_x_log10("Treatment hazard ratio (95% CI)") +
  scale_size_manual(values = c(1.5, 2)) +
  scale_linewidth_manual(values = c(0.5, 0.8)) +
  theme_minimal() +
  theme(
    axis.text       = element_text(size = 10),
    axis.title      = element_text(size = 11),
    panel.grid      = element_blank(),
    strip.text      = element_text(size = 11),
    legend.position = "none"
  )

# Optional saving
# ggsave("rec_cgd_forest.pdf", width = 8, height = 4.5)
# ggsave("rec_cgd_forest.eps", width = 8, height = 4.5)

# =============================================================================
# (D) Poisson / Negative Binomial / Zero-Inflated Poisson Regressions
# =============================================================================

# -----------------------------------------------------------------------------
# 1. Collapse data to one row per subject
# -----------------------------------------------------------------------------
df_events <- cgd %>%
  group_by(id) %>%
  mutate(
    count     = sum(status),  # total number of recurrent events
    follow_up = max(tstop)    # total observed follow-up time
  ) %>%
  distinct(id, .keep_all = TRUE)

# -----------------------------------------------------------------------------
# 2. Poisson regression
# -----------------------------------------------------------------------------
# offset(log(follow_up)) models the event rate per unit follow-up time
ps_obj <- glm(
  count ~ treat + age + sex + offset(log(follow_up)),
  family = poisson(link = "log"),
  data = df_events
)

summary(ps_obj)

# -----------------------------------------------------------------------------
# 3. Negative binomial regression
# -----------------------------------------------------------------------------
# glm.nb() relaxes the Poisson variance assumption
nb_obj <- MASS::glm.nb(
  count ~ treat + age + sex + offset(log(follow_up)),
  data = df_events
)

summary(nb_obj)

# -----------------------------------------------------------------------------
# 4. Zero-inflated Poisson regression
# -----------------------------------------------------------------------------
# First formula part: Poisson mean model for counts
# Second formula part: logistic model for excess zeros
zip_model <- zeroinfl(
  count ~ treat + age + sex + offset(log(follow_up))
    | treat + age + sex,
  data = df_events
)

summary(zip_model)

# Extract coefficient vectors and variance-covariance matrix
zip_model$coefficients
zip_model$vcov