library(survival)
########################
# 1. Andersen–Gill model
########################
# Assumes any correlation among events is captured
# entirely by the covariates in the model.
<- coxph(
fit_AG Surv(start, stop, status) ~ x1 + x2,
data = df
)
####################################
# 2. Frailty model (Gamma frailty)
####################################
# Introduces a subject-specific random effect
<- coxph(
fit_frail Surv(start, stop, status) ~ x1 + x2
+ frailty(id, distribution = "gamma"),
data = df
)
############################################
# 3. Proportional rates/means (LWYY model)
############################################
# Uses robust variance to account for correlation;
# point estimates match Andersen–Gill, but standard errors differ.
<- coxph(
fit_LWYY Surv(start, stop, status) ~ x1 + x2
+ cluster(id),
data = df
)
Chapter 9 - Recurrent Events
Slides
Lecture slides here. (To convert html to pdf, press E \(\to\) Print \(\to\) Destination: Save to pdf)
Chapter Summary
When a subject (or system) can experience recurrent events (e.g., repeated hospitalizations, infections, or mechanical failures), standard univariate approaches (like focusing only on the first event) can be inefficient and miss valuable information. Instead, we can model these repeated occurrences via extended Cox-type methods.
Intensity vs. rate functions
- Intensity function \(\ell\{t \mid \overline{N}^*(t-)\}\): The instantaneous incidence of a new event at time \(t\) conditional on the entire past. Generalizes the hazard function from univariate survival.
- Rate function \(r(t)\): The marginal incidence rate across the population. Bypasses specifying how past events affect current risk but does not fully specify the likelihood. Often paired with robust variance methods.
Poisson processes are the simplest example (memoryless arrivals), but in practice, recurrent events often cluster, so more realistic models are needed.
Multiplicative intensity/rate models
Andersen–Gill model
A direct extension of the Cox model that posits: \[ \ell\{t \mid \overline{N}^*(t-), \overline{Z}(t)\} \;=\; \exp\bigl\{\beta^\mathsf{T}Z(t)\bigr\}\,\mathrm{d}\mu_0(t). \]
- Interpretation: The conditional risk of a new event depends only on the current covariates \(Z(t)\), not the event history (unless included in \(Z(t)\)).
- Estimation: Uses partial-likelihood scores analogous to Cox, with counting-process style risk sets.
- Limitation: If only baseline covariates are included, ignoring event history can be unrealistic; including post-baseline events or other time-varying covariates can confound inference about baseline effects.
Multiplicative intensity frailty models
Incorporate a random effect (frailty) \(\xi\) to induce within-subject dependence: \[ \ell\{t \mid \xi, \overline{N}^*(t-), \overline{Z}(t)\} \;=\; \xi \,\exp\bigl\{\beta^\mathsf{T} Z(t)\bigr\}\,\mathrm{d}\mu_0(t). \]
- Example: Gamma frailty, letting \(\xi\) follow a Gamma distribution with mean 1, variance \(1/\gamma\).
- Interpretation: Covariate effects are subject-specific; \(\beta\) reflects hazard ratios within each random-effect stratum.
- Estimation: Typically via EM algorithm.
Proportional rates/means (LWYY) model
Targets the marginal rate: \[ r\{t \mid \overline{Z}(t)\} \;=\; \exp\bigl\{\beta^\mathsf{T} Z(t)\bigr\}\,\mathrm{d}\mu_0(t). \]
- Estimation: Identical point estimates as Andersen–Gill, but with robust (sandwich) standard errors to account for unmodeled within-subject correlation.
- Interpretation: \(\beta\) is the population-level effect on the rate (or mean) of recurrent events, simpler to interpret when focusing on an overall treatment effect.
Multivariate approaches to total or gap times
Rather than modeling the entire counting process \(N^*(t)\) directly, one can treat the times \(T_1< T_2< \cdots\) (or gap times \(U_k=T_k - T_{k-1}\)) as a multivariate survival problem:
- WLW model: Marginal hazard for each event order (1st, 2nd, etc.), ignoring the conditioning on previous events.
- PWP-TT: Condition on having experienced the \((k-1)\)th event, analyzing total times but only for those who reached event \(k-1\).
- PWP-GT: Condition similarly, but use gap times from one event to the next, resetting the time clock after each event.
Differing risk-set definitions imply differences in interpretation for later events. WLW models the overall population-level hazard for the \(k\)th event, while PWP approaches focus on conditional hazards for those who already had \((k-1)\) events.
Example R code
Below are code snippets using R’s survival::coxph()
to demonstrate fitting the three main classes of Cox-type recurrent event models. Suppose our data frame df
is in a “counting process” format with columns:
start
,stop
: time interval boundaries for each event segment,
status
: event indicator (1 if the event occurred in that interval, 0 if censored),
id
: subject identifier,
- additional covariates (e.g.,
x1
,x2
).
For multivariate approaches, we need to preprocess the data to include event order or gap times:
order
: event order (1st, 2nd, etc.),gtime
: gap time from the previous event.
##################################
# WLW model
##################################
# 'stop' is the time to the kth event (or censor),
# ignoring whether the (k-1)th event occurred.
<- coxph(
fit_wlw Surv(stop, status) ~ x1 + x2 * strata(order)
+ cluster(id),
data = df
)
##################################
# PWP-TT model
##################################
# 'start' is the time of (k-1)th event; 'stop' is kth event time.
<- coxph(
fit_pwp_tt Surv(start, stop, status) ~ x1 + x2 * strata(order)
+ cluster(id),
data = df
)
##################################
# PWP-GT model
##################################
# 'gtime' is the gap time from (k-1)th event to kth event.
# Otherwise similar to WLW in usage.
<- coxph(
fit_pwp_gt Surv(gtime, status) ~ x1 + x2 * strata(order)
+ cluster(id),
data = df
)
Conclusion
Recurrent event data demand methods that move beyond standard univariate survival analysis:
- Andersen–Gill and frailty models specify conditional intensities, capturing event dependence through time-varying covariates or latent frailties.
- Proportional rates/means approaches target marginal rates, using robust variances to adjust for within-subject correlations.
- Multivariate total/gap-time approaches view each recurrence as a separate component, applying extensions of the Cox model (WLW, PWP-TT, PWP-GT), each with different conditioning and interpretational nuances.
The choice among these frameworks depends on scientific aims (e.g., dynamic prediction vs. inference on treatment effect at baseline) and how one wishes to handle correlations induced by repeated occurrences. All, however, build on the Cox model’s core multiplicative form, ensuring familiar regression interpretations and flexible software implementations.
R Code
Show the code
###############################################################################
# Chapter 9 R Code
#
# This script reproduces all major numerical results in Chapter 9, including:
# 1. Andersen-Gill Model for recurrent events (Chronic Granulomatous Disease)
# 2. Frailty model vs. Proportional Means model (Chronic Granulomatous Disease)
# 3. Multivariate approaches (WLW, PWP models)
# 4. Poisson, Negative Binomial, and Zero-Inflated Poisson regressions
###############################################################################
library(survival)
library(tidyverse)
library(gtsummary)
library(knitr)
library(patchwork)
#==============================================================================
# (A) Andersen-Gill/Frailty/LWYY Model for Recurrent Infections (CGD Data)
#==============================================================================
#------------------------------------------------------------------------------
# 1. Read the Chronic Granulomatous Disease (CGD) Data
# in Counting Process Format
#------------------------------------------------------------------------------
<- read.table("Data\\Chronic Granulomatous Disease Study\\cgd_counting.txt")
cgd head(cgd)
#------------------------------------------------------------------------------
# 2. Andersen-Gill Model
#------------------------------------------------------------------------------
<- coxph(
obj_AG Surv(tstart, tstop, status) ~ treat + sex + age + inherit + steroids + propylac,
data = cgd
)
summary(obj_AG)
# Residuals
cox.zph(obj_AG)
residuals(obj_AG, type = "martingale", collapse = cgd$id)
#------------------------------------------------------------------------------
# 3. Frailty Model
#------------------------------------------------------------------------------
<- coxph(
obj_frail Surv(tstart, tstop, status) ~ treat + sex + age + inherit + steroids + propylac +
frailty(id, distribution = "gamma"),
data = cgd
)summary(obj_frail)
#------------------------------------------------------------------------------
# 4. Proportional Means Model (LWYY)
#------------------------------------------------------------------------------
<- coxph(
obj_pm Surv(tstart, tstop, status) ~ treat + sex + age + inherit + steroids + propylac +
cluster(id),
data = cgd
)summary(obj_pm)
#------------------------------------------------------------------------------
# 5. Extract Coefficients for Table 9.2
#------------------------------------------------------------------------------
<- summary(obj_AG)$coeff
coeff_AG <- summary(obj_frail)$coeff
coeff_frail <- summary(obj_pm)$coeff
coeff_pm
# Andersen-Gill
<- coeff_AG[, 1] # beta
c1 <- coeff_AG[, 3] # se(beta)
c2 <- coeff_AG[, 5] # p-value
c3
# Frailty
<- coeff_frail[1:6, 1] # beta
c4 <- coeff_frail[1:6, 2] # se(beta)
c5 <- coeff_frail[1:6, 6] # p-value
c6
# Proportional Means
<- coeff_pm[1:6, 1] # beta
c7 <- coeff_pm[1:6, 4] # se(beta)
c8 <- coeff_pm[1:6, 6] # p-value
c9
# Print Table 9.1 (beta, se, p-values)
noquote(round(cbind(c1, c2, c3, c4, c5, c6, c7, c8, c9), 3))
#==============================================================================
# (B) Figure 9.4: Predicted Mean Functions by Treatment
#==============================================================================
# Predicted mean number of infections using Proportional Means model
#------------------------------------------------------------------------------
# 1. Extract Beta and Baseline Mean Function
<- obj_pm$coefficients
beta <- basehaz(obj_pm, centered = FALSE)
Lt <- Lt$time
t <- Lt$hazard # baseline mean function mu0(t)
mu0
# 2. Covariate Profiles
# For "female" patient: (sex=0, age=12, inherit=X-linked=1, steroids=1, propylac=1)
# Actually from the code: sex=1 means female, so we can check carefully:
# code snippet suggests c(0, 12, 1, 1, 1) are the covariates besides treatment,
# so treat=0 or 1 is appended in front.
<- c(0, 12, 1, 1, 1)
zf
# For "male" patient: (sex=1, age=12, etc.)
<- c(1, 12, 1, 1, 1)
zm
# 3. Construct Mean Functions
# female: treat=1 or treat=0
<- exp(sum(c(1, zf) * beta)) * mu0
mu.f.trt <- exp(sum(c(0, zf) * beta)) * mu0
mu.f.contr
# male: treat=1 or treat=0
<- exp(sum(c(1, zm) * beta)) * mu0
mu.m.trt <- exp(sum(c(0, zm) * beta)) * mu0
mu.m.contr
# 4. Plot
par(mfrow = c(1, 2))
# Female
plot(
/ 30.5, mu.f.trt,
t type = "s",
xlim = c(0, 12),
ylim = c(0, 6),
frame.plot = FALSE,
lty = 1,
main = "Female",
xlab = "Time (months)",
ylab = "Mean number of infections",
lwd = 2
)lines(t / 30.5, mu.f.contr, lty = 3, lwd = 2)
# Male
plot(
/ 30.5, mu.m.trt,
t type = "s",
xlim = c(0, 12),
ylim = c(0, 6),
frame.plot = FALSE,
lty = 1,
main = "Male",
xlab = "Time (months)",
ylab = "Mean number of infections",
lwd = 2
)lines(t / 30.5, mu.m.contr, lty = 3, lwd = 2)
#==============================================================================
# (C) Multivariate Approaches (WLW, PWP Models)
#==============================================================================
<- cgd %>%
cgd_mul arrange(id, tstart) %>%
group_by(id) %>%
mutate(
order = row_number(),
gtime = tstop - tstart,
.after = tstop
%>%
) ungroup()
#------------------------------------------------------------------------------
# 1. WLW Model
#------------------------------------------------------------------------------
<- coxph(
wlw_obj Surv(tstop, status) ~ (treat + sex + age) * strata(order),
data = cgd_mul %>% filter(order <= 3)
)
<- coxph(
wlw_obj0 Surv(tstop, status) ~ (treat + sex + age) + strata(order),
data = cgd_mul
)
#------------------------------------------------------------------------------
# 2. PWP-TT Model
# (Time since trial entry)
#------------------------------------------------------------------------------
<- coxph(
pwp_tt_obj Surv(tstart, tstop, status) ~ (treat + sex + age) * strata(order),
data = cgd_mul %>% filter(order <= 3)
)
<- coxph(
pwp_tt_obj0 Surv(tstart, tstop, status) ~ (treat + sex + age) + strata(order),
data = cgd_mul
)
#------------------------------------------------------------------------------
# 3. PWP-GT Model
# (Gap time between events)
#------------------------------------------------------------------------------
<- coxph(
pwp_gt_obj Surv(gtime, status) ~ (treat + sex + age) * strata(order),
data = cgd_mul %>% filter(order <= 3)
)
<- coxph(
pwp_gt_obj0 Surv(gtime, status) ~ (treat + sex + age) + strata(order),
data = cgd_mul
)
#------------------------------------------------------------------------------
# 4. Tabulate Treatment Effect HR (95% CI) & p-values
#------------------------------------------------------------------------------
<- function(obj, obj0, p, K, r = 2) {
trt_hr_mul # p = # of covariates (including treatment)
# K = # of strata (or repeated events)
<- obj$coefficients
beta <- obj$var
var
<- numeric(p + 1)
beta_trt <- numeric(p + 1)
var_trt
# event #1
1] <- beta[1]
beta_trt[1] <- var[1, 1]
var_trt[
# subsequent events
for (j in 2:K) {
<- beta[1] + beta[p + j - 1]
beta_trt[j] <- var[1, 1] +
var_trt[j] + j - 1, p + j - 1] +
var[p 2 * var[1, p + j - 1]
}
# Overall
+ 1] <- obj0$coefficients[1]
beta_trt[p + 1] <- obj0$var[1, 1]
var_trt[p
<- sqrt(var_trt)
se_trt <- qnorm(0.975)
za
tibble(
HR = str_c(
round(exp(beta_trt), r), " (",
round(exp(beta_trt - za * se_trt), r), ", ",
round(exp(beta_trt + za * se_trt), r), ")"
),P = round(1 - pchisq((beta_trt / se_trt)^2, df = 1), 3)
)
}
# Example usage:
# For demonstration, we pick the WLW objects:
# trt_hr_mul(wlw_obj, wlw_obj0, p = 3, K = 3)
# Collect relevant data
<- cgd_mul %>%
N filter(order == 1) %>%
count(treat) %>%
pull(n)
<- cgd_mul %>%
row_header filter(status == 1) %>%
count(treat, order) %>%
filter(order <= 3) %>%
pivot_wider(values_from = n, names_from = treat) %>%
mutate(order = as.character(order)) %>%
add_row(
order = "All",
placebo = cgd_mul %>% filter(treat == "placebo", status == 1) %>% count() %>% pull(),
`rIFN-g`= cgd_mul %>% filter(treat == "rIFN-g", status == 1) %>% count() %>% pull()
%>%
) mutate(
placebo = str_c(placebo, " (", round(100 * placebo / N[1], 1), "%)"),
`rIFN-g`= str_c(`rIFN-g`, " (", round(100 * `rIFN-g` / N[2], 1), "%)")
%>%
) select(order, `rIFN-g`, placebo)
# Build table for WLW, PWP-TT, PWP-GT
<- trt_hr_mul(wlw_obj, wlw_obj0, 3, 3)$HR
wlw_HR <- trt_hr_mul(pwp_tt_obj, pwp_tt_obj0, 3, 3)$HR
pwpTT_HR <- trt_hr_mul(pwp_gt_obj, pwp_gt_obj0, 3, 3)$HR
pwpGT_HR
# Create table
<- row_header %>%
cgd_tab add_column(
WLW = wlw_HR,
`PWP-TT` = pwpTT_HR,
`PWP-GT` = pwpGT_HR
)
# Print table
cgd_tab # Example to create a LaTeX table:
# cgd_tab %>%
# kable("latex")
#------------------------------------------------------------------------------
# 5. Forest Plot of Treatment Hazard Ratios
#------------------------------------------------------------------------------
<- function(x) {
hr_ci_parse separate_wider_regex(
x,patterns = c(x = ".+", " \\(", xmin = ".+", ", ", xmax = ".+", "\\)")
)
}
<- cgd_tab %>%
cgd_models separate_wider_regex(
:`PWP-GT`,
WLWpatterns = c(x = ".+", " \\(", xmin = ".+", ", ", xmax = ".+", "\\)"),
names_sep = "_"
%>%
) mutate(across(!c(order:placebo), as.numeric)) %>%
pivot_longer(
!c(order:placebo),
names_to = c("Model", ".value"),
names_sep = "_"
)
%>%
cgd_models ggplot(aes(x = x, y = order)) +
geom_point(aes(size = (order == "All"))) +
geom_linerange(aes(xmin = xmin, xmax = xmax, linewidth = (order == "All"))) +
geom_vline(xintercept = 1, linetype = 2) +
facet_wrap(~ fct(Model), ncol = 3) +
scale_y_discrete(
"Infection",
limits = rev(c("1", "2", "3", "All")),
labels = rev(c("1st", "2nd", "3rd", "All"))
+
) scale_x_log10("Treatment hazard ratio (95% CI)") +
scale_size_manual(values = c(1.5, 2)) +
scale_linewidth_manual(values = c(0.5, 0.8)) +
theme_minimal() +
theme(
axis.text = element_text(size = 10),
axis.title = element_text(size = 11),
panel.grid = element_blank(),
strip.text = element_text(size = 11),
legend.position = "none"
)
# ggsave("rec_cgd_forest.pdf", width = 8, height = 4.5)
# ggsave("rec_cgd_forest.eps", width = 8, height = 4.5)
#==============================================================================
# (D) Poisson / Negative Binomial / Zero-Inflated Poisson Regressions
#==============================================================================
library(MASS)
library(pscl)
#------------------------------------------------------------------------------
# 1. Flatten Data for One-Row-per-Subject
#------------------------------------------------------------------------------
<- cgd %>%
df_events group_by(id) %>%
mutate(
count = sum(status), # total # events for subject
follow_up = max(tstop) # maximum 'tstop' is censoring time
%>%
) distinct(id, .keep_all = TRUE)
#------------------------------------------------------------------------------
# 2. Poisson Regression
#------------------------------------------------------------------------------
<- glm(
ps_obj ~ treat + age + sex + offset(log(follow_up)),
count family = poisson(link = "log"),
data = df_events
)summary(ps_obj)
#------------------------------------------------------------------------------
# 3. Negative Binomial Regression
#------------------------------------------------------------------------------
<- MASS::glm.nb(
nb_obj ~ treat + age + sex + offset(log(follow_up)),
count data = df_events
)summary(nb_obj)
#------------------------------------------------------------------------------
# 4. Zero-Inflated Poisson (ZIP) Regression
#------------------------------------------------------------------------------
<- zeroinfl(
zip_model ~ treat + age + sex + offset(log(follow_up)) # Poisson part
count | treat + age + sex, # Logistic part
data = df_events
)summary(zip_model)
$coefficients
zip_model$vcov zip_model