###############################
# 1. RMST (two-sample/regression)
###############################
library(survRM2)
<- rmst2(time = df$time, status = df$status, arm = df$arm, tau = 5)
obj_rmst summary(obj_rmst) # RMST difference, ratio, etc.
plot(obj_rmst) # KM curves with RMST shading
# Regression (IPCW-based) for RMST
<- rmst2(time = df$time, status = df$status,
obj_rmst_reg arm = df$arm,
covariates = df[, c("x1", "x2")],
tau = 5)
$RMST.difference.adjusted # Additive model
obj_rmst_reg$RMST.ratio.adjusted # Multiplicative model
obj_rmst_reg
###############################
# 2. Additive hazards
###############################
library(addhazard)
<- ah(Surv(time, status) ~ x1 + x2, data = df, ties = FALSE)
obj_ah summary(obj_ah) # Semiparametric additive hazards
library(timereg)
<- aalen(Surv(time, status) ~ x1 + x2, data = df)
obj_aalen summary(obj_aalen) # Aalen’s nonparametric version
plot(obj_aalen)
# Cox-Aalen hybrid
<- cox.aalen(Surv(time, status) ~ prop(x1) + x2, data = df)
obj_coxaalen summary(obj_coxaalen)
plot(obj_coxaalen) # Time-varying additive effect for x2
###############################
# 3. Proportional odds
###############################
<- prop.odds(Event(time, status) ~ x1 + x2, data = df)
obj_po summary(obj_po)
# Baseline cumulative odds stored in obj_po$cum
###############################
# 4. AFT models
###############################
library(survival)
# Parametric
<- survreg(Surv(time, status) ~ x1 + x2,
obj_weibull data = df, dist = "weibull")
summary(obj_weibull)
# Semiparametric (rank-based or weighted least squares)
library(aftgee)
<- aftgee(Surv(time, status) ~ x1 + x2,
obj_aft_sp data = df, B=200)
summary(obj_aft_sp)
Chapter 5 - Other Non- and Semi-parametric Methods
Slides
Lecture slides here. (To convert html to pdf, press E \(\to\) Print \(\to\) Destination: Save to pdf)
Chapter Summary
When the proportional hazards assumption is questionable or when alternative measures of treatment effect are desired, a variety of non- and semi-parametric approaches can be employed. Restricted mean survival time (RMST) focuses on time lived within a specified window, while additive hazards models quantify absolute rather than relative risk. The proportional odds model captures time-invariant covariate effects on the odds of failing early, and accelerated failure time (AFT) models directly relate log-time to covariates, often offering more intuitive interpretations of “time gained or lost.”
Restricted mean survival time
The restricted mean survival time over \([0, \tau]\) is \[ \mu(\tau) \;=\; E\bigl(T \wedge \tau\bigr) \;=\; \int_0^\tau S(t)\,dt, \] where \(S(t)\) is the survival function. Compared to hazard ratios, RMST allows direct interpretation of how many additional event-free years (or months) are gained or lost under different conditions. For a two-sample comparison, one can compute the difference \(\mu_1(\tau)-\mu_0(\tau)\) or the ratio \(\mu_1(\tau)/\mu_0(\tau)\). Regression extends this to \[ g\{\mu(\tau \mid Z)\} = \beta^\mathrm{T} Z, \] where \(g\) could be the identity (additive on RMST) or log (multiplicative on RMST). Estimation often uses inverse probability of censoring weights (IPCW) or pseudo-observations and remains valid without a proportional hazards assumption.
Additive hazards model
Where the Cox model uses the hazard ratio, the additive hazards model posits that covariates contribute additively: \[ \lambda(t \mid Z) \;=\; \lambda_0(t) + \beta^\mathrm{T} Z. \] A coefficient \(\beta_k\) has units of “events per person-time” rather than a ratio, providing an absolute effect size. Semiparametric estimation (Lin’s method) is analogous to partial likelihood, allowing estimation of \(\beta\) and baseline \(\lambda_0(t)\). If time-varying coefficients \(\beta(t)\) are preferred, Aalen’s nonparametric additive model offers full flexibility, with the cumulative effect estimated in a stepwise manner.
Finally, a Cox–Aalen hybrid model allows some covariates to act multiplicatively on the hazard and others additively over time, bridging the two approaches.
Proportional odds model
The proportional odds model targets survival functions via cumulative odds: \[ \log\left[\frac{1-S(t\mid Z)}{S(t \mid Z)}\right] \;=\; h_0(t) \;+\; \beta^\mathrm{T} Z. \] This implies a covariate’s effect is a constant odds ratio on “early” failure. It often accommodates scenarios where hazard ratios decline over time, but the odds ratio remains more stable. Estimation proceeds via a nonparametric maximum likelihood for \(h_0(t)\) and a finite-dimensional parameter \(\beta\).
Accelerated failure time model
The accelerated failure time (AFT) framework directly models log-time: \[ \log T \;=\; \beta^\mathrm{T} Z \;+\; \varepsilon, \] so \(\exp(\beta_k)\) is a time ratio. A unit increase in \(Z_k\) multiplies median (or mean) survival time by \(\exp(\beta_k)\). Parametric versions (e.g., log-normal, Weibull) are fit via standard maximum likelihood, while semiparametric AFT estimation uses iterative least squares or rank-based approaches (e.g., Buckley–James, rank-GEE). Such methods can be numerically sensitive when censoring is heavy or the underlying distribution is heavily skewed, but they provide an appealing direct interpretation of how survival time “speeds up” or “slows down.”
Example R commands
Below is an illustration of core functions (in different packages) implementing these methods. Assume a data frame df
with columns: time
, status
, and additional covariates.
Conclusion
Each of these methods introduced can be more appealing than the Cox model when hazard ratios vary over time or when the interpretation of “time gained” or “time ratio” is more intuitive for clinical or policy purposes. By integrating these options, practitioners have a broader and more robust toolkit for analyzing survival data across a wide range of scenarios.
R Code
Show the code
###############################################################################
# Chapter 5 R Code
#
# This script reproduces all major numerical results in Chapter 5, including:
# 1. Restricted mean survival time (RMST) analysis (with survRM2)
# 2. Additive hazards analyses (with addhazard, timereg)
# 3. Proportional odds models (with timereg)
# 4. Accelerated failure time (AFT) models (with aftgee, survival)
# 5. Pseudo-observation approach to RMST (with pseudo)
###############################################################################
#==============================================================================
# (A) GBC Data Setup
#==============================================================================
library(survival)
library(tidyverse)
#------------------------------------------------------------------------------
# 1. Read and Prepare GBC Data
#------------------------------------------------------------------------------
<- read.table("Data//German Breast Cancer Study//gbc.txt")
gbc
# Sort the data by time within each id
<- order(gbc$id, gbc$time)
o <- gbc[o, ]
gbc
# Keep the first row for each id (first event)
<- gbc[!duplicated(gbc$id), ]
df
# Set status = 1 if status == 2 or 1
$status <- (df$status > 0) + 0
df
# Reduce the scales of prog and estrg by 100
$prog <- df$prog / 100
df$estrg <- df$estrg / 100
df
#==============================================================================
# (B) Restricted Mean Survival Time (RMST) Analysis
# Using the "survRM2" Package
#==============================================================================
# install.packages("survRM2")
library(survRM2)
#------------------------------------------------------------------------------
# 1. Two-Sample Comparison (Hormonal vs. Non-Hormonal) on 5-Year RMST
#------------------------------------------------------------------------------
<- rmst2(
obj time = df$time / 12, # convert months to years
status = df$status,
arm = df$hormone - 1, # hormone: {1, 2} => arm: {0, 1}
tau = 5
)
# Print results
obj# More compact results
$unadjusted.result
obj
# Graphical display of group-specific RMST
plot(
obj,xlab = "Time (years)",
ylab = "Relapse-free survival",
col.RMST = "gray",
col.RMTL = "white",
col = "black",
cex.lab = 1.2,
cex.axis = 1.2,
xlim = c(0, 5)
)
#------------------------------------------------------------------------------
# 2. Regression Adjustment for Additional Covariates
#------------------------------------------------------------------------------
<- rmst2(
obj_reg time = df$time / 12,
status = df$status,
arm = df$hormone - 1,
covariates = df[, 5:11], # columns 5-11 in df
tau = 5
)
# Print overall results
obj_reg
# Additive model on RMST
round(obj_reg$RMST.difference.adjusted, 3)
# Multiplicative model on RMST
round(obj_reg$RMST.ratio.adjusted, 3)
# Multiplicative model on RMTL
round(obj_reg$RMTL.ratio.adjusted, 3)
# For hormone treatment only
# (May require checking if the object contains this component)
$adjusted.result
obj_reg
#==============================================================================
# (C) Additive Hazards Analysis
#==============================================================================
# install.packages("addhazard")
library(addhazard)
#------------------------------------------------------------------------------
# 1. Fitting an Additive Hazards Model (ah) from addhazard
#------------------------------------------------------------------------------
# Convert hormone/meno to factors
$hormone <- factor(df$hormone)
df$meno <- factor(df$meno)
df
# Add a tiny random component to break ties
set.seed(123)
$time.dties <- df$time / 12 + runif(nrow(df), 0, 1e-12)
df
# Fit additive hazards model
<- ah(
obj_ah Surv(time.dties, status) ~ hormone + meno + age + size + grade + nodes + prog + estrg,
data = df,
ties = FALSE
)
summary(obj_ah)
<- obj_ah$coef # estimated regression coefficients
beta #------------------------------------------------------------------------------
# 2. Aalen's Nonparametric Model (aalen) from timereg
#------------------------------------------------------------------------------
# install.packages("timereg")
library(timereg)
<- aalen(
obj_aalen Surv(time / 12, status) ~ hormone + meno + age + size + grade + nodes + prog + estrg,
data = df
)
summary(obj_aalen)
<- obj_aalen$cum # cumulative regression function estimates
Bt
par(mfrow = c(5, 2))
plot(obj_aalen)
#------------------------------------------------------------------------------
# 3. Plotting the Estimated Cumulative Regression Coefficients
#------------------------------------------------------------------------------
<- as_tibble(Bt) %>%
aalen_beta select(-`(Intercept)`) %>% # remove intercept
pivot_longer(
cols = -time,
names_to = "covariate",
values_to = "beta"
%>%
) mutate(covariate = fct_inorder(covariate))
# (Optional) If you intended a semiparametric overlay,
# you would need a 'beta' object. This is a placeholder:
<- tibble(
ly_beta time = Bt[, "time"],
# example usage if 'beta' were a coefficient vector or matrix
as_tibble(Bt[, "time"] %*% t(beta))
%>%
) pivot_longer(
cols = -time,
names_to = "covariate",
values_to = "beta"
%>%
) mutate(covariate = fct_inorder(covariate))
<- c(
var_labeller "hormone2" = "Hormone therapy (yes v. no)",
"meno2" = "Menopausal (yes v. no)",
"age" = "Age (years)",
"size" = "Tumor size (mm)",
"grade" = "Grade (1-3)",
"nodes" = "Nodes",
"prog" = "Progesterone (100 fmol/mg)",
"estrg" = "Estrogen (100 fmol/mg)"
)
%>%
aalen_beta ggplot(aes(x = time, y = beta)) +
geom_step() +
# If you had a semiparametric line, you'd add:
geom_line(data = ly_beta, aes(x = time, y = beta),
color = "#C5050C", linetype = 2, linewidth = 1) +
scale_x_continuous(
"Time (years)",
limits = c(0, 6),
breaks = seq(0, 6, by = 1)
+
) labs(y = expression("B(t) (year"^{-1} * ")")) +
facet_wrap(
~covariate,
scales = "free_y",
ncol = 2,
labeller = labeller(covariate = var_labeller)
+
) theme_minimal()
ggsave("images/nonhaz_aalen_beta.png", width = 8, height = 8)
ggsave("images/nonhaz_aalen_beta.eps", width = 8, height = 8)
#------------------------------------------------------------------------------
# 4. Cox-Aalen Model
#------------------------------------------------------------------------------
<- cox.aalen(
obj_cox_aalen Surv(time / 12, status) ~ prop(hormone) + meno + age + size + grade + nodes + prog + estrg,
data = df
)
summary(obj_cox_aalen)
plot(obj_cox_aalen)
$gamma
obj_cox_aalen$var.gamma
obj_cox_aalen
#==============================================================================
# (D) Proportional Odds Analysis
#==============================================================================
# install.packages("timereg")
library(timereg)
<- prop.odds(
obj_po Event(time, status) ~ hormone + meno + age + size + grade + nodes + prog + estrg,
data = df
)
summary(obj_po)
= obj_po$cum[, 1]
t = obj_po$cum[, 2]
base_odds
#------------------------------------------------------------------------------
# 1. Plot Baseline Cumulative Odds (Base R)
#------------------------------------------------------------------------------
par(mfrow = c(1, 1))
plot(
stepfun(t, c(0, base_odds)),
do.points = FALSE,
lwd = 2,
xlim = c(0, 80),
ylim = c(0, 1.4),
frame.plot = FALSE,
xlab = "Time (months)",
ylab = "Baseline cumulative odds",
main = ""
)
#------------------------------------------------------------------------------
# 2. Plot Baseline Cumulative Odds (ggplot)
#------------------------------------------------------------------------------
tibble(
time = t,
odds = base_odds
%>%
) ggplot(aes(x = time / 12, y = odds)) +
geom_step() +
scale_x_continuous(
"Time (years)",
limits = c(0, 6),
breaks = 1:6
+
) scale_y_continuous("Baseline cumulative odds", limits = c(0, 1.25)) +
theme_minimal()
ggsave("images/nonhaz_po_base.png", width = 6, height = 4)
ggsave("images/nonhaz_po_base.eps", width = 6, height = 4)
#==============================================================================
# (E) Accelerated Failure Time (AFT) Analysis
#==============================================================================
# install.packages("aftgee")
library(aftgee)
#------------------------------------------------------------------------------
# 1. Semiparametric AFT Model (aftgee)
#------------------------------------------------------------------------------
set.seed(123) # SE is based on resampling
<- aftgee(
obj_aftgee Surv(time, status) ~ hormone + meno + age + size + grade + nodes + prog + estrg,
data = df,
B = 500
)
summary(obj_aftgee)
# Acceleration factors exp(beta)
exp(obj_aftgee$coef.res)
# Variance of the log-acceleration factors
$var.res
obj_aftgee
#------------------------------------------------------------------------------
# 2. Parametric AFT Model (survreg)
#------------------------------------------------------------------------------
<- survreg(
obj_aft Surv(time, status) ~ hormone + meno + age + size + grade + nodes + prog + estrg,
data = df,
dist = "loglogistic"
)
summary(obj_aft)
$coefficients
obj_aft$var
obj_aft$scale
obj_aft
#==============================================================================
# (F) Pseudo-Observations Approach to RMST
#==============================================================================
# install.packages("pseudo")
library(pseudo)
# Example usage on df (instead of lung)
<- 5
tau <- pseudomean(df$time / 12, df$status, tmax = tau)
pseudo_rmst
# Fit a regression model using pseudo-observations
library(geepack)
<- geese(
fit_pseudo ~ hormone + meno + age + size + grade + nodes + prog + estrg,
pseudo_rmst data = df,
mean.link = "identity"
)# Alternatively: mean.link = "log"
summary(fit_pseudo)