Chapter 5 - Other Non- and Semi-parametric Methods

Slides

Lecture slides here. (To convert html to pdf, press E \(\to\) Print \(\to\) Destination: Save to pdf)

Chapter Summary

When the proportional hazards assumption is questionable or when alternative measures of treatment effect are desired, a variety of non- and semi-parametric approaches can be employed. Restricted mean survival time (RMST) focuses on time lived within a specified window, while additive hazards models quantify absolute rather than relative risk. The proportional odds model captures time-invariant covariate effects on the odds of failing early, and accelerated failure time (AFT) models directly relate log-time to covariates, often offering more intuitive interpretations of “time gained or lost.”

Restricted mean survival time

The restricted mean survival time over \([0, \tau]\) is \[ \mu(\tau) \;=\; E\bigl(T \wedge \tau\bigr) \;=\; \int_0^\tau S(t)\,dt, \] where \(S(t)\) is the survival function. Compared to hazard ratios, RMST allows direct interpretation of how many additional event-free years (or months) are gained or lost under different conditions. For a two-sample comparison, one can compute the difference \(\mu_1(\tau)-\mu_0(\tau)\) or the ratio \(\mu_1(\tau)/\mu_0(\tau)\). Regression extends this to \[ g\{\mu(\tau \mid Z)\} = \beta^\mathrm{T} Z, \] where \(g\) could be the identity (additive on RMST) or log (multiplicative on RMST). Estimation often uses inverse probability of censoring weights (IPCW) or pseudo-observations and remains valid without a proportional hazards assumption.

Additive hazards model

Where the Cox model uses the hazard ratio, the additive hazards model posits that covariates contribute additively: \[ \lambda(t \mid Z) \;=\; \lambda_0(t) + \beta^\mathrm{T} Z. \] A coefficient \(\beta_k\) has units of “events per person-time” rather than a ratio, providing an absolute effect size. Semiparametric estimation (Lin’s method) is analogous to partial likelihood, allowing estimation of \(\beta\) and baseline \(\lambda_0(t)\). If time-varying coefficients \(\beta(t)\) are preferred, Aalen’s nonparametric additive model offers full flexibility, with the cumulative effect estimated in a stepwise manner.

Finally, a Cox–Aalen hybrid model allows some covariates to act multiplicatively on the hazard and others additively over time, bridging the two approaches.


Proportional odds model

The proportional odds model targets survival functions via cumulative odds: \[ \log\left[\frac{1-S(t\mid Z)}{S(t \mid Z)}\right] \;=\; h_0(t) \;+\; \beta^\mathrm{T} Z. \] This implies a covariate’s effect is a constant odds ratio on “early” failure. It often accommodates scenarios where hazard ratios decline over time, but the odds ratio remains more stable. Estimation proceeds via a nonparametric maximum likelihood for \(h_0(t)\) and a finite-dimensional parameter \(\beta\).

Accelerated failure time model

The accelerated failure time (AFT) framework directly models log-time: \[ \log T \;=\; \beta^\mathrm{T} Z \;+\; \varepsilon, \] so \(\exp(\beta_k)\) is a time ratio. A unit increase in \(Z_k\) multiplies median (or mean) survival time by \(\exp(\beta_k)\). Parametric versions (e.g., log-normal, Weibull) are fit via standard maximum likelihood, while semiparametric AFT estimation uses iterative least squares or rank-based approaches (e.g., Buckley–James, rank-GEE). Such methods can be numerically sensitive when censoring is heavy or the underlying distribution is heavily skewed, but they provide an appealing direct interpretation of how survival time “speeds up” or “slows down.”

Example R commands

Below is an illustration of core functions (in different packages) implementing these methods. Assume a data frame df with columns: time, status, and additional covariates.

###############################
# 1. RMST (two-sample/regression)
###############################
library(survRM2)
obj_rmst <- rmst2(time = df$time, status = df$status, arm = df$arm, tau = 5)
summary(obj_rmst)           # RMST difference, ratio, etc.
plot(obj_rmst)              # KM curves with RMST shading

# Regression (IPCW-based) for RMST
obj_rmst_reg <- rmst2(time = df$time, status = df$status,
                      arm = df$arm,
                      covariates = df[, c("x1", "x2")],
                      tau = 5)
obj_rmst_reg$RMST.difference.adjusted  # Additive model
obj_rmst_reg$RMST.ratio.adjusted       # Multiplicative model

###############################
# 2. Additive hazards
###############################
library(addhazard)
obj_ah <- ah(Surv(time, status) ~ x1 + x2, data = df, ties = FALSE)
summary(obj_ah)  # Semiparametric additive hazards

library(timereg)
obj_aalen <- aalen(Surv(time, status) ~ x1 + x2, data = df)
summary(obj_aalen)  # Aalen’s nonparametric version
plot(obj_aalen)

# Cox-Aalen hybrid
obj_coxaalen <- cox.aalen(Surv(time, status) ~ prop(x1) + x2, data = df)
summary(obj_coxaalen)
plot(obj_coxaalen)   # Time-varying additive effect for x2

###############################
# 3. Proportional odds
###############################
obj_po <- prop.odds(Event(time, status) ~ x1 + x2, data = df)
summary(obj_po)
# Baseline cumulative odds stored in obj_po$cum

###############################
# 4. AFT models
###############################
library(survival)
# Parametric
obj_weibull <- survreg(Surv(time, status) ~ x1 + x2,
                       data = df, dist = "weibull")
summary(obj_weibull)

# Semiparametric (rank-based or weighted least squares)
library(aftgee)
obj_aft_sp <- aftgee(Surv(time, status) ~ x1 + x2,
                     data = df, B=200)
summary(obj_aft_sp)

Conclusion

Each of these methods introduced can be more appealing than the Cox model when hazard ratios vary over time or when the interpretation of “time gained” or “time ratio” is more intuitive for clinical or policy purposes. By integrating these options, practitioners have a broader and more robust toolkit for analyzing survival data across a wide range of scenarios.

R Code

Show the code
###############################################################################
# Chapter 5 R Code
#
# This script reproduces major numerical results in Chapter 5, including
#   1. Restricted mean survival time (RMST) analysis (survRM2)
#   2. Additive hazards analyses (addhazard, timereg)
#   3. Proportional odds models (timereg)
#   4. Accelerated failure time (AFT) models (aftgee, survival)
#   5. Pseudo-observation approach to RMST (pseudo)
###############################################################################

# =============================================================================
# (A) GBC Data Setup
# =============================================================================
library(survival)   # Survival objects and model fitting (Surv, survreg)
library(tidyverse)  # Data manipulation and ggplot2-based graphics

# -----------------------------------------------------------------------------
# 1. Read and prepare GBC data
# -----------------------------------------------------------------------------
gbc <- read.table("Data//German Breast Cancer Study//gbc.txt")

# Sort by subject id and time, then keep the first record per subject
o   <- order(gbc$id, gbc$time)
gbc <- gbc[o, ]
df  <- gbc[!duplicated(gbc$id), ]

# Collapse event types into a single event indicator
df$status <- (df$status > 0) + 0

# Rescale progesterone and estrogen by 100 for stability/interpretation
df$prog  <- df$prog / 100
df$estrg <- df$estrg / 100

# =============================================================================
# (B) Restricted Mean Survival Time (RMST) Analysis
# =============================================================================
library(survRM2)  # rmst2() for RMST-based comparisons and covariate adjustment

# -----------------------------------------------------------------------------
# 1. Two-sample RMST comparison (hormone vs no hormone) up to tau
# -----------------------------------------------------------------------------
obj <- rmst2(
  time   = df$time / 12,   # convert months to years
  status = df$status,      # event indicator
  arm    = df$hormone - 1, # hormone: {1,2} mapped to arm: {0,1}
  tau    = 5               # truncation time (years)
)

# Print results
obj

# More compact unadjusted results
obj$unadjusted.result

# Plot group-specific survival and RMST/RMTL summaries up to tau
plot(
  obj,
  xlab     = "Time (years)",
  ylab     = "Relapse-free survival",
  col.RMST = "gray",
  col.RMTL = "white",
  col      = "black",
  cex.lab  = 1.2,
  cex.axis = 1.2,
  xlim     = c(0, 5)
)

# -----------------------------------------------------------------------------
# 2. Regression adjustment for additional covariates
# -----------------------------------------------------------------------------
obj_reg <- rmst2(
  time       = df$time / 12,
  status     = df$status,
  arm        = df$hormone - 1,
  covariates = df[, 5:11], # covariate matrix (columns 5--11 of df)
  tau        = 5
)

# Print overall adjusted results
obj_reg

# Adjusted RMST difference
round(obj_reg$RMST.difference.adjusted, 3)

# Adjusted RMST ratio
round(obj_reg$RMST.ratio.adjusted, 3)

# Adjusted RMTL ratio
round(obj_reg$RMTL.ratio.adjusted, 3)

# Additional adjusted components (if present in the object)
obj_reg$adjusted.result

# =============================================================================
# (C) Additive Hazards Analysis
# =============================================================================
library(addhazard) # ah() for additive hazards regression

# -----------------------------------------------------------------------------
# 1. Additive hazards model via addhazard::ah()
# -----------------------------------------------------------------------------
# Convert hormone/meno to factors for modeling
df$hormone <- factor(df$hormone)
df$meno    <- factor(df$meno)

# Add a tiny random component to break ties (ties = FALSE below)
set.seed(123)
df$time.dties <- df$time / 12 + runif(nrow(df), 0, 1e-12)

# Fit additive hazards model (with tie-breaking already applied)
obj_ah <- ah(
  Surv(time.dties, status) ~ hormone + meno + age + size + grade + nodes + prog + estrg,
  data = df,
  ties = FALSE
)

# Model summary and key components
summary(obj_ah)
beta <- obj_ah$coef  # estimated regression coefficients
obj_ah$var           # variance-covariance matrix of the coefficients

# -----------------------------------------------------------------------------
# 2. Aalen's nonparametric additive model via timereg::aalen()
# -----------------------------------------------------------------------------
library(timereg) # aalen(), cox.aalen(), prop.odds()

obj_aalen <- aalen(
  Surv(time / 12, status) ~ hormone + meno + age + size + grade + nodes + prog + estrg,
  data = df
)

# Model summary and key components
summary(obj_aalen)
Bt            <- obj_aalen$cum     # cumulative coefficient estimates B(t)
obj_aalen$var.cum                 # variance of B(t)

# Default diagnostic/summary plots
par(mfrow = c(5, 2))
plot(obj_aalen)

# -----------------------------------------------------------------------------
# 3. Plot estimated cumulative regression coefficients using ggplot2
# -----------------------------------------------------------------------------
aalen_beta <- as_tibble(Bt) %>%
  select(-`(Intercept)`) %>%  # drop intercept to focus on covariate effects
  pivot_longer(
    cols      = -time,
    names_to  = "covariate",
    values_to = "beta"
  ) %>%
  mutate(covariate = fct_inorder(covariate))

# Optional overlay object (constructed from beta and Bt)
# NOTE: This block assumes beta has a compatible structure; it is kept as-is.
ly_beta <- tibble(
  time = Bt[, "time"],
  as_tibble(Bt[, "time"] %*% t(beta))
) %>%
  pivot_longer(
    cols      = -time,
    names_to  = "covariate",
    values_to = "beta"
  ) %>%
  mutate(covariate = fct_inorder(covariate))

# Label map for facet titles
var_labeller <- c(
  "hormone2" = "Hormone therapy (yes v. no)",
  "meno2"    = "Menopausal (yes v. no)",
  "age"      = "Age (years)",
  "size"     = "Tumor size (mm)",
  "grade"    = "Grade (1-3)",
  "nodes"    = "Nodes",
  "prog"     = "Progesterone (100 fmol/mg)",
  "estrg"    = "Estrogen (100 fmol/mg)"
)

# Step-function plots for cumulative coefficients; optional overlay retained
aalen_beta %>%
  ggplot(aes(x = time, y = beta)) +
  geom_step() +
  geom_line(
    data = ly_beta,
    aes(x = time, y = beta),
    color = "#C5050C", linetype = 2, linewidth = 1
  ) +
  scale_x_continuous(
    "Time (years)",
    limits = c(0, 6),
    breaks = seq(0, 6, by = 1)
  ) +
  labs(y = expression("B(t) (year"^{-1} * ")")) +
  facet_wrap(
    ~covariate,
    scales   = "free_y",
    ncol     = 2,
    labeller = labeller(covariate = var_labeller)
  ) +
  theme_minimal()

# ggsave("images/nonhaz_aalen_beta.png", width = 8, height = 8)
# ggsave("images/nonhaz_aalen_beta.eps", width = 8, height = 8)

# -----------------------------------------------------------------------------
# 4. Cox--Aalen model via timereg::cox.aalen()
# -----------------------------------------------------------------------------
# prop(hormone) specifies a proportional (time-constant) effect within Cox--Aalen
obj_cox_aalen <- cox.aalen(
  Surv(time / 12, status) ~ prop(hormone) + meno + age + size + grade + nodes + prog + estrg,
  data = df
)

summary(obj_cox_aalen)
plot(obj_cox_aalen)

# gamma: proportional (time-constant) coefficients; var.gamma: their variance
obj_cox_aalen$gamma
obj_cox_aalen$var.gamma

# =============================================================================
# (D) Proportional Odds Analysis
# =============================================================================
library(timereg) # prop.odds() for proportional odds regression

obj_po <- prop.odds(
  Event(time, status) ~ hormone + meno + age + size + grade + nodes + prog + estrg,
  data = df
)

summary(obj_po)

# Extract baseline cumulative odds from the cumulative output matrix
t         <- obj_po$cum[, 1]
base_odds <- obj_po$cum[, 2]

# -----------------------------------------------------------------------------
# 1. Plot baseline cumulative odds (base R)
# -----------------------------------------------------------------------------
par(mfrow = c(1, 1))
plot(
  stepfun(t, c(0, base_odds)),
  do.points  = FALSE,
  lwd        = 2,
  xlim       = c(0, 80),
  ylim       = c(0, 1.4),
  frame.plot = FALSE,
  xlab       = "Time (months)",
  ylab       = "Baseline cumulative odds",
  main       = ""
)

# -----------------------------------------------------------------------------
# 2. Plot baseline cumulative odds (ggplot)
# -----------------------------------------------------------------------------
tibble(time = t, odds = base_odds) %>%
  ggplot(aes(x = time / 12, y = odds)) +
  geom_step() +
  scale_x_continuous(
    "Time (years)",
    limits = c(0, 6),
    breaks = 1:6
  ) +
  scale_y_continuous("Baseline cumulative odds", limits = c(0, 1.25)) +
  theme_minimal()

# =============================================================================
# (E) Accelerated Failure Time (AFT) Analysis
# =============================================================================
library(aftgee) # aftgee() for semiparametric AFT regression

# -----------------------------------------------------------------------------
# 1. Semiparametric AFT model (aftgee)
# -----------------------------------------------------------------------------
# SE estimates are based on resampling; set.seed() makes this reproducible
set.seed(123)
obj_aftgee <- aftgee(
  Surv(time, status) ~ hormone + meno + age + size + grade + nodes + prog + estrg,
  data = df,
  B    = 500
)

summary(obj_aftgee)

# Acceleration factors exp(beta)
exp(obj_aftgee$coef.res)

# Variance of the log-acceleration factors
obj_aftgee$var.res

# -----------------------------------------------------------------------------
# 2. Parametric AFT model (survival::survreg)
# -----------------------------------------------------------------------------
obj_aft <- survreg(
  Surv(time, status) ~ hormone + meno + age + size + grade + nodes + prog + estrg,
  data = df,
  dist = "loglogistic"
)

summary(obj_aft)

# Coefficients, variance-covariance, and scale parameter
obj_aft$coefficients
obj_aft$var
obj_aft$scale

# =============================================================================
# (F) Pseudo-Observations Approach to RMST
# =============================================================================
library(pseudo)   # pseudomean() to compute pseudo-observations
library(geepack)  # geese() for GEE regression on pseudo-observations

# Compute pseudo-observations for RMST up to tau
tau         <- 5
pseudo_rmst <- pseudomean(df$time / 12, df$status, tmax = tau)

# Fit a regression model using pseudo-observations (identity link)
fit_pseudo <- geese(
  pseudo_rmst ~ hormone + meno + age + size + grade + nodes + prog + estrg,
  data      = df,
  mean.link = "identity"
)
# Alternative: mean.link = "log"

summary(fit_pseudo)