Chapter 5 - Other Non- and Semi-parametric Methods

Slides

Lecture slides here. (To convert html to pdf, press E $\to$ Print $\to$ Destination: Save to pdf)

Chapter Summary

When the proportional hazards assumption is questionable or when alternative measures of treatment effect are desired, a variety of non- and semi-parametric approaches can be employed. Restricted mean survival time (RMST) focuses on time lived within a specified window, while additive hazards models quantify absolute rather than relative risk. The proportional odds model captures time-invariant covariate effects on the odds of failing early, and accelerated failure time (AFT) models directly relate log-time to covariates, often offering more intuitive interpretations of “time gained or lost.”

Restricted mean survival time

The restricted mean survival time over $[0, τ]$ is $μ (τ) = E (T \land τ) = \int_{0}^{τ} S (t) d t,$ where $S (t)$ is the survival function. Compared to hazard ratios, RMST allows direct interpretation of how many additional event-free years (or months) are gained or lost under different conditions. For a two-sample comparison, one can compute the difference $μ_{1} (τ) - μ_{0} (τ)$ or the ratio $μ_{1} (τ) / μ_{0} (τ)$ . Regression extends this to $g {μ (τ ∣ Z)} = β^{T} Z,$ where $g$ could be the identity (additive on RMST) or log (multiplicative on RMST). Estimation often uses inverse probability of censoring weights (IPCW) or pseudo-observations and remains valid without a proportional hazards assumption.

Additive hazards model

Where the Cox model uses the hazard ratio, the additive hazards model posits that covariates contribute additively: $λ (t ∣ Z) = λ_{0} (t) + β^{T} Z .$ A coefficient $β_{k}$ has units of “events per person-time” rather than a ratio, providing an absolute effect size. Semiparametric estimation (Lin’s method) is analogous to partial likelihood, allowing estimation of $β$ and baseline $λ_{0} (t)$ . If time-varying coefficients $β (t)$ are preferred, Aalen’s nonparametric additive model offers full flexibility, with the cumulative effect estimated in a stepwise manner.

Finally, a Cox–Aalen hybrid model allows some covariates to act multiplicatively on the hazard and others additively over time, bridging the two approaches.

Proportional odds model

The proportional odds model targets survival functions via cumulative odds: $\log [\frac{1 - S (t ∣ Z)}{S (t ∣ Z)}] = h_{0} (t) + β^{T} Z .$ This implies a covariate’s effect is a constant odds ratio on “early” failure. It often accommodates scenarios where hazard ratios decline over time, but the odds ratio remains more stable. Estimation proceeds via a nonparametric maximum likelihood for $h_{0} (t)$ and a finite-dimensional parameter $β$ .

Accelerated failure time model

The accelerated failure time (AFT) framework directly models log-time: $\log T = β^{T} Z + ε,$ so $\exp (β_{k})$ is a time ratio. A unit increase in $Z_{k}$ multiplies median (or mean) survival time by $\exp (β_{k})$ . Parametric versions (e.g., log-normal, Weibull) are fit via standard maximum likelihood, while semiparametric AFT estimation uses iterative least squares or rank-based approaches (e.g., Buckley–James, rank-GEE). Such methods can be numerically sensitive when censoring is heavy or the underlying distribution is heavily skewed, but they provide an appealing direct interpretation of how survival time “speeds up” or “slows down.”

Example R commands

Below is an illustration of core functions (in different packages) implementing these methods. Assume a data frame df with columns: time, status, and additional covariates.

###############################
# 1. RMST (two-sample/regression)
###############################
library(survRM2)
obj_rmst <- rmst2(time = df$time, status = df$status, arm = df$arm, tau = 5)
summary(obj_rmst)           # RMST difference, ratio, etc.
plot(obj_rmst)              # KM curves with RMST shading

# Regression (IPCW-based) for RMST
obj_rmst_reg <- rmst2(time = df$time, status = df$status,
                      arm = df$arm,
                      covariates = df[, c("x1", "x2")],
                      tau = 5)
obj_rmst_reg$RMST.difference.adjusted  # Additive model
obj_rmst_reg$RMST.ratio.adjusted       # Multiplicative model

###############################
# 2. Additive hazards
###############################
library(addhazard)
obj_ah <- ah(Surv(time, status) ~ x1 + x2, data = df, ties = FALSE)
summary(obj_ah)  # Semiparametric additive hazards

library(timereg)
obj_aalen <- aalen(Surv(time, status) ~ x1 + x2, data = df)
summary(obj_aalen)  # Aalen’s nonparametric version
plot(obj_aalen)

# Cox-Aalen hybrid
obj_coxaalen <- cox.aalen(Surv(time, status) ~ prop(x1) + x2, data = df)
summary(obj_coxaalen)
plot(obj_coxaalen)   # Time-varying additive effect for x2

###############################
# 3. Proportional odds
###############################
obj_po <- prop.odds(Event(time, status) ~ x1 + x2, data = df)
summary(obj_po)
# Baseline cumulative odds stored in obj_po$cum

###############################
# 4. AFT models
###############################
library(survival)
# Parametric
obj_weibull <- survreg(Surv(time, status) ~ x1 + x2,
                       data = df, dist = "weibull")
summary(obj_weibull)

# Semiparametric (rank-based or weighted least squares)
library(aftgee)
obj_aft_sp <- aftgee(Surv(time, status) ~ x1 + x2,
                     data = df, B=200)
summary(obj_aft_sp)

Conclusion

Each of these methods introduced can be more appealing than the Cox model when hazard ratios vary over time or when the interpretation of “time gained” or “time ratio” is more intuitive for clinical or policy purposes. By integrating these options, practitioners have a broader and more robust toolkit for analyzing survival data across a wide range of scenarios.

R Code

Show the code

###############################################################################
# Chapter 5 R Code
#
# This script reproduces all major numerical results in Chapter 5, including:
#   1. Restricted mean survival time (RMST) analysis (with survRM2)
#   2. Additive hazards analyses (with addhazard, timereg)
#   3. Proportional odds models (with timereg)
#   4. Accelerated failure time (AFT) models (with aftgee, survival)
#   5. Pseudo-observation approach to RMST (with pseudo)
###############################################################################


#==============================================================================
# (A) GBC Data Setup
#==============================================================================
library(survival)
library(tidyverse)

#------------------------------------------------------------------------------
# 1. Read and Prepare GBC Data
#------------------------------------------------------------------------------
gbc <- read.table("Data//German Breast Cancer Study//gbc.txt")

# Sort the data by time within each id
o   <- order(gbc$id, gbc$time)
gbc <- gbc[o, ]

# Keep the first row for each id (first event)
df <- gbc[!duplicated(gbc$id), ]

# Set status = 1 if status == 2 or 1
df$status <- (df$status > 0) + 0

# Reduce the scales of prog and estrg by 100
df$prog  <- df$prog / 100
df$estrg <- df$estrg / 100


#==============================================================================
# (B) Restricted Mean Survival Time (RMST) Analysis
#     Using the "survRM2" Package
#==============================================================================
# install.packages("survRM2")
library(survRM2)

#------------------------------------------------------------------------------
# 1. Two-Sample Comparison (Hormonal vs. Non-Hormonal) on 5-Year RMST
#------------------------------------------------------------------------------
obj <- rmst2(
  time = df$time / 12,   # convert months to years
  status = df$status,
  arm = df$hormone - 1,  # hormone: {1, 2} => arm: {0, 1}
  tau = 5
)

# Print results
obj
# More compact results
obj$unadjusted.result

# Graphical display of group-specific RMST
plot(
  obj,
  xlab     = "Time (years)",
  ylab     = "Relapse-free survival",
  col.RMST = "gray",
  col.RMTL = "white",
  col      = "black",
  cex.lab  = 1.2,
  cex.axis = 1.2,
  xlim     = c(0, 5)
)

#------------------------------------------------------------------------------
# 2. Regression Adjustment for Additional Covariates
#------------------------------------------------------------------------------
obj_reg <- rmst2(
  time       = df$time / 12,
  status     = df$status,
  arm        = df$hormone - 1,
  covariates = df[, 5:11],  # columns 5-11 in df
  tau        = 5
)

# Print overall results
obj_reg

# Additive model on RMST
round(obj_reg$RMST.difference.adjusted, 3)

# Multiplicative model on RMST
round(obj_reg$RMST.ratio.adjusted, 3)

# Multiplicative model on RMTL
round(obj_reg$RMTL.ratio.adjusted, 3)

# For hormone treatment only
# (May require checking if the object contains this component)
obj_reg$adjusted.result


#==============================================================================
# (C) Additive Hazards Analysis
#==============================================================================
# install.packages("addhazard")
library(addhazard)

#------------------------------------------------------------------------------
# 1. Fitting an Additive Hazards Model (ah) from addhazard
#------------------------------------------------------------------------------
# Convert hormone/meno to factors
df$hormone <- factor(df$hormone)
df$meno    <- factor(df$meno)

# Add a tiny random component to break ties
set.seed(123)
df$time.dties <- df$time / 12 + runif(nrow(df), 0, 1e-12)

# Fit additive hazards model
obj_ah <- ah(
  Surv(time.dties, status) ~ hormone + meno + age + size + grade + nodes + prog + estrg,
  data = df,
  ties = FALSE
)

summary(obj_ah)
beta <- obj_ah$coef  # estimated regression coefficients
#------------------------------------------------------------------------------
# 2. Aalen's Nonparametric Model (aalen) from timereg
#------------------------------------------------------------------------------
# install.packages("timereg")
library(timereg)

obj_aalen <- aalen(
  Surv(time / 12, status) ~ hormone + meno + age + size + grade + nodes + prog + estrg,
  data = df
)

summary(obj_aalen)

Bt <- obj_aalen$cum  # cumulative regression function estimates

par(mfrow = c(5, 2))
plot(obj_aalen)

#------------------------------------------------------------------------------
# 3. Plotting the Estimated Cumulative Regression Coefficients
#------------------------------------------------------------------------------
aalen_beta <- as_tibble(Bt) %>%
  select(-`(Intercept)`) %>%    # remove intercept
  pivot_longer(
    cols      = -time,
    names_to  = "covariate",
    values_to = "beta"
  ) %>%
  mutate(covariate = fct_inorder(covariate))

# (Optional) If you intended a semiparametric overlay, 
# you would need a 'beta' object. This is a placeholder:
ly_beta <- tibble(
  time = Bt[, "time"],
  # example usage if 'beta' were a coefficient vector or matrix
  as_tibble(Bt[, "time"] %*% t(beta))
) %>%
  pivot_longer(
    cols      = -time,
    names_to  = "covariate",
    values_to = "beta"
  ) %>%
  mutate(covariate = fct_inorder(covariate))

var_labeller <- c(
  "hormone2" = "Hormone therapy (yes v. no)",
  "meno2"    = "Menopausal (yes v. no)",
  "age"      = "Age (years)",
  "size"     = "Tumor size (mm)",
  "grade"    = "Grade (1-3)",
  "nodes"    = "Nodes",
  "prog"     = "Progesterone (100 fmol/mg)",
  "estrg"    = "Estrogen (100 fmol/mg)"
)

aalen_beta %>%
  ggplot(aes(x = time, y = beta)) +
  geom_step() +
  # If you had a semiparametric line, you'd add:
  geom_line(data = ly_beta, aes(x = time, y = beta),
            color = "#C5050C", linetype = 2, linewidth = 1) +
  scale_x_continuous(
    "Time (years)",
    limits = c(0, 6),
    breaks = seq(0, 6, by = 1)
  ) +
  labs(y = expression("B(t) (year"^{-1} * ")")) +
  facet_wrap(
    ~covariate,
    scales   = "free_y",
    ncol     = 2,
    labeller = labeller(covariate = var_labeller)
  ) +
  theme_minimal()

ggsave("images/nonhaz_aalen_beta.png", width = 8, height = 8)
ggsave("images/nonhaz_aalen_beta.eps", width = 8, height = 8)

#------------------------------------------------------------------------------
# 4. Cox-Aalen Model
#------------------------------------------------------------------------------
obj_cox_aalen <- cox.aalen(
  Surv(time / 12, status) ~ prop(hormone) + meno + age + size + grade + nodes + prog + estrg,
  data = df
)

summary(obj_cox_aalen)
plot(obj_cox_aalen)

obj_cox_aalen$gamma
obj_cox_aalen$var.gamma


#==============================================================================
# (D) Proportional Odds Analysis
#==============================================================================
# install.packages("timereg")
library(timereg)

obj_po <- prop.odds(
  Event(time, status) ~ hormone + meno + age + size + grade + nodes + prog + estrg,
  data = df
)

summary(obj_po)

t         = obj_po$cum[, 1]
base_odds = obj_po$cum[, 2]

#------------------------------------------------------------------------------
# 1. Plot Baseline Cumulative Odds (Base R)
#------------------------------------------------------------------------------
par(mfrow = c(1, 1))
plot(
  stepfun(t, c(0, base_odds)),
  do.points   = FALSE,
  lwd         = 2,
  xlim        = c(0, 80),
  ylim        = c(0, 1.4),
  frame.plot  = FALSE,
  xlab        = "Time (months)",
  ylab        = "Baseline cumulative odds",
  main        = ""
)

#------------------------------------------------------------------------------
# 2. Plot Baseline Cumulative Odds (ggplot)
#------------------------------------------------------------------------------
tibble(
  time = t,
  odds = base_odds
) %>%
  ggplot(aes(x = time / 12, y = odds)) +
  geom_step() +
  scale_x_continuous(
    "Time (years)",
    limits = c(0, 6),
    breaks = 1:6
  ) +
  scale_y_continuous("Baseline cumulative odds", limits = c(0, 1.25)) +
  theme_minimal()

ggsave("images/nonhaz_po_base.png", width = 6, height = 4)
ggsave("images/nonhaz_po_base.eps", width = 6, height = 4)


#==============================================================================
# (E) Accelerated Failure Time (AFT) Analysis
#==============================================================================
# install.packages("aftgee")
library(aftgee)

#------------------------------------------------------------------------------
# 1. Semiparametric AFT Model (aftgee)
#------------------------------------------------------------------------------
set.seed(123)  # SE is based on resampling
obj_aftgee <- aftgee(
  Surv(time, status) ~ hormone + meno + age + size + grade + nodes + prog + estrg,
  data = df,
  B    = 500
)

summary(obj_aftgee)

# Acceleration factors exp(beta)
exp(obj_aftgee$coef.res)

# Variance of the log-acceleration factors
obj_aftgee$var.res

#------------------------------------------------------------------------------
# 2. Parametric AFT Model (survreg)
#------------------------------------------------------------------------------
obj_aft <- survreg(
  Surv(time, status) ~ hormone + meno + age + size + grade + nodes + prog + estrg,
  data  = df,
  dist  = "loglogistic"
)

summary(obj_aft)
obj_aft$coefficients
obj_aft$var
obj_aft$scale


#==============================================================================
# (F) Pseudo-Observations Approach to RMST
#==============================================================================
# install.packages("pseudo")
library(pseudo)

# Example usage on df (instead of lung)
tau          <- 5
pseudo_rmst  <- pseudomean(df$time / 12, df$status, tmax = tau)

# Fit a regression model using pseudo-observations
library(geepack)
fit_pseudo <- geese(
  pseudo_rmst ~ hormone + meno + age + size + grade + nodes + prog + estrg,
  data      = df,
  mean.link = "identity"
)
# Alternatively: mean.link = "log"

summary(fit_pseudo)