Chapter 4 - Cox Proportional Hazards Regression
Department of Biostatistics & Medical Informatics
University of Wisconsin-Madison
Model specification
Partial-likelihood estimation and inference
Residual analysis and goodness-of-fit
Time-dependent covariates
\[\newcommand{\d}{{\rm d}}\] \[\newcommand{\T}{{\rm T}}\] \[\newcommand{\dd}{{\rm d}}\] \[\newcommand{\pr}{{\rm pr}}\] \[\newcommand{\var}{{\rm var}}\] \[\newcommand{\se}{{\rm se}}\] \[\newcommand{\indep}{\perp \!\!\! \perp}\] \[\newcommand{\Pn}{n^{-1}\sum_{i=1}^n}\]
Parametrize baseline function \[\begin{equation}\label{eq:cox:ph_param} \lambda(t\mid Z;\theta)=\lambda_0(t;\eta)\exp(\beta^\T Z) \end{equation}\]
Weibull PH model \[\begin{equation}\label{eq:cox:weibull_base} \lambda(t\mid Z;\theta)=\alpha\gamma^{-\alpha}t^{\alpha-1}\exp(\beta^\T Z). \end{equation}\]
Estimation
Parametric constraints \(\to\) shape of baseline function
Semiparametric model
Partial likelihood
Partial likelihood \[\begin{align*} PL_n(\beta)=\prod_{j=1}^m\pr(\mathcal F_j\mid \mathcal R_j,d_j=1) =\prod_{j=1}^m\frac{\exp(\beta^\T Z_{(j)})}{\sum_{i\in\mathcal R_j}\exp(\beta^\T Z_i)} \end{align*}\]
Log-partial likelihood \[\begin{align}\label{eq:cox:pln} pl_n(\beta)=n^{-1}\log PL_n(\beta)&=n^{-1}\sum_{j=1}^m\left\{\beta^\T Z_{(j)}-\log\sum_{i\in\mathcal R_j}\exp(\beta^\T Z_i)\right\}\notag\\ &= n^{-1}\sum_{i=1}^n \delta_i\left\{\beta^\T Z_i- \log\sum_{j=1}^n I(X_j\geq X_i)\exp(\beta^\T Z_j) \right\}\notag\\ &=n^{-1}\sum_{i=1}^n\int_0^\infty \left\{\beta^\T Z_i- \log\sum_{j=1}^n I(X_j\geq t)\exp(\beta^\T Z_j) \right\}\dd N_i(t). \end{align}\]
Exercise: Conditional survival
A patient with covariate value \(z\) is censored at time \(c\). Use the fit model to predict his/her future survival probabilities.
survival::coxph()
(I)Surv(time, status) ~ covariates
: \((X, \delta) \sim Z\)strata(str_var)
: stratified by variable str_var
(optional)coxph
obj$coefficients
: \(\hat\beta\)obj$var
: \(\hat\var(\hat\beta)\)survival::coxph()
(II)Lambda0$time
: \(t\)Lambda0$hazard
: \(\hat\Lambda_0(t)\)Lambda0$strata
: levels of str_var
if stratified by itsurvival::coxph()
(III)summary(obj)
summary(obj)$coefficients
: a matrix containing \(\hat\beta\), hazard ratio \(\exp(\hat\beta)\), \(\hat\se(\hat\beta)\), test statistic, and \(p\)-value as columns#--- Data frame ---------------------
head(data.CE)
# id time status hormone age meno size grade nodes prog estrg
# 1 1 43.836066 1 1 38 1 18 3 5 141 105
# 3 2 46.557377 1 1 52 1 20 1 1 78 14
# 5 3 41.934426 1 1 47 1 30 2 1 422 89
# 7 4 4.852459 0 1 40 1 24 1 3 25 11
# 8 5 61.081967 0 2 64 2 19 2 1 19 9
# 9 6 63.377049 0 2 49 2 56 1 3 356 64
#--- Model fitting ------------------
obj <- coxph(Surv(time, status)~ hormone + meno + age + size + grade
+ prog + estrg, data = data.CE)
#--- Summary results ------------------
summary(obj)
# n= 686, number of events= 299
# coef exp(coef) se(coef) z Pr(>|z|)
# hormone2 -0.3552391 0.7010058 0.1292278 -2.749 0.00598 **
# meno2 0.2683165 1.3077610 0.1840516 1.458 0.14489
# age -0.0087937 0.9912449 0.0093783 -0.938 0.34842
# size 0.0153213 1.0154392 0.0036538 4.193 2.75e-05 ***
# grade2 0.7078888 2.0297017 0.2485544 2.848 0.00440 **
# grade3 0.8171706 2.2640847 0.2689544 3.038 0.00238 **
# prog -0.0022942 0.9977084 0.0005775 -3.972 7.12e-05 ***
# estrg 0.0001788 1.0001789 0.0004686 0.382 0.70274
# ---
# Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
Tumor grade?
grade2
and grade3
#--- Breslow estimates of baseline function ------------------
Lambda0 <- basehaz(obj,centered = FALSE)
#plot the baseline hazard function
plot(stepfun(Lambda0$time, c(0, Lambda0$hazard)), do.points=F,
xlim=c(0, 100), ylim=c(0, 0.8), lwd=2, frame.plot=F, main="",
xlab="Time (months)", ylab="Baseline cumulative hazard function")
Exercise: Survival prediction
Use obj$coefficients
(\(\hat\beta\)) and Lambda0
(\(\hat\Lambda_0(t)\)) to predict survival function for arbitrary \(z\).
survival::resid()
(I)## calculate cox-snell by martingale residuals (data frame: df)
coxsnellres <- df$status - resid(obj, type = "martingale")
## Use Nelsen-Aalen for cumulative hazard
fit <- survfit(Surv(coxsnellres, df$status) ~ 1)
Lambda <- cumsum(fit$n.event/fit$n.risk)
## scatter plot
plot(log(fit$time), log(Lambda), xlab = "log(t)",
ylab = "log-cumulative hazard")
abline(0, 1, lty = 3) # add a reference line
survival::cox.zph()
sch$table
: \(\chi^2\) tests of proportionality for each covariate and overallsch$time
: \(m\)-vector of the \(X_i\)sch$y
: \((m\times p)\)-matrix of the rescaled \(\hat r_i\)plot(sch)
: plot the residuals, one covariate per panelsurvival::resid()
(II)Martingale/Deviance residuals
df
: data framezk
: name of covariate of interest# get the residuals
mart_resid <- resid(obj, type = 'martingale') # n-vector of martingale
dev_resid <- resid(obj, type = 'deviance') # n-vector of deviance
# scatter plot residuals against covariate
plot(df$zk, mart_resid)
lines(lowess(df$zk, mart_resid)) # add a smooth line through points
abline(0, 0, lty = 3) # add a dotted reference line
coxsnellres <- data.CE$status-resid(obj, type="martingale")
## Then use N-A method to estimate the cumulative
## hazard function for residuals
fit <- survfit(Surv(coxsnellres,data.CE$status) ~ 1)
Htilde <- cumsum(fit$n.event/fit$n.risk)
## scatter plot
plot(log(fit$time), log(Htilde), xlab="log(t)", frame.plot = FALSE,
ylab="log-cumulative hazard", xlim = c(-8, 2), ylim = c(-8, 2))
abline(0, 1, lty = 3, lwd = 1.5) # add a reference line
## Martingale residuals
mart_resid <- resid(obj.stra,type = 'martingale')
#plot residuals against, e.g., age
plot(data.CE$age, mart_resid, main='Age', cex.lab=1.2, cex.axis=1.2,
xlab="Age (years)", ylab="Martingale residuals")
lines(lowess(data.CE$age, mart_resid), lwd = 2) # add a smooth line
abline(0, 0, lty = 3, lwd = 2) # add a reference line
agec=1
: \(\leq 40\) yrs; agec=2
: \((40, 60]\) yrs; agec=3
: \(> 60\) yrs# Call:
# coxph(formula = Surv(time, status) ~ hormone + meno + agec +
# size + prog + estrg + strata(grade), data = data.CE)
# coef exp(coef) se(coef) z Pr(>|z|)
# hormone2 -0.3788600 0.6846415 0.1294642 -2.926 0.00343 **
# meno2 0.2804119 1.3236749 0.1567576 1.789 0.07364 .
# agec2 -0.5315678 0.5876829 0.1984112 -2.679 0.00738 **
# agec3 -0.4965589 0.6086214 0.2497403 -1.988 0.04678 *
# size 0.0164114 1.0165468 0.0036964 4.440 9.00e-06 ***
# prog -0.0022708 0.9977318 0.0005812 -3.907 9.36e-05 ***
# estrg 0.0001254 1.0001254 0.0004718 0.266 0.79041
# ---
survival::coxph()
(IV)(start, stop)
: period during which covariate takes on a particular valueevent
: event indicator at time stop
event = 0
need not mean censoring (why?)coxph
object# Call:
# coxph(formula = Surv(start, stop, event) ~ age + accpt + surgery +
# transplant, data = heart)
#
# n= 172, number of events= 75
#
# coef exp(coef) se(coef) z Pr(>|z|)
# age 0.02717 1.02754 0.01371 1.981 0.0476 *
# accpt -0.14635 0.86386 0.07047 -2.077 0.0378 *
# surgery -0.63721 0.52877 0.36723 -1.735 0.0827 .
# transplant1 -0.01025 0.98980 0.31375 -0.033 0.9739
---
survival::coxph()
PHREG
df$status - resid(obj, type = "martingale")
cox.zph(obj)
resid(obj, type = c("martingale", "deviance"))
coxph(Surv(start, stop, event) ~ covariates)