Chapter 3 - Nonparametric Estimation and Testing
Department of Biostatistics & Medical Informatics
University of Wisconsin-Madison
\[\newcommand{\d}{{\rm d}}\] \[\newcommand{\dd}{{\rm d}}\] \[\newcommand{\pr}{{\rm pr}}\] \[\newcommand{\var}{{\rm var}}\] \[\newcommand{\se}{{\rm se}}\] \[\newcommand{\indep}{\perp \!\!\! \perp}\]
Recall in Chapter 2…\[\begin{align} E\{\dd N(t)\mid X\geq t\}&=\frac{\pr\{\dd N^*(t)=1, C\geq t\}}{\pr(T\geq t, C\geq t)} \notag\\ &=\frac{\pr\{\dd N^*(t)=1\}\pr(C\geq t)}{\pr(T\geq t)\pr(C\geq t)} \notag\\ &=E\{\dd N^*(t)\mid T\geq t\}\notag\\ &=\dd\Lambda(t), \end{align}\]
Log-transform: product \(\to\) sum \[ \log\hat S(t)=\sum_{j:t_j\leq t}\log(1-d_j/n_j) \]
Delta method\(^*\) \[ \hat\var\{\hat S(t)\}=\hat S(t)^2\hat\var\{\log\hat S(t)\}, \]
Delta Method
If approximately \(S_n \sim N(\mu, \sigma^2)\), then approximately \(g(S_n) \sim N\left\{g(\mu), \dot g(\mu)^2\sigma^2\right\}\), where \(\dot g(\mu)=\dd g(\mu)/\dd\mu\).
Variance of KM \[\begin{equation}\label{eq:km:greenwood} \hat\var\{\hat S(t)\}=\hat S(t)^2\sum_{j:t_j\leq t}\frac{d_j}{n_j(n_j-d_j)} \end{equation}\]
Naive 95% confidence interval (CI) \[\begin{equation}\label{eq:km:ci_plain} \left[\hat S(t)-1.96\hat\se\{\hat S(t)\}, \hat S(t)+1.96\hat\se\{\hat S(t)\}\right] \end{equation}\]
Transform \(\zeta(t)=\log\{-\log S(t)\} \in \mathbb R\)
CI for \(\zeta(t)\) \[\begin{equation} \left[\hat\zeta(t)-1.96\hat\se\{\hat\zeta(t)\},\hat\zeta(t)+1.96\hat\se\{\hat\zeta(t)\}\right] \end{equation}\]
Transform the bounds back to \(S(t)\) \[ \left[\hat S(t)^{\exp[1.96\hat\se\{\hat\zeta(t)\}]}, \hat S(t)^{\exp[-1.96\hat\se\{\hat\zeta(t)\}]}\right] \subset [0, 1] \]
Remains to calculate \(\hat\se\{\hat\zeta(t)\}\) by delta method (Exercise)
survival::survfit()
(I)Surv(time, status) ~ 1
: fit curve to a homogeneous sample
Surv(time, status) ~ group
: fit curve to each level of group
data = df
: input data frame df
conf.type = "log-log"
: log-log transformation for CI
"log"
: default log transformation"plain"
: naive CIsurvival::survfit()
(II)surfit
object containing KM estimates
summary()
and plot()
summary(obj)
: a list containing
time
: \(t_j\) \((j=1,\ldots, m)\)surv
: \(\hat S(t_j)\)n.risk
: \(n_j\)n.event
: \(d_j\)std.err
: \(\hat\se\{\hat S(t_j)\}\)...
gtsummary::tbl_survfit()
survfit()
results# install.packages("gtsummary")
library(gtsummary)
# A single-group KM model
obj <- survfit(Surv(time, status) ~ 1, data = df)
# Summaries at specific times
tbl_surv <- tbl_survfit(
x = obj, # Provide the fitted survfit object
times = seq(40, 100, by = 20), # Time points for survival rates
label_header = "{time} days" # Column label: "xx days"
)
# Print out the table
tbl_surv
ggsurvfit::ggsurvfit()
ggplot2
# install.packages("ggsurvfit")
library(ggsurvfit)
obj <- survfit(Surv(time, status) ~ 1, data = df)
# Create a KM plot with confidence intervals and an at-risk table
ggsurvfit(obj) +
add_confidence_interval() + # Shaded 95% CI region
add_risktable() + # Show risk table
scale_x_continuous(breaks = seq(0, 100, by = 20)) + # X-axis breaks
ylim(0, 1) + # Y-axis limits
labs(
x = "Time (days)",
y = "Tumor-free probabilities"
) +
theme_minimal()
rats.rx
obj <- survfit(Surv(time, status) ~ 1, data = rats.rx,
conf.type = "log-log")
summary(obj)
# Call: survfit(formula = Surv(time, status) ~ 1, data = rats.rx,
# conf.type = "log-log")
#
# time n.risk n.event survival std.err lower 95% CI upper 95% CI
# 34 99 1 0.990 0.0100 0.930 0.999
# 39 98 1 0.980 0.0141 0.922 0.995
# 45 97 1 0.970 0.0172 0.909 0.990
# 67 89 1 0.959 0.0202 0.894 0.984
# 70 86 1 0.948 0.0228 0.879 0.978
# ...
plot()
survival::survdiff()
Surv(time, status) ~ group
: test survival function between levels of variable group
strata(str_var)
: stratified by variable str_var
(optional)rho = r
: weights \(\hat S(t_j-)^\rho\) with \(\rho=\) r
pvalue
(p-value of the test)gtsummary::tbl_survfit()
library(gtsummary)
# A two-group KM model
obj <- survfit(Surv(time, status) ~ rx, data = rats)
# Summaries at specific times with labeled treatment groups
tbl_surv <- tbl_survfit(
x = obj, # Provide the fitted survfit object
times = seq(40, 100, by = 20), # Time points for survival rates
label_header = "{time} days", # Column label: "xx days"
label = list(rx ~ "Treatment") # Rename 'rx' to 'Treatment'
)
# Print out the table
tbl_surv
ggsurvfit::ggsurvfit()
library(ggsurvfit)
# Use survfit2 as recommended by ggsurvfit
obj2 <- survfit2(Surv(time, status) ~ rx, data = rats)
# Create a group-specific KM plot with log-rank test p-value
ggsurvfit(obj, linetype_aes = TRUE, linewidth = 1) + # Use line types
add_risktable(
risktable_stats = "n.risk", # Include only numbers at risk
theme = list(
theme_risktable_default(), # Default risk table theme
scale_y_discrete(labels = c('Drug', 'Control')) # Group labels
)
) +
theme_classic()
rx
stratified by sex
head(rats)
# litter rx time status sex
# 1 1 1 101 0 f
# 2 1 0 49 1 f
# 3 1 0 104 0 f
# ...
survdiff(Surv(time, status) ~ rx + strata(sex), data = rats)
# Call:
# survdiff(formula = Surv(time, status) ~ rx + strata(sex), data = rats)
#
# N Observed Expected (O-E)^2/E (O-E)^2/V
# rx=0 200 21 28.9 2.16 6.99
# rx=1 100 21 13.1 4.77 6.99
#
# Chisq= 7 on 1 degrees of freedom, p= 0.008
npsm::gehan.test()
nph::logrank.maxtest()
(maximum over multiple weighting schemes)survival::survfit()
:survival::survdiff()
gtsummary::tbl_survfit()
ggsurvfit::ggsurvfit()
Choose one
Problem 3.19
(Extra credit) Choose one