Chapter 4 - Semiparametric Regression
Department of Biostatistics & Medical Informatics
University of Wisconsin-Madison
April 25, 2025
Proportional win-fractions (PW) model
WR
package)High-dimensional data
WRNet
package)\[\newcommand{\d}{{\rm d}} \newcommand{\T}{{\rm T}} \newcommand{\dd}{{\rm d}} \newcommand{\cc}{{\rm c}} \newcommand{\pr}{{\rm pr}} \newcommand{\var}{{\rm var}} \newcommand{\se}{{\rm se}} \newcommand{\indep}{\perp \!\!\! \perp} \newcommand{\Pn}{n^{-1}\sum_{i=1}^n} \newcommand\mymathop[1]{\mathop{\operatorname{#1}}} \newcommand{\Ut}{{n \choose 2}^{-1}\sum_{i<j}\sum} \def\a{{(a)}} \def\b{{(1-a)}} \def\t{{(1)}} \def\c{{(0)}} \def\d{{\rm d}} \def\T{{\rm T}} \def\bs{\boldsymbol} \def\C{{\mathcal C}} \def\S{{\mathcal S}} \def\Sk{{\mathcal S}^{(k)}} \newcommand{\wh}{\widehat} \]
Univariate | Hierarchical composite | |
---|---|---|
Hypothesis testing | Gehan, log-rank | Win ratio/odds, net benefit (Ch 2) |
Nonparametric estimation | RMST | RMT-IF (Ch 3) |
Semiparametric regression | Cox PH model | ? |
Multiplicative effects (Mao & Wang, 2021) \[\begin{equation}\label{eq:wr_reg} WR(t\mid Z_i, Z_j;\mathcal W)=\exp\left\{\beta^{\rm T}\left(Z_i- Z_j\right)\right\} \end{equation}\]
PW: covariate-specific win/loss fractions proportional over time
\(\beta\): log-WR associated with unit increases in covariates (regardless of follow-up time)
Semiparametric: Parametric covariate effects, nonparametric otherwise
Denote model by PW(\(\mathcal W\))
WR::pwreg()
(ID, time, status)
: same as WR::WRrec()
Z
: covariate matrix; strata
: possible stratifier (categorical)pwreg
obj$beta
: \(\hat\beta\)obj$Var
: \(\hat\var(\hat\beta)\)print(obj)
to summarize regression resultsWR::score.proc()
obj
: a pwreg
objectscore.proc
score.obj$t
: \(t\)score.obj$score
: a matrix with rescaled residual process for each covariate per rowplot(score.obj, k)
: plot the rescaled residuals for \(k\)th covariateUsual care (N=231) | Training (N=220) | |
---|---|---|
Age (years) | 56 (46, 65.5) | 54 (46, 62.2) |
Sex - Female | 153 (66.2%) | 121 (55%) |
Sex - Male | 78 (33.8%) | 99 (45%) |
Race - White | 117 (50.6%) | 111 (50.5%) |
Race - Black | 103 (44.6%) | 101 (45.9%) |
Race - Other | 11 (4.8%) | 8 (3.6%) |
BMI | 31.3 (26.3, 37.2) | 31 (25.8, 36.5) |
LVEF (%) | 25.1 (20.9, 31.3) | 25.1 (20.9, 31.2) |
Hypertension | 129 (55.8%) | 129 (58.6%) |
COPD | 21 (9.1%) | 15 (6.8%) |
Diabetes | 71 (30.7%) | 58 (26.4%) |
ACE Inhibitor | 174 (75.3%) | 167 (75.9%) |
Beta Blocker | 223 (96.5%) | 211 (95.9%) |
# number of covariates (-c(ID, time, status))
p <- ncol(non_ischemic) - 3
# extract ID, time, status and covariates matrix Z from the data.
# note that: ID, time and status should be column vector
ID <- non_ischemic[,"ID"]
time <- non_ischemic[,"time"] / 30.5 # days to months
status <- non_ischemic[,"status"]
Z <- as.matrix(non_ischemic[, 4:(3+p)])
# pass the parameters into the function
obj <- pwreg(ID, time, status, Z)
obj
#> Call:
#> pwreg(ID = ID, time = time, status = status, Z = Z)
#> Total number of pairs: 101475
#> Wins-losses on death: 7644 (7.5%)
#> Wins-losses on non-fatal event: 78387 (77.2%)
#> Indeterminate pairs 15444 (15.2%)
#>
#> Newton-Raphson algorithm converged in 5 iterations.
#>
#> Overall test: chisq test with 13 degrees of freedom;
#> Wald statistic 24.9 with p-value 0.02392931
obj
#> Estimate se z.value p.value
#> Training vs Usual 0.1906687 0.1264658 1.5077 0.13164
#> Age (year) -0.0128306 0.0057285 -2.2398 0.02510 *
#> Male vs Female -0.1552923 0.1294198 -1.1999 0.23017
#> Black vs White -0.3026335 0.1461330 -2.0709 0.03836 *
#> Other vs White -0.3565390 0.3424360 -1.0412 0.29779
#> BMI -0.0181310 0.0097582 -1.8580 0.06316 .
#> LVEF 0.0214905 0.0086449 2.4859 0.01292 *
#> Hypertension -0.0318291 0.1456217 -0.2186 0.82698
#> COPD -0.4023069 0.2066821 -1.9465 0.05159 .
#> Diabetes 0.0703990 0.1419998 0.4958 0.62006
#> ACE Inhibitor -0.1068201 0.1571317 -0.6798 0.49662
#> Beta Blocker -0.5344979 0.3289319 -1.6250 0.10417
#> Smoker -0.0602350 0.1682826 -0.3579 0.72039
# extract estimates of (beta_4, beta_5)
beta <- matrix(obj$beta[4:5])
# extract estimated covariance matrix for (beta_4, beta_5)
Sigma <- obj$Var[4:5, 4:5]
# compute chisq statistic in quadratic form
chistats <- t(beta) %*% solve(Sigma) %*% beta
# compare the Wald statistic with the reference
# distribution of chisq(2) to obtain the p-value
1 - pchisq(chistats, df = 2)
#> [,1]
#> [1,] 0.1016988
#> Point and interval estimates for the win ratios:
#> Win Ratio 95% lower CL 95% higher CL
#> Training vs Usual 1.2100585 0.9444056 1.5504374
#> Age (year) 0.9872513 0.9762288 0.9983983
#> Male vs Female 0.8561648 0.6643471 1.1033663
#> Black vs White 0.7388699 0.5548548 0.9839127
#> Other vs White 0.7000951 0.3578286 1.3697431
#> BMI 0.9820323 0.9634287 1.0009952
#> LVEF 1.0217231 1.0045572 1.0391823
#> Hypertension 0.9686721 0.7281543 1.2886357
#> COPD 0.6687755 0.4460178 1.0027865
#> Diabetes 1.0729362 0.8122757 1.4172433
#> ACE Inhibitor 0.8986873 0.6604773 1.2228110
#> Beta Blocker 0.5859634 0.3075270 1.1164977
#> Smoker 0.9415433 0.6770144 1.3094312
Pathwise algorithm (Friedman et al., 2010)
x
: covariate matrix containing \(z_{ij}\) ,y
: response vector \(\delta_{ij}\)intercept = FALSE
removes interceptlambda
: user-specified \(\lambda\) vectorcv.glmnet()
cv.glmnet()
Data format
id
, time
, status
, and covariates# Load package containing data
library(WR)
df <- gbc # n = 686 subjects, p = 9 covariates
df # status = 0 (censored), 1 (death), 2 (recurrence)
#> id time status hormone age menopause size grade ...
#>1 1 43.836066 2 1 38 1 18 3
#>2 1 74.819672 0 1 38 1 18 3
#>3 2 46.557377 2 1 52 1 20 1
#>4 2 65.770492 0 1 52 1 20 1
#>5 3 41.934426 2 1 47 1 30 2
#>...
wr_split()
functioncv_wrnet(id, time, status, Z, k = 10, ...)
# 10-fold CV -------------------------------------------
set.seed(1234)
obj_cv <- cv_wrnet(df_train$id, df_train$time, df_train$status,
df_train |> select(-c(id, time, status)))
# Plot CV results (C-index vs log-lambda)
obj_cv |>
ggplot(aes(x = log(lambda), y = concordance)) +
geom_point() + geom_line() + theme_minimal()
# Optimal lambda
lambda_opt <- obj_cv$lambda[which.max(obj_cv$concordance)]
lambda_opt
#> [1] 0.0171976
Fit final model
wrnet(id, time, status, Z, lambda = lambda_opt, ...)
# Final model ------------------------------------------
final_fit <- wrnet(df_train$id, df_train$time, df_train$status,
df_train |> select(-c(id, time, status)), lambda = lambda_opt)
final_fit$beta # Estimated coefficients
#> s0
#> hormone 0.306026364
#> age 0.003111462
#> menopause .
#> size -0.007720497
#> grade -0.285511701
#> nodes -0.082227827
#> prog_recp 0.001861367
#> estrg_recp .
test_wrnet(final_fit, df_test)
Check functional form (linear, quadratic, grouped) of \(Z\) (Lin et al., 1993)
\[ \mbox{plot} \sum_{i, j: Z_i - Z_j\leq z}(Z_i - Z_j)M_{ij}(\infty \mid Z_i, Z_j;\hat\beta) \mbox{ against } z\in\mathbb R \]
Currently WR::pwreg()
implements only PW(\(\mathcal W_{\rm P}\))
Variations: regression of win ratio/odds (Follmann et al., 2019; Song et al., 2022)
WR::pwreg(ID, time, status, Z, strata)
wrnet
: https://lmaowisc.github.io/wrnet/glmnet()
backend