Statistical Methods for Composite Endpoints: Win Ratio and Beyond

Chapter 1. Introduction

Lu Mao

Department of Biostatistics & Medical Informatics

University of Wisconsin-Madison

Aug 3, 2024

Outline

  • Examples and regulatory guidelines
  • Traditional methods
    • Time to first event
    • Weighted total events (Wcompo package)
  • Win ratio and hierarchical endpoints
    • The estimand issue

\[\newcommand{\d}{{\rm d}}\] \[\newcommand{\T}{{\rm T}}\] \[\newcommand{\dd}{{\rm d}}\] \[\newcommand{\cc}{{\rm c}}\] \[\newcommand{\pr}{{\rm pr}}\] \[\newcommand{\var}{{\rm var}}\] \[\newcommand{\se}{{\rm se}}\] \[\newcommand{\indep}{\perp \!\!\! \perp}\] \[\newcommand{\Pn}{n^{-1}\sum_{i=1}^n}\] \[ \newcommand\mymathop[1]{\mathop{\operatorname{#1}}} \] \[ \newcommand{\Ut}{{n \choose 2}^{-1}\sum_{i<j}\sum} \] \[ \def\a{{(a)}} \def\b{{(1-a)}} \def\t{{(1)}} \def\c{{(0)}} \def\d{{\rm d}} \def\T{{\rm T}} \]

Example and Guidelines

Motivating Example: Colon Cancer

  • Landmark colon cancer trial
    • Population: 619 patients with stage C disease (Moertel et al., 1990)
    • Arms: Levamisole + fluorouracil (\(n=304\)) vs control (\(n=315\))
    • Endpoint: relapse-free survival (log-rank test p<0.001)
      • Death = relapse
      • 258 deaths (89%) after relapse ignored

Motivating Example: HF-ACTION

  • A cardiovascular trial (HF-ACTION)
    • Subpopulation: 426 heart failure patients (O’Connor et al., 2009)
    • Arms: Exercise training + usual care (\(n=205\)) vs usual care (\(n=221\))
    • Endpoint: hospitalization-free survival (log-rank test p=0.100)
      • Death = hospitalization
      • 82 (88%) deaths + 707 (69%) recurrent hospitalizations ignored

Composite Endpoints

  • Traditional composite endpoint (TCE)
    • Time to first event
      • Relapse/Progression-free survival
      • First major adverse cardiac event (MACE): death, heart failure, myocardial infarction, stroke (event-free survival)
    • Limitations
      • Lack of clinical priority
      • Statistical inefficiency (waste of data)
  • Hierarchical composite endpoint (HCE)
    • Example: Death > nonfatal MACE > six-minute walk test (6MWT)/NYHA class

Why Composite

  • Advantages

    • More events \(\to\) higher power \(\to\) smaller sample size/lower costs

    • No need for multiplicity adjustment

    • A unified measure of treatment effect

    ICH-E9 “Statistical Principles for Clinical Trials” (ICH, 1998)

    • “There should generally be only one primary variable”
    • “If a single primary variable cannot be selected …, another useful strategy is to integrate or combine the multiple measurements into a single or composite variable …”
    • “[composite endpoint] addresses the multiplicity problem without adjustment to the type I error”

Regulatory Guidelines: FDA

  • Main points

    • Typically time to first event, though total events may also be analyzed
    • Component-wise analysis important for interpretation

    FDA Guidance for Industry: “Multiple Endpoints in Clinical Trials” (FDA, 2022)

    • “Composite endpoints are often assessed as the time to first occurrence of any one of the components, …, it also may be possible to analyze total endpoint events”
    • “The treatment effect on the composite rate can be interpreted as characterizing the overall clinical effect when the individual events all have reasonably similar clinical importance”
    • “…analyses of the components of the composite endpoint are important and can influence interpretation of the overall study results”

Regulatory Guidelines: Europe

  • Main points

    • Combine events of similar importance
    • Include mortality as a component

    European Network for Health Technology Assessment “Endpoints used for Relative Effectiveness Assessment – Composite Endpoints” (EUnetHTA, 2015)

    • “All components of a composite endpoint should be separately defined as secondary endpoints and reported with the results of the primary analysis”
    • “Components of similar clinical importance and sensitivity to intervention should preferably be combined”
    • “If adequate, mortality should however be included if it is likely to have a censoring effect on the observation of other components”

A Tricky Example

  • The EMPA-REG Trial (NCT01131676)
    • Population: 7,020 patients with type 2 diabetes (Zinman et al., 2015)
    • Treatment arms: Empagliflozin vs control
    • Endpoint: Time to first CV death, nonfatal MI, nonfatal stroke

Traditional Composites

Data and Notation

  • Full data \(\mathcal H^*(\infty)\)
    • \(D\): survival time; \(N^*_D(t)=I(D\leq t)\)
    • \(N^*_1(t), \ldots, N^*_K(t)\): counting processes for \(K\) nonfatal event types
    • Cumulative data: \(\mathcal H^*(t)=\{N^*_D(u), N^*_1(u), \ldots, N^*_K(u):0\leq u\leq t\}\)
  • Observed (censored) data \(\{\mathcal H^*(X), X\}\)
    • \(\mathcal H^*(X)\): outcomes up to time \(X\)
    • \(X=D\wedge C\): length of follow-up (\(a\wedge b = \min(a, b)\))
    • \(C\): independent censoring time
    • Goal: estimate/test features of \(\mathcal H^*(\infty)\) using \(\{\mathcal H^*(X), X\}\)

First Event

  • Univariate endpoint
    • \(N^*_{\rm TFE}(t) = I\{N^*_D(t)+\sum_{k=1}^KN^*_k(t)\geq 1\}\)

      • \(I(\cdot)\): 0-1 indicator
    • \(\tilde T\): time to first event

      • Kaplan–Meier curve, log-rank test, Cox model
  • Component-wise weighting
    • Upweight death over nonfatal events
      • E.g., Death = 2 \(\times\) hospitalization
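The time-to-first-event reduction above can be sketched in base R on a few made-up records. The data frame `toy` and its values are hypothetical, and the status coding (1 = death, 2 = hospitalization, 0 = censoring) is an assumption chosen to match the convention used for `CompoML()` later in the chapter. The sketch also counts how many observed events the reduction discards.

```r
# Hypothetical long-format records (one row per event or censoring time)
toy <- data.frame(
  id     = c("A", "A", "A", "B", "B", "C"),
  time   = c(0.5, 1.2, 2.0, 0.8, 3.0, 1.5),
  status = c(2, 2, 1, 2, 0, 0)  # 1 = death, 2 = hospitalization, 0 = censoring
)

# Keep the earliest record per patient: this is (T-tilde, event indicator)
toy <- toy[order(toy$id, toy$time), ]
tfe <- toy[!duplicated(toy$id), ]
tfe$event <- as.integer(tfe$status > 0)  # death and hospitalization count equally

# Events that occur after the first one are not used by the TFE analysis
n_events  <- sum(toy$status > 0)
n_first   <- sum(tfe$event)
n_ignored <- n_events - n_first

tfe[, c("id", "time", "event")]
n_ignored
```

In this toy example patient A contributes only the first hospitalization; the second hospitalization and the death are dropped, which is the "waste of data" limitation in miniature.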

Total Events

  • Weighted composite event process
    • \(N^*_{\rm R}(t)=w_DN^*_D(t)+\sum_{k=1}^Kw_kN^*_k(t)\)
      • \(w_D, w_1, \ldots, w_K\): weights to death and nonfatal events
    • Proportional means model (Mao & Lin, 2016) \[ E\{N^*_{\rm R}(t)\mid Z\} = \exp(\beta^\T Z)\mu_0(t) \]
      • \(\exp(\beta)\): mean ratio of weighted total events comparing treatment \((Z=1)\) vs control \((Z=0)\)
    • R-package: Wcompo
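To make the weighted composite process concrete, the following base-R sketch evaluates \(N^*_{\rm R}(t)\) for two made-up patients with \(w_D = 2\) and a single nonfatal event type with weight 1. The helper `weighted_count()` and the data are hypothetical illustrations and are not part of Wcompo.

```r
# Hypothetical records: status 1 = death, 2 = hospitalization, 0 = censoring
toy <- data.frame(
  id     = c("A", "A", "A", "B", "B"),
  time   = c(0.5, 1.2, 2.0, 0.8, 3.0),
  status = c(2, 2, 1, 2, 0)
)
w <- c(2, 1)  # weights for event type 1 (death) and type 2 (hospitalization)

# N_R(t) per patient: sum of weights over events observed by time t
weighted_count <- function(dat, t, w) {
  sapply(split(dat, dat$id), function(d) {
    types <- d$status[d$status > 0 & d$time <= t]  # censoring rows excluded
    sum(w[types])
  })
}

weighted_count(toy, t = 1.0, w = w)  # one hospitalization each
weighted_count(toy, t = 2.5, w = w)  # A: 1 + 1 + 2 (death), B: 1
```

Under the proportional means model, \(\exp(\beta)\) compares the arm-level averages of these weighted counts at every \(t\).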

Software: Wcompo::CompoML()

  • Basic syntax
    • id: unique patient identifier; time: event times; status: event types (0: censoring; 1: death; 2,...,K: nonfatal event types); Z: covariate matrix
    • w: \(K\)-vector of weights to event types 1 (death), 2,...,K (nonfatal events); default is unweighted
library(Wcompo)
obj <- CompoML(id, time, status, Z, w = c(2, 1))
  • Output: a list of class CompoML
    • obj$beta: \(\hat\beta\); obj$var: \(\hat\var(\hat\beta)\)
    • plot(obj, z): plot mean function \(\exp(\hat\beta^{\rm T} z)\hat\mu_0(t)\)

HF-ACTION: An Example

  • High-risk subgroup (n=426)
    • Baseline cardiopulmonary exercise (CPX) test \(\leq\) 9 min
Table 1: Summary statistics for a high-risk subgroup (n=426) in HF-ACTION trial.

                                  Usual care (N = 221)   Exercise training (N = 205)
Age                 ≤ 60 years    122 (55.2%)            128 (62.4%)
                    > 60 years    99 (44.8%)             77 (37.6%)
Follow-up (months)                28.6 (18.4, 39.3)      27.6 (19, 40.2)
Death                             57 (25.8%)             36 (17.6%)
Hospitalizations    0             51 (23.1%)             60 (29.3%)
                    1-3           114 (51.6%)            102 (49.8%)
                    4-10          49 (22.2%)             39 (19%)
                    >10           7 (3.2%)               4 (2%)

HF-ACTION: Preparation

  • Load packages and data
library(survival)
# install.packages("Wcompo")
library(Wcompo) # for weighted total events
library(rmt) # for hfaction data
library(tidyverse) # for data wrangling
# load data
data(hfaction)
head(hfaction) # trt_ab=1: training; 0: usual care
#>        patid       time status trt_ab age60
#> 1 HFACT00001 0.60506502      1      0     1
#> 2 HFACT00001 1.04859685      0      0     1
#> 3 HFACT00002 0.06297057      1      0     1
#> 4 HFACT00002 0.35865845      1      0     1
#> 5 HFACT00002 0.39698836      1      0     1
#> 6 HFACT00002 3.83299110      0      0     1

HF-ACTION: Data

  • Data processing
# for weighted total events by CompoML()
## recode status so that 1 = death, 2 = hospitalization
## (swaps the original coding: 1 = hospitalization, 2 = death)
hfaction <- hfaction |> 
  mutate(
    status = case_when(
      status == 1 ~ 2,
      status == 2 ~ 1,
      status == 0 ~ 0)
  )
# TFE: take the first event per patient id
hfaction_TFE <- hfaction |> 
  arrange(patid, time) |> 
  group_by(patid) |> 
  slice_head() |> 
  ungroup()

HF-ACTION: Mortality

  • Cox model for death
    • HR: \(\exp(-0.3973) = 67.2\%\) (\(32.8\%\) reduction in risk)
    • \(P\)-value: 0.0621 (borderline significant)
## get mortality data
hfaction_D <- hfaction |> 
  filter(status != 2) # remove hospitalization records

## Cox model for death against trt_ab
obj_D <- coxph(Surv(time, status) ~ trt_ab, data = hfaction_D)
summary(obj_D)
#> n= 426, number of events= 93 
#>           coef exp(coef) se(coef)      z      p
#> trt_ab -0.3973    0.6721   0.2129 -1.866 0.0621

HF-ACTION: TFE

  • Cox model for hospitalization-free survival
    • HR: \(\exp(-0.1770) = 83.8\%\) (\(16.2\%\) reduction in risk)
    • \(P\)-value: 0.111 (less significant than death)
# Cox model for TFE against trt_ab
obj_TFE <- coxph(Surv(time, status > 0) ~ trt_ab, data = hfaction_TFE)
summary(obj_TFE)
#>   n= 426, number of events= 326 
#>           coef exp(coef) se(coef)      z Pr(>|z|)
#> trt_ab -0.1770    0.8378   0.1112 -1.592    0.111

HF-ACTION: Death vs TFE

  • Hospitalizations dilute effect on death …
    • An EMPA-REG-like situation

HF-ACTION: Weighted Total

  • Proportional means model (death = \(2\times\) hosp)

    • MR: \(\exp(-0.15398) = 85.7\%\) (\(14.3\%\) reduction in total number of composite events)
    • \(P\)-value: 0.170 (less significant than TFE)
    • Limitation: Survival \(\uparrow\) \(\to\) cumulative total \(\uparrow\) \(\to\) attenuated effect
    # Total events (proportional mean) -------------------------------
    obj_ML <- CompoML(hfaction$patid, hfaction$time, hfaction$status, 
                      hfaction$trt_ab, w = c(2, 1))
    obj_ML
    #>         Event 1 (Death) Event 2
    #> Weight               2       1
    #>         Estimate      se z.value p.value
    #> trt_ab -0.15398  0.11215 -1.3729  0.1698
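The three analyses so far (death, time to first event, weighted total) can be put side by side with a small Wald-type helper. This is a sketch, not part of survival or Wcompo: `wald_summary()` is a hypothetical function, and the estimates and standard errors are copied from the printed outputs in this chapter rather than recomputed from the data.

```r
# Ratio (exp(beta)), Wald confidence interval, and two-sided p-value
wald_summary <- function(beta, se, level = 0.95) {
  z_crit <- qnorm(1 - (1 - level) / 2)
  data.frame(
    ratio   = exp(beta),
    lower   = exp(beta - z_crit * se),
    upper   = exp(beta + z_crit * se),
    p_value = 2 * pnorm(-abs(beta / se))
  )
}

# Estimates and standard errors as printed in the outputs above
fits <- data.frame(
  analysis = c("Death (Cox)", "First event (Cox)", "Weighted total (CompoML)"),
  beta     = c(-0.3973, -0.1770, -0.15398),
  se       = c(0.2129, 0.1112, 0.11215)
)

res <- cbind(fits["analysis"], wald_summary(fits$beta, fits$se))
print(res, digits = 3)
```

Laid out this way, the point estimate moves toward 1 as more hospitalizations enter the endpoint, consistent with the dilution noted on the previous slide.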

HF-ACTION: Cumulative means

  • Model-based mean functions
plot(obj_ML, 0, ylim= c(0, 5), xlab="Time (years)", col= "red", lwd = 2)
plot(obj_ML, 1, add = TRUE, col = "blue", lwd = 2)
legend(0, 5, col=c("red","blue"), c("Usual care", "Training"), lwd = 2)

Lessons Learned

  • Adding nonfatal events \(\neq\) higher power
  • Solutions
    • Hierarchically prioritize death
      • Evaluate nonfatal components only on survivors
    • Quantitative weighting \(\to\) adjust for survival time
      • Loss rate = cumulative total / length of exposure (Ch 3)

Hierarchical Composites

Win Ratio: Basics

  • A common approach to HCE
    • Proposed and popularized by Pocock et al. (2012)
    • Treatment vs control: generalized pairwise comparisons
    • Win-loss: sequential comparison on components
      • Longer survival > fewer/later nonfatal MACE > better 6MWT/NYHA score
    • Effect size: WR \(=\) wins / losses
  • Alternative metrics
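The pairwise win-loss rule and the resulting metrics can be sketched in base R on made-up data. Everything here is illustrative: `compare_pair()` is a hypothetical helper, the six patients are invented, and the nonfatal component is simplified to time to first hospitalization (`H`, set to `Inf` if none was observed) rather than a full "fewer/later" comparison. Each pair is compared on death first, then on hospitalization within the shared follow-up. Net benefit is computed as (wins − losses) / pairs, and win odds as (wins + ties/2) / (losses + ties/2).

```r
# X = follow-up time, delta = 1 if X is a death time, H = first hospitalization
trt <- data.frame(X = c(5, 3, 4), delta = c(0, 1, 0), H = c(Inf, 1, 2))
ctl <- data.frame(X = c(2, 4, 5), delta = c(1, 0, 0), H = c(Inf, 1.5, Inf))

# Returns 1 if a wins, -1 if a loses, 0 if tied
compare_pair <- function(a, b) {
  # Tier 1: survival (the other patient is observed to die first)
  if (b$delta == 1 && b$X < a$X) return(1)
  if (a$delta == 1 && a$X < b$X) return(-1)
  # Tier 2: later first hospitalization, within the common follow-up
  tau <- min(a$X, b$X)
  if (b$H < a$H && b$H <= tau) return(1)
  if (a$H < b$H && a$H <= tau) return(-1)
  0
}

# Compare every treated patient with every control patient
res <- outer(seq_len(nrow(trt)), seq_len(nrow(ctl)),
             Vectorize(function(i, j) compare_pair(trt[i, ], ctl[j, ])))

wins   <- sum(res == 1)
losses <- sum(res == -1)
ties   <- sum(res == 0)

win_ratio   <- wins / losses
net_benefit <- (wins - losses) / length(res)
win_odds    <- (wins + ties / 2) / (losses + ties / 2)
c(win_ratio = win_ratio, net_benefit = net_benefit, win_odds = win_odds)
```

The win ratio ignores ties, whereas net benefit and win odds account for them, so the three metrics diverge more as the tie proportion grows.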

Win Ratio: Gaining Popularity

  • More trials are using it…

An Important Caveat

  • WR’s estimand depends on censoring …

  • What is an estimand?

    • Population-level quantity to be estimated
      • Population-mean difference, (true) risk ratio, etc.
    • Specifies how treatment effect is measured
    • ICH E9 (R1) addendum: estimand construction is one of the “central questions for drug development and licensing” (ICH, 2020)

Win-Loss Changes with Time

  • Illustration
    • Win-loss status, and deciding component, changes with time
    • Longer follow-up …
      • Parameters: win/loss proportions \(\uparrow\) (WR uncertain); tie proportion \(\downarrow\)
      • Component contributions: prioritized \(\uparrow\); deprioritized \(\downarrow\)
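One hedged way to write this down, in notation assumed here rather than taken from the slides: let \(\mathcal W(\cdot,\cdot)\) be a hypothetical win indicator applied to the cumulative data of an independent treated patient, \(\mathcal H^{*(1)}(t)\), and control patient, \(\mathcal H^{*(0)}(t)\), both followed to a common time \(t\). The win and loss proportions, and hence the win ratio, are then functions of \(t\):

```latex
\begin{align*}
w_{1,0}(t) &= E\left[\mathcal W\{\mathcal H^{*(1)}(t), \mathcal H^{*(0)}(t)\}\right], \\
w_{0,1}(t) &= E\left[\mathcal W\{\mathcal H^{*(0)}(t), \mathcal H^{*(1)}(t)\}\right], \\
{\rm WR}(t) &= \frac{w_{1,0}(t)}{w_{0,1}(t)}, \qquad
{\rm tie}(t) = 1 - w_{1,0}(t) - w_{0,1}(t).
\end{align*}
```

Under this formulation, a trial with variable follow-up compares pairs at many different values of \(t\), so a single reported win ratio blends short-term and long-term comparisons.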

Trial-Dependent Estimand

  • Actual estimand
    • Average WR mixing shorter-term with longer-term comparisons
    • Weight set (haphazardly) by censoring distribution
      • Staggered entry, random withdrawal \(\to\) non-scientific
  • Testing vs estimation
    • Testing (qualitative): okay
      • Valid under \(H_0\), powerful if treatment consistently outperforms control over time
    • Estimation (quantitative): not okay
      • Pre-define restriction time \(\to\) use censoring weight for unbiased estimation (Ch 3)
      • Specify a time-constant WR model (Ch 4)

Conclusion

Notes

Summary

  • Composite endpoints
    • Death + hospitalization/progression/relapse
    • Regulatory recommendation
  • Traditional
    • Time to first: death = nonfatal (survival::coxph())
    • Weighted total: death = \(w_D\times\) nonfatal (Wcompo::CompoML())
  • Hierarchical
    • Win ratio, net benefit, win odds: death > nonfatal
    • Estimand issue - ICH E9 (R1)

References

Akacha, M., Bretz, F., Ohlssen, D., Rosenkranz, G., & Schmidli, H. (2017). Estimands and Their Role in Clinical Trials. Statistics in Biopharmaceutical Research, 9(3), 268–271. https://doi.org/10.1080/19466315.2017.1302358
Bebu, I., & Lachin, J. M. (2016). Large sample inference for a win ratio analysis of a composite outcome based on prioritized components. Biostatistics, 17(1), 178–187. https://doi.org/10.1093/biostatistics/kxv032
Brunner, E., Vandemeulebroecke, M., & Mütze, T. (2021). Win odds: An adaptation of the win ratio to include ties. Statistics in Medicine, 40(14), 3367–3384. https://doi.org/10.1002/sim.8967
Buyse, M. (2010). Generalized pairwise comparisons of prioritized outcomes in the two-sample problem. Statistics in Medicine, 29(30), 3245–3257. https://doi.org/10.1002/sim.3923
Deltuvaite-Thomas, V., Verbeeck, J., Burzykowski, T., Buyse, M., Tournigand, C., Molenberghs, G., & Thas, O. (2022). Generalized pairwise comparisons for censored data: An overview. Biometrical Journal, 65(2). https://doi.org/10.1002/bimj.202100354
Dong, G., Hoaglin, D. C., Qiu, J., Matsouaka, R. A., Chang, Y.-W., Wang, J., & Vandemeulebroecke, M. (2020). The Win Ratio: On Interpretation and Handling of Ties. Statistics in Biopharmaceutical Research, 12(1), 99–106. https://doi.org/10.1080/19466315.2019.1575279
Dong, G., Huang, B., Chang, Y.-W., Seifu, Y., Song, J., & Hoaglin, D. C. (2020a). The win ratio: Impact of censoring and follow-up time and use with nonproportional hazards. Pharmaceutical Statistics, 19(3), 168–177. https://doi.org/10.1002/pst.1977
Dong, G., Huang, B., Verbeeck, J., Cui, Y., Song, J., Gamalo-Siebers, M., Wang, D., Hoaglin, D. C., Seifu, Y., Mütze, T., & Kolassa, J. (2022). Win statistics (win ratio, win odds, and net benefit) can complement one another to show the strength of the treatment effect on time-to-event outcomes. Pharmaceutical Statistics, 22(1), 20–33. https://doi.org/10.1002/pst.2251
EUnetHTA. (2015). Endpoints used for relative effectiveness assessment: Composite endpoints. https://www.eunethta.eu/wp-content/uploads/2018/01/Endpoints-used-for-Relative-Effectiveness-Assessment-Composite-endpoints_Amended-JA1-Guideline_Final-Nov-2015_0.pdf
FDA. (2022). Guidance for industry: Multiple endpoints in clinical trials. https://www.fda.gov/regulatory-information/search-fda-guidance-documents/multiple-endpoints-clinical-trials-guidance-industry
Fine, J. P., & Gray, R. J. (1999). A Proportional Hazards Model for the Subdistribution of a Competing Risk. Journal of the American Statistical Association, 94(446), 496–509. https://doi.org/10.1080/01621459.1999.10474144
Freemantle, N., Calvert, M., Wood, J., Eastaugh, J., & Griffin, C. (2003). Composite Outcomes in Randomized Trials. JAMA, 289(19), 2554. https://doi.org/10.1001/jama.289.19.2554
Ghosh, D., & Lin, D. Y. (2000). Nonparametric Analysis of Recurrent Events and Death. Biometrics, 56(2), 554–562. https://doi.org/10.1111/j.0006-341x.2000.00554.x
Gray, R. J. (1988). A class of \(K\)-sample tests for comparing the cumulative incidence of a competing risk. The Annals of Statistics, 16(3). https://doi.org/10.1214/aos/1176350951
ICH. (1998). Statistical principles for clinical trials. London: European Medicines Evaluation Agency.
ICH. (2020). ICH E9 (R1) addendum on estimands and sensitivity analysis in clinical trials to the guideline on statistical principles for clinical trials, step 5. London: European Medicines Evaluation Agency.
Ionan, A. C., Paterniti, M., Mehrotra, D. V., Scott, J., Ratitch, B., Collins, S., Gomatam, S., Nie, L., Rufibach, K., & Bretz, F. (2022). Clinical and Statistical Perspectives on the ICH E9(R1) Estimand Framework Implementation. Statistics in Biopharmaceutical Research, 15(3), 554–559. https://doi.org/10.1080/19466315.2022.2081601
Li, H., Chen, W.-C., Lu, N., Tang, R., & Zhao, Y. (2024). The elusiveness of the win ratio parameter in the presence of missing data. Therapeutic Innovation & Regulatory Science, 1–2.
Luo, X., Tian, H., Mohanty, S., & Tsai, W. Y. (2015). An Alternative Approach to Confidence Interval Estimation for the Win Ratio Statistic. Biometrics, 71(1), 139–145. https://doi.org/10.1111/biom.12225
Mao, L. (2019). On the Alternative Hypotheses for the Win Ratio. Biometrics, 75(1), 347–351. https://doi.org/10.1111/biom.12954
Mao, L. (2024). Defining estimand for the win ratio: separate the true effect from censoring. Clinical Trials, In press.
Mao, L., & Kim, K. (2021). Statistical Models for Composite Endpoints of Death and Nonfatal Events: A Review. Statistics in Biopharmaceutical Research, 13(3), 260–269. https://doi.org/10.1080/19466315.2021.1927824
Mao, L., & Lin, D. Y. (2016). Semiparametric regression for the weighted composite endpoint of recurrent and terminal events. Biostatistics, 17(2), 390–403. https://doi.org/10.1093/biostatistics/kxv050
Moertel, C. G., Fleming, T. R., Macdonald, J. S., Haller, D. G., Laurie, J. A., Goodman, P. J., Ungerleider, J. S., Emerson, W. A., Tormey, D. C., Glick, J. H., Veeder, M. H., & Mailliard, J. A. (1990). Levamisole and Fluorouracil for Adjuvant Therapy of Resected Colon Carcinoma. New England Journal of Medicine, 322(6), 352–358. https://doi.org/10.1056/nejm199002083220602
O’Connor, C. M., Whellan, D. J., Lee, K. L., Keteyian, S. J., Cooper, L. S., Ellis, S. J., Leifer, E. S., Kraus, W. E., Kitzman, D. W., Blumenthal, J. A., Rendall, D. S., Miller, N. H., Fleg, J. L., Schulman, K. A., McKelvie, R. S., Zannad, F., & Piña, I. L., for the HF-ACTION Investigators. (2009). Efficacy and Safety of Exercise Training in Patients With Chronic Heart Failure. JAMA, 301(14), 1439. https://doi.org/10.1001/jama.2009.454
Oakes, D. (2016). On the win-ratio statistic in clinical trials with multiple types of event. Biometrika, 103(3), 742–745. https://doi.org/10.1093/biomet/asw026
Péron, J., Buyse, M., Ozenne, B., Roche, L., & Roy, P. (2016). An extension of generalized pairwise comparisons for prioritized outcomes in the presence of censoring. Statistical Methods in Medical Research, 27(4), 1230–1239. https://doi.org/10.1177/0962280216658320
Pocock, S. J., Ariti, C. A., Collier, T. J., & Wang, D. (2012). The win ratio: a new approach to the analysis of composite endpoints in clinical trials based on clinical priorities. European Heart Journal, 33(2), 176–182. https://doi.org/10.1093/eurheartj/ehr352
Qu, Y., & Lipkovich, I. (2021). Implementation of ICH E9 (R1): A Few Points Learned During the COVID-19 Pandemic. Therapeutic Innovation & Regulatory Science, 55(5), 984–988. https://doi.org/10.1007/s43441-021-00297-6
Ratitch, B., Bell, J., Mallinckrodt, C., Bartlett, J. W., Goel, N., Molenberghs, G., O’Kelly, M., Singh, P., & Lipkovich, I. (2020). Choosing Estimands in Clinical Trials: Putting the ICH E9(R1) Into Practice. Therapeutic Innovation & Regulatory Science, 54(2), 324–341. https://doi.org/10.1007/s43441-019-00061-x
Redfors, B., Gregson, J., Crowley, A., McAndrew, T., Ben-Yehuda, O., Stone, G. W., & Pocock, S. J. (2020). The win ratio approach for composite endpoints: practical guidance based on previous experience. European Heart Journal, 41(46), 4391–4399. https://doi.org/10.1093/eurheartj/ehaa665
Schmidli, H., Roger, J. H., & Akacha, M. (2023). Rejoinder to Commentaries on Estimands for Recurrent Event Endpoints in the Presence of a Terminal Event. Statistics in Biopharmaceutical Research, 15(2), 255–256. https://doi.org/10.1080/19466315.2023.2166098
Verbeeck, J., De Backer, M., Verwerft, J., Salvaggio, S., Valgimigli, M., Vranckx, P., Buyse, M., & Brunner, E. (2023). Generalized Pairwise Comparisons to Assess Treatment Effects. Journal of the American College of Cardiology, 82(13), 1360–1372. https://doi.org/10.1016/j.jacc.2023.06.047
Zinman, B., Wanner, C., Lachin, J. M., Fitchett, D., Bluhmki, E., Hantel, S., Mattheus, M., Devins, T., Johansen, O. E., Woerle, H. J., Broedl, U. C., & Inzucchi, S. E. (2015). Empagliflozin, Cardiovascular Outcomes, and Mortality in Type 2 Diabetes. New England Journal of Medicine, 373(22), 2117–2128. https://doi.org/10.1056/nejmoa1504720