Statistical Methods for Composite Endpoints: Win Ratio and Beyond

Chapter 2 - Hypothesis Testing

Lu Mao

Department of Biostatistics & Medical Informatics

University of Wisconsin-Madison

Aug 3, 2024

Outline

  • Win ratio basics and properties
  • Generalize to recurrent events
    • HF-ACTION example (WR package)
  • Sample size calculations
    • HF-ACTION example (WR package)
\[\newcommand{\d}{{\rm d}}\] \[\newcommand{\T}{{\rm T}}\] \[\newcommand{\dd}{{\rm d}}\] \[\newcommand{\cc}{{\rm c}}\] \[\newcommand{\pr}{{\rm pr}}\] \[\newcommand{\var}{{\rm var}}\] \[\newcommand{\se}{{\rm se}}\] \[\newcommand{\indep}{\perp \!\!\! \perp}\] \[\newcommand{\Pn}{n^{-1}\sum_{i=1}^n}\] \[ \newcommand\mymathop[1]{\mathop{\operatorname{#1}}} \] \[ \newcommand{\Ut}{{n \choose 2}^{-1}\sum_{i<j}\sum} \] \[ \def\a{{(a)}} \def\b{{(1-a)}} \def\t{{(1)}} \def\c{{(0)}} \def\d{{\rm d}} \def\T{{\rm T}} \def\bs{\boldsymbol} \]

Win Ratio Basics & Properties

Standard Two-Sample

  • Two-sample comparison (Pocock et al., 2012)
    • Data: \(D_i^{(a)}, T_i^{(a)}, C_i^{(a)}\): survival, hospitalization, censoring times on \(i\)th subject in group \(a\) \((i=1,\ldots, N_a; a= 1, 0)\)
    • Pairwise comparisons: \(i\)th in group \(a\) vs \(j\)th in group \(1-a\)
      • Hierarchical composite: Death > hospitalization in \(\left[0, C_i^{(a)}\wedge C_j^{(1-a)}\right]\) \[\begin{align} \hat w^{(a, 1-a)}_{ij}&= \underbrace{I(D_j^{(1-a)}< D_i^{(a)}\wedge C_i^{(a)}\wedge C_j^{(1-a)})}_{\mbox{win on survival}}\\ & + \underbrace{I(\min(D_i^{(a)}, D_j^{(1-a)}) > C_i^{(a)}\wedge C_j^{(1-a)}, T_j^{(1-a)}< T_i^{(a)}\wedge C_i^{(a)}\wedge C_j^{(1-a)})}_{\mbox{tie on survival, win on hospitalization}} \end{align}\]
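A minimal base-R sketch of the pairwise rule above, useful for checking the logic by hand. The function `pocock_compare` is a hypothetical helper, not part of the WR package. It returns 1 if subject i wins, -1 if i loses (that is, j wins, the symmetric term \(\hat w^{(1-a, a)}_{ji}\)), and 0 for a tie; a time with no observed event is coded as `Inf`.

```r
# Pocock's rule for one pair: subject i (group a) vs subject j (group 1 - a).
# Returns 1 if i wins, -1 if i loses (j wins), 0 if tied.
# Comparison window is [0, C_i ^ C_j]; unobserved event times are Inf.
pocock_compare <- function(D_i, T_i, C_i, D_j, T_j, C_j) {
  tau <- min(C_i, C_j)                     # common follow-up window
  if (D_j < min(D_i, tau)) return(1)       # i wins on survival
  if (D_i < min(D_j, tau)) return(-1)      # i loses on survival
  if (min(D_i, D_j) > tau) {               # tie on survival: go to hospitalization
    if (T_j < min(T_i, tau)) return(1)     # j hospitalized first: i wins
    if (T_i < min(T_j, tau)) return(-1)    # i hospitalized first: i loses
  }
  0                                        # tie on both components
}

pocock_compare(D_i = Inf, T_i = 8, C_i = 12, D_j = Inf, T_j = 3, C_j = 12)
```

In the usage example both subjects survive follow-up, and j is hospitalized first, so i wins and the call returns 1.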

Pocock’s Rule

  • Win, lose, or tie?

Calculation of Win Ratio

  • Two-sample statistics
    • Win (loss) fraction for group \(a\) (\(1-a\)) \[ \hat w^{(a, 1-a)}=(N_0N_1)^{-1}\sum_{i=1}^{N_a}\sum_{j=1}^{N_{1-a}}\hat w^{(a, 1-a)}_{ij}\]
    • Win ratio statistic \[ WR = \hat w^{(1, 0)} / \hat w^{(0, 1)} \]
    • Other measures
      • Net benefit (proportion in favor): \(\hat w^{(1, 0)} - \hat w^{(0, 1)}\) (Buyse, 2010)
      • Win odds: \((\hat w^{(1, 0)} - \hat w^{(0, 1)} + 1)/ (\hat w^{(0, 1)} - \hat w^{(1, 0)} + 1)\) (Dong et al., 2020)
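All three measures follow from the same pairwise results. A minimal sketch, assuming the results are stored in an \(N_1\times N_0\) matrix coded 1 (treatment wins), -1 (control wins), and 0 (tie); `win_stats` is a hypothetical helper. In the toy matrix, treatment wins 3 of 6 pairs, loses 2, and ties 1, so WR = 1.5, NB = 1/6, and win odds = 1.4, the last equal to (wins + ties/2)/(losses + ties/2).

```r
# Summary win statistics from an N1 x N0 matrix R of pairwise results:
# R[i, j] = 1 if treated subject i beats control j, -1 if it loses, 0 if tied.
win_stats <- function(R) {
  w10 <- mean(R == 1)    # win fraction for treatment: (N1 N0)^{-1} x number of wins
  w01 <- mean(R == -1)   # win fraction for control (loss fraction for treatment)
  c(win  = w10,
    loss = w01,
    WR   = w10 / w01,                          # win ratio
    NB   = w10 - w01,                          # net benefit
    WO   = (w10 - w01 + 1) / (w01 - w10 + 1))  # win odds
}

# Toy 3 x 2 matrix (filled column-wise): 3 wins, 2 losses, 1 tie
R <- matrix(c(1, 1, -1, 0, 1, -1), nrow = 3)
s <- win_stats(R)
s
```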

The Binary Case

  • Consider binary \(Y^{(a)}= 1, 0\)
    • \(\hat w^{(a, 1-a)}_{ij} = I(Y_i^{(a)}> Y_j^{(1-a)})=Y_i^{(a)}(1-Y_j^{(1-a)})\)
    • Win (loss) fraction \[ \hat w^{(a, 1-a)} = (N_1N_0)^{-1}\sum_{i=1}^{N_a}\sum_{j=1}^{N_{1-a}}Y_i^{(a)}(1-Y_j^{(1-a)}) = \hat p^{(a)}(1-\hat p^{(1-a)})\]
      • \(\hat p^{(a)}= N_a^{-1}\sum_{i=1}^{N_a} Y_i^{(a)}\) (success probability)
    • Equivalencies \[\begin{align} {\rm Win\,\, ratio}&= \frac{\hat w^{(1, 0)}}{\hat w^{(0, 1)}} = \frac{\hat p^{(1)}(1-\hat p^{(0)})}{\hat p^{(0)}(1-\hat p^{(1)})} = {\rm Odds \,\, ratio}\\ {\rm Net \,\, benefit}&=\hat w^{(1, 0)} - \hat w^{(0, 1)} = \hat p^{(1)}- \hat p^{(0)}= {\rm Risk \,\, difference} \end{align}\]
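A quick numerical check of these equivalences on made-up binary data (`y1`, `y0` are toy vectors): the pairwise win and loss fractions reproduce the odds ratio and the risk difference. Here \(\hat p^{(1)}=0.6\) and \(\hat p^{(0)}=0.25\), so WR = OR = 4.5 and NB = RD = 0.35.

```r
# Binary outcome (1 = success) in each arm: toy data
y1 <- c(1, 1, 0, 1, 0)   # treatment
y0 <- c(1, 0, 0, 0)      # control

# All N1 x N0 pairwise comparisons
wins   <- outer(y1, y0, function(a, b) a * (1 - b))  # I(Y_i^(1) > Y_j^(0))
losses <- outer(y1, y0, function(a, b) b * (1 - a))  # I(Y_j^(0) > Y_i^(1))
w10 <- mean(wins)
w01 <- mean(losses)

p1 <- mean(y1)
p0 <- mean(y0)
odds_ratio <- (p1 / (1 - p1)) / (p0 / (1 - p0))

c(WR = w10 / w01, OR = odds_ratio, NB = w10 - w01, RD = p1 - p0)
```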

Hypothesis Testing

  • Test statistic
    • Log-transformed and normalized \[ S_n = \frac{n^{1/2}\log(\hat w^{(1, 0)}/\hat w^{(0, 1)})}{\hat{\rm SE}} \stackrel{H_0}{\sim} N (0, 1) \]
    • Null hypothesis \[ H_0: H^\t(s, t) = H^\c(s, t)\mbox{ for all } t\leq s \]
      • \(H^\a(s, t)=\pr(D^\a > s, T^\a > t)\)
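In practice the test is computed from the estimated log-WR and its standard error, for example `obj$log.WR` and `obj$se` as returned by `WRrec()`. A minimal sketch of the resulting z-statistic, two-sided p-value, and 95% CI; `wr_wald` is a hypothetical helper, and the inputs in the example are illustrative rather than from a real analysis. Note that `se_log_wr` is the standard error of log-WR itself, i.e., \(\hat{\rm SE}/n^{1/2}\) in the notation of \(S_n\).

```r
# Wald-type test and CI for the win ratio on the log scale.
# log_wr:    log(w10 / w01)
# se_log_wr: estimated SE of log-WR (SE-hat / sqrt(n) in the notation of S_n)
wr_wald <- function(log_wr, se_log_wr, alpha = 0.05) {
  z <- log_wr / se_log_wr                  # S_n, approx N(0, 1) under H0
  p <- 2 * pnorm(-abs(z))                  # two-sided p-value
  half <- qnorm(1 - alpha / 2) * se_log_wr # CI half-width on log scale
  list(z = z, p = p, WR = exp(log_wr),
       lower = exp(log_wr - half), upper = exp(log_wr + half))
}

wr_wald(log_wr = log(1.3), se_log_wr = 0.12)
```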

Alternative Hypothesis

  • What is the estimand of the WR?
    • Censoring-weighted average of time-dependent WRs (Oakes, 2016) \[ \frac{\hat w^{(1, 0)}}{\hat w^{(0, 1)}}\to \frac{\int_0^\infty\pr(\mbox{Treatment wins by } t)\dd G(t)} {\int_0^\infty\pr(\mbox{Control wins by } t)\dd G(t)} \]
      • \(G(t)\): Distribution function of \(C^\t\wedge C^\c\)
  • Alternative hypothesis
    • Treatment wins consistently against control over time (Mao, 2019) \[ H_A: \pr(\mbox{Treatment wins by } t)\geq \pr(\mbox{Control wins by } t) \mbox{ for all } t \]
    • Sufficient condition: joint stochastic order of death and nonfatal event \[ H_A: H^\t(s, t) \geq H^\c(s, t)\mbox{ for all } t\leq s \]
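A small numerical illustration of the Oakes limit, under assumptions chosen only for tractability: a single endpoint (death only), exponential death times with hypothetical hazards, and a hypothetical censoring distribution \(G\) uniform on [1, 4]. Under proportional hazards the time-dependent WR is the constant \(\lambda_0/\lambda_1\), so the censoring weights cancel. With death > hospitalization, the time-dependent WR generally varies with \(t\), which is why the scale of the WR depends on censoring.

```r
lambda1 <- 0.8   # hypothetical death hazard, treatment
lambda0 <- 1.0   # hypothetical death hazard, control

# pr(treatment wins by t) = pr(D0 < min(D1, t)) for exponential death times
p_trt_wins <- function(t) lambda0 / (lambda0 + lambda1) * (1 - exp(-(lambda0 + lambda1) * t))
# pr(control wins by t) = pr(D1 < min(D0, t))
p_ctl_wins <- function(t) lambda1 / (lambda0 + lambda1) * (1 - exp(-(lambda0 + lambda1) * t))

# Censoring-weighted averages with dG(t) = dunif(t, 1, 4) dt
num <- integrate(function(t) p_trt_wins(t) * dunif(t, 1, 4), 1, 4)$value
den <- integrate(function(t) p_ctl_wins(t) * dunif(t, 1, 4), 1, 4)$value
num / den   # equals lambda0 / lambda1 = 1.25, whatever G is
```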

Variations

  • Weighting

    • Unweighted pairwise comparisons \(\to\) Gehan (1965) test
    • Weight win/loss by time of follow-up \(\to\) log-rank (more efficient) (Luo et al., 2017)
  • Stratification

Handling Recurrent Events

General Data

  • Full outcomes
    • A subject in group \(a\) \((a=1, 0)\) \[\mathcal H^{*{(a)}}(t)=\left\{N^{*{(a)}}_D(u), N^{*{(a)}}_1(u), \ldots, N^{*{(a)}}_K(u):0\leq u\leq t\right\}\]
    • \(N^{*{(a)}}_D(u), N^{*{(a)}}_1(u), \ldots, N^{*{(a)}}_K(u)\): counting processes for death and \(K\) different types of nonfatal events
  • Observed data
    • \(\mathcal H^{*{(a)}}(X^{(a)})\): life history up to \(X^{(a)}= D^{(a)}\wedge C^{(a)}\)

General Rule of Comparison

  • Win function
    • Time frame of comparison: \([0, t]\) \[\mathcal W(\mathcal H^{*{(a)}}, \mathcal H^{*{(1-a)}})(t) =I\left\{\mathcal H^{*{(a)}}(t) \mbox{ is more favorable than } \mathcal H^{*{(1-a)}}(t)\right\}\]

    • Basic requirements

      • (W1) \(\mathcal W(\mathcal H^{*{(a)}}, \mathcal H^{*{(1-a)}})(t)\) is a function only of \(\mathcal H^{*{(a)}}(t)\) and \(\mathcal H^{*{(1-a)}}(t)\)
      • (W2) \(\mathcal W(\mathcal H^{*{(a)}}, \mathcal H^{*{(1-a)}})(t)+\mathcal W(\mathcal H^{*{(1-a)}}, \mathcal H^{*{(a)}})(t) \in \{0, 1\}\)
      • (W3) \(\mathcal W(\mathcal H^{*{(a)}}, \mathcal H^{*{(1-a)}})(t)=\mathcal W(\mathcal H^{*{(a)}}, \mathcal H^{*{(1-a)}})(D^{(a)}\wedge D^{(1-a)}\wedge t)\)
    • Interpretations

      • (W1) Consistency of time frame
      • (W2) Either win, loss, or tie
      • (W3) No change of win-loss status after death (satisfied if death is prioritized)

Generalized Win Ratio

  • Under general win function \(\mathcal W(\cdot,\cdot)\)
    • Win ratio statistic \[\begin{equation}\label{eq:wr:gen_WR} \hat{\mathcal E}_n(\mathcal W)=\frac{(N_1N_0)^{-1}\sum_{i=1}^{N_1}\sum_{j=1}^{N_0}\mathcal W(\mathcal H^{*{(1)}}_{i}, \mathcal H^{*{(0)}}_{j})(X^{{(1)}}_{i}\wedge X^{{(0)}}_{j})} {(N_1N_0)^{-1}\sum_{i=1}^{N_1}\sum_{j=1}^{N_0}\mathcal W(\mathcal H^{*{(0)}}_{j}, \mathcal H^{*{(1)}}_{i})(X^{{(1)}}_{i}\wedge X^{{(0)}}_{j})} \end{equation}\]
    • As before, each pair is compared over \(\left[0, X^{{(1)}}_{i}\wedge X^{{(0)}}_{j}\right]\), but now by a general rule \(\mathcal W\)
    • Stratified win ratio: ratio of the weighted sums of within-stratum win and loss fractions
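A minimal sketch of the stratified ratio, assuming the within-stratum win and loss fractions are already computed. The weights are an assumption here (proportional to stratum size, using the two age strata of Table 1); other weighting schemes exist, and the WR package's choice may differ. The win and loss fractions are made up. `strat_wr` is a hypothetical helper.

```r
# Stratified WR: ratio of weighted sums of within-stratum win and loss fractions.
# win, loss: within-stratum win/loss fractions for treatment
# w:         stratum weights (only their relative sizes matter)
strat_wr <- function(win, loss, w) sum(w * win) / sum(w * loss)

win  <- c(0.50, 0.45)   # hypothetical win fractions by stratum
loss <- c(0.40, 0.30)   # hypothetical loss fractions by stratum
n_s  <- c(250, 176)     # stratum sizes (age strata in Table 1)
strat_wr(win, loss, w = n_s / sum(n_s))
```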

Examples

  • Pocock’s WR
    • \(T^{(a)}_1\): time of first event in \(N^{*{(a)}}(t)=\sum_{k=1}^K N^{*{(a)}}_k(t)\) \[\begin{align}\label{eq:wr:PWR} \mathcal W_{\rm P}(\mathcal H^{*{(a)}}, \mathcal H^{*{(1-a)}})(t)&=I\{D^{(1-a)}<D^{(a)}\wedge t\}\notag\\ &\hspace{2mm}+I\{D^{(a)}\wedge D^{(1-a)}>t, T_{1}^{(1-a)}<T_{1}^{(a)}\wedge t\} \end{align}\]
    • \(\hat{\mathcal E}_n(\mathcal W_{\rm P})\)
  • TFE WR
    • \(\tilde T^{(a)}=\min(D^{(a)}, T_1^{(a)})\) \[ \mathcal W_{\rm TFE}(\mathcal H^{*{(a)}},\mathcal H^{*{(1-a)}})(t)=I(\tilde T^{(1-a)}<\tilde T^{(a)}\wedge t) \]
    • \(\hat{\mathcal E}_n(\mathcal W_{\rm TFE})\): allowable but not desirable, since it treats death and the nonfatal event as equally important
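The generalized statistic \(\hat{\mathcal E}_n(\mathcal W)\) can be sketched by passing the win function as an argument. In this base-R sketch (all names are hypothetical), each history is reduced to a death time `D`, a first nonfatal event time `T1`, and a censoring time `C`. Because \(\mathcal W\) is evaluated at \(X_i\wedge X_j\), which can equal a death time, the sketch counts an event at exactly time \(t\) as observed (event time \(\le t\)), as a right-continuous counting process would; this boundary convention is an interpretive assumption. On the toy data, Pocock's rule gives WR = 3, while the TFE rule gives WR = 1 because it counts an early hospitalization as heavily as a death.

```r
# Win functions W(ha, hb)(t): 1 if history ha is more favorable than hb on [0, t].
# A history is a list(D = death time, T1 = first nonfatal event, C = censoring); Inf = no event.
W_P <- function(ha, hb, t) {                    # Pocock: death > first nonfatal event
  death_win <- hb$D <= t && hb$D < ha$D
  hosp_win  <- min(ha$D, hb$D) > t && hb$T1 <= t && hb$T1 < ha$T1
  as.numeric(death_win || hosp_win)
}
W_TFE <- function(ha, hb, t) {                  # time to first event, death or nonfatal
  ta <- min(ha$D, ha$T1)
  tb <- min(hb$D, hb$T1)
  as.numeric(tb <= t && tb < ta)
}

# Generalized WR: compare each (i, j) pair at t = X_i ^ X_j, with X = min(D, C)
gen_wr <- function(trt, ctl, W) {
  wins <- losses <- 0
  for (hi in trt) for (hj in ctl) {
    t <- min(hi$D, hi$C, hj$D, hj$C)
    wins   <- wins + W(hi, hj, t)
    losses <- losses + W(hj, hi, t)
  }
  wins / losses                                 # the (N1 N0)^{-1} factors cancel
}

trt <- list(list(D = Inf, T1 = 1,   C = 5),
            list(D = Inf, T1 = Inf, C = 4))
ctl <- list(list(D = 3,   T1 = Inf, C = 5),
            list(D = Inf, T1 = 2,   C = 5))
gen_wr(trt, ctl, W_P)     # 3
gen_wr(trt, ctl, W_TFE)   # 1
```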

Options for Recurrent Events

  • Three variations (Mao et al., 2022)
    • Naive: Death > number of events (Finkelstein & Schoenfeld, 1999)
    • First-event: Death > number of events > time to first event
    • Last-event: Death > number of events > time to last event
  • Properties
    • First/last-event WRs: fewer ties than the standard WR

    • First/last-event WRs reduce to Pocock's WR when the nonfatal event is nonrecurrent

    • Last-event: longer-term endpoint (recommended)

      Exercise

      Write out the win function \(\mathcal W\) for the three versions of recurrent-event WR.

Comparison with Pocock’s

  • Last-event WR (LWR)
    • vs Pocock’s WR (PWR)

Alternative Hypothesis for LWR

  • LWR
    • Tests joint stochastic order of all events \[ H_A: H^\t(s, t_1, t_2, \ldots) \geq H^\c(s, t_1, t_2, \ldots)\mbox{ for all } t_1\leq t_2\leq\cdots\leq s \]
      • \(H^\a(s, t_1, t_2, \ldots)=\pr(D^\a > s, T_1^\a > t_1, T_2^\a > t_2, \ldots)\)
      • \(T_k^\a\): time of the \(k\)th recurrent event in \(N^{*{(a)}}(t)\) \((k=1, 2, \ldots)\)
    • Treatment stochastically delays all events
    • All variations of WR implemented in WR package

Software: WR::WRrec()

  • Basic syntax
    • Long format (one row per event): ID: unique patient identifier; time: event times; status: event types (1: death; 2: recurrent event; 0: censoring); trt: binary treatment indicator; strata: stratification variable
    • naive = TRUE: also computes the naive WR (NWR) and first-event WR (FWR) in addition to LWR
library(WR)
obj <- WRrec(ID, time, status, trt, strata = NULL, 
              naive = FALSE)
  • Output: a list of class WRrec
    • obj$log.WR: log-LWR; obj$se: \(\hat\se(\mbox{log-LWR})\)
    • print(obj) to print summary results
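WRrec() expects one row per event. A minimal base-R sketch of building that layout from per-subject event times, using the status coding above (1: death, 2: recurrent event, 0: censoring). The helper `to_long` and its arguments are hypothetical; each subject's last row is either death or censoring at \(X = D\wedge C\).

```r
# Build long-format rows for one subject.
# rec_times: recurrent-event times; X: death or censoring time; died: TRUE if X is a death
to_long <- function(id, rec_times, X, died, trt) {
  rec_times <- sort(rec_times[rec_times < X])   # keep events before death/censoring
  data.frame(ID     = id,
             time   = c(rec_times, X),
             status = c(rep(2, length(rec_times)), if (died) 1 else 0),
             trt    = trt)
}

long <- rbind(
  to_long("P1", rec_times = c(3.5, 21.6), X = 34.9, died = TRUE,  trt = 1),
  to_long("P2", rec_times = numeric(0),   X = 48.9, died = FALSE, trt = 0)
)
long
```

With a real data set built this way, the columns would be passed as `WRrec(ID = long$ID, time = long$time, status = long$status, trt = long$trt)`.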

HF-ACTION: Data

  • High-risk subset \((n=426)\)

    • age60: indicator of age \(\geq\) 60 yrs
    library(WR)
    ##### Read in HF-ACTION DATA########
    # same as rmt::hfaction used in chap 1 
    #  (except for status coding)
    data(hfaction_cpx9)
    hfaction <- hfaction_cpx9
    head(hfaction)
    #>        patid       time status trt_ab age60
    #> 1 HFACT00001  7.2459016      2      0     1
    #> 2 HFACT00001 12.5573770      0      0     1
    #> 3 HFACT00002  0.7540984      2      0     1
    #> 4 HFACT00002  4.2950820      2      0     1
    #> 5 HFACT00002  4.7540984      2      0     1
    #> 6 HFACT00002 45.9016393      0      0     1

HF-ACTION: Summary

  • Descriptive
Table 1: Summary statistics for a high-risk subgroup (n=426) in the HF-ACTION trial.

                                     Usual care (N = 221)   Exercise training (N = 205)
Age                  ≤ 60 years      122 (55.2%)            128 (62.4%)
                     > 60 years      99 (44.8%)             77 (37.6%)
Follow-up (months)                   28.6 (18.4, 39.3)      27.6 (19, 40.2)
Death                                57 (25.8%)             36 (17.6%)
Hospitalizations     0               51 (23.1%)             60 (29.3%)
                     1-3             114 (51.6%)            102 (49.8%)
                     4-10            49 (22.2%)             39 (19%)
                     >10             7 (3.2%)               4 (2%)
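Rows such as Hospitalizations and Death can be tabulated from the long-format data. A minimal base-R sketch on a small hypothetical data set laid out like `hfaction` (status 1: death, 2: hospitalization, 0: censoring); the same steps could be applied to `hfaction` itself.

```r
# Hypothetical long-format data in the same layout as hfaction
long <- data.frame(
  patid  = c("A", "A", "A", "B", "C", "C"),
  time   = c(2.1, 7.5, 12.0, 30.2, 4.4, 18.9),
  status = c(2,   2,   1,    0,    2,   0),
  trt_ab = c(0,   0,   0,    1,    1,   1)
)

# Per-subject summaries (tapply orders subjects by patid)
n_hosp <- tapply(long$status == 2, long$patid, sum)            # hospitalizations
died   <- tapply(long$status == 1, long$patid, any)            # death indicator
trt    <- tapply(long$trt_ab, long$patid, function(x) x[1])    # treatment arm

# Categories used in Table 1: 0, 1-3, 4-10, >10
hosp_cat <- cut(n_hosp, breaks = c(-Inf, 0, 3, 10, Inf),
                labels = c("0", "1-3", "4-10", ">10"))
table(hosp_cat, trt)
prop.table(table(hosp_cat, trt), margin = 2)   # column percentages by arm
table(died, trt)
```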

HF-ACTION: WR Analyses

  • Naive (NWR), first-event (FWR), LWR

    • Stratified by age \(<\) or \(\geq 60\)
    obj <- WRrec(ID = hfaction$patid, time = hfaction$time, 
                 status = hfaction$status, trt = hfaction$trt_ab,
                 strata = hfaction$age60, naive = TRUE)
    obj
    #>             N Rec. Event Death Med. Follow-up
    #> Control   221        571    57       28.62295
    #> Treatment 205        451    36       27.57377
    #> 
    #> WR analyses:
    #>     Win prob Loss prob WR (95% CI)*      p-value
    #> LWR 50.4%    38.2%     1.32 (1.05, 1.66) 0.0189 
    #> FWR 50.4%    38.3%     1.32 (1.04, 1.66) 0.0202 
    #> NWR 47%      35%       1.34 (1.05, 1.72) 0.0193 
    #> -----
    #> *Note: The scale of WR depends on censoring distribution.

HF-ACTION: Overall

  • Recurrent-event WRs are more powerful than PWR
    • NWR, FWR, and LWR give similar results because the number of hospitalizations is highly variable (0 to 26), so few pairs are tied on the event count

Sample Size Calculations

Special Case: PWR

  • Simplified outcome model
    • Gumbel-Hougaard copula (Oakes, 1989) \[ \pr(D^\a>s, T_1^\a>t) = \exp\left(-\left[\{\exp(a\xi_D)\lambda_Ds\}^\kappa + \{\exp(a\xi_H)\lambda_Ht\}^\kappa \right]^{1/\kappa}\right) \]
    • Parameters
      • \(\lambda_D, \lambda_H\): baseline hazard rates for death/nonfatal event
      • \(\exp(\xi_D), \exp(\xi_H)\): treatment HR on death/nonfatal event (effect sizes)
      • \(\kappa\geq 1\): association parameter (Kendall’s rank correlation \(1-\kappa^{-1}\))
  • Study design
    • Uniform patient accrual over \([0, \tau_b]\); follow all until \(\tau>\tau_b\)
    • Random loss-to-follow-up (LTFU) rate \(\lambda_L\)
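The outcome model can be coded directly from the formula, which makes its properties easy to check: setting \(t=0\) gives the exponential marginal for death with hazard \(\exp(a\xi_D)\lambda_D\), and \(\kappa=1\) gives independence. A minimal sketch; `gh_surv` and `kendall_tau` are hypothetical helpers, and \(a=1\) (treatment) or \(a=0\) (control).

```r
# Joint survival pr(D > s, T1 > t) in group a under the Gumbel-Hougaard model
gh_surv <- function(s, t, a, lambda_D, lambda_H, kappa, xi_D = 0, xi_H = 0) {
  exp(-((exp(a * xi_D) * lambda_D * s)^kappa +
        (exp(a * xi_H) * lambda_H * t)^kappa)^(1 / kappa))
}

# Kendall's rank correlation between D and T1 implied by kappa
kendall_tau <- function(kappa) 1 - 1 / kappa

gh_surv(s = 2, t = 1, a = 1, lambda_D = 0.07, lambda_H = 0.56, kappa = 1.5,
        xi_D = log(0.9), xi_H = log(0.8))
kendall_tau(1.564485)   # 0.3608, as in the HF-ACTION estimate
```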

Sample Size Formula

  • Total sample size needed \[ n = \frac{\zeta_0^2(\lambda_D,\lambda_H,\kappa,\tau_b,\tau,\lambda_L)(z_{1-\alpha/2} + z_\gamma)^2} {q(1-q)\left\{\delta(\lambda_D,\lambda_H,\kappa,\tau_b,\tau,\lambda_L)^\T\xi\right\}^2} \]
    • \(\alpha =0.05\): type I error; \(\gamma = 0.8, 0.9\): desired power (\(z_\gamma=\Phi^{-1}(\gamma)\))
    • \(\xi=(\xi_D,\xi_H)^\T\): component-wise log-HRs (effect sizes)
    • \(q=N_1/n\): proportion assigned to treatment
    • Nuisance parameters
      • \(\zeta_0(\lambda_D,\lambda_H,\kappa,\tau_b,\tau,\lambda_L)\): individual-level noise parameter (cf. SD) in WR
      • \(\delta(\lambda_D,\lambda_H,\kappa,\tau_b,\tau,\lambda_L)\): derivative of log-WR with respect to the log-HRs \(\xi\), so that log-WR \(\approx\delta^\T\xi\)
      • Calculable by WR::base(lambda_D, lambda_H, kappa, tau_b, tau, lambda_L)
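Once \(\zeta_0^2\) and \(\delta\) are available, the formula is plain arithmetic. A minimal sketch with made-up nuisance values (in practice they would come from `WR::base()`); `wr_ss` is a hypothetical helper, not the package's `WRSS()`. The denominator uses the square of \(\delta^\T\xi\), the approximate log-WR under the alternative.

```r
# Sample size for a two-sided WR test from the formula above.
# zeta2 = zeta_0^2; delta, xi: length-2 vectors (death, nonfatal event)
wr_ss <- function(zeta2, delta, xi, q = 0.5, alpha = 0.05, power = 0.8) {
  zeta2 * (qnorm(1 - alpha / 2) + qnorm(power))^2 /
    (q * (1 - q) * sum(delta * xi)^2)          # (delta' xi)^2 in the denominator
}

n <- wr_ss(zeta2 = 1, delta = c(1, 1), xi = log(c(0.9, 0.8)))  # made-up nuisance values
n            # about 290.9
ceiling(n)   # round up to a whole number of patients
```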

Parameter Specification

  • Baseline outcome parameters \((\lambda_D,\lambda_H,\kappa)\)

    • Estimable from pilot/historical data
      • WR::gumbel.est(id, time, status)

    Exercise: Under Gumbel-Hougaard copula

    • \(D^\c\sim\mbox{exponential}(\lambda_D)\)

    • \(\tilde T^\c = D^\c\wedge T_1^\c\sim\mbox{exponential}\left(\lambda_{CE}\right)\), where \(\lambda_{CE} = (\lambda_D^\kappa + \lambda_H^\kappa)^{1/\kappa}\)

    • Cause-specific hazard for \(T_1^\c\): \(\lambda_H^\#=\lambda_H^\kappa\lambda_{CE}^{1-\kappa}\)

      Three parameters \(\to\) three estimable quantities

  • Design parameters \((\tau_b,\tau,\lambda_L)\)

    • Specified by the investigator
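The exercise relations can be checked numerically and also inverted, which shows concretely why three estimable rates pin down the three parameters. Since the two cause-specific hazards sum to \(\lambda_{CE}\), the cause-specific hazard of death is \(\lambda_{CE}-\lambda_H^\#=\lambda_D(\lambda_D/\lambda_{CE})^{\kappa-1}\), which can be solved for \(\kappa\). This closed-form inversion is only an illustration; it is not necessarily how `gumbel.est()` estimates the parameters. The helpers `gh_implied` and `gh_invert` are hypothetical.

```r
# Rates implied by the Gumbel-Hougaard model (control arm)
gh_implied <- function(lambda_D, lambda_H, kappa) {
  lambda_CE <- (lambda_D^kappa + lambda_H^kappa)^(1 / kappa)   # rate of D ^ T1
  c(lambda_D    = lambda_D,                                    # marginal death rate
    lambda_CE   = lambda_CE,
    lambda_H_cs = lambda_H^kappa * lambda_CE^(1 - kappa))      # cause-specific, T1
}

# Inverse map from the three estimable rates back to (lambda_D, lambda_H, kappa),
# using lambda_CE - lambda_H_cs = lambda_D * (lambda_D / lambda_CE)^(kappa - 1)
gh_invert <- function(lambda_D, lambda_CE, lambda_H_cs) {
  kappa <- 1 + log((lambda_CE - lambda_H_cs) / lambda_D) / log(lambda_D / lambda_CE)
  lambda_H <- (lambda_CE^kappa - lambda_D^kappa)^(1 / kappa)
  c(lambda_D = lambda_D, lambda_H = lambda_H, kappa = kappa)
}

rates <- gh_implied(lambda_D = 0.073, lambda_H = 0.56, kappa = 1.56)
back  <- gh_invert(rates[["lambda_D"]], rates[["lambda_CE"]], rates[["lambda_H_cs"]])
back   # recovers the inputs 0.073, 0.56, 1.56
```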

Software: WR::WRSS()

  • Basic steps

    • xi: log-HRs \(\xi=(\xi_D, \xi_H)^T\) (e.g., \(\log (0.8, 0.9)^\T\))
    library(WR)
    # Step 1: estimate (lambda_D, lambda_H, kappa) from pilot data
    # (id, time, status: long-format pilot data, status 1 = death, 2 = nonfatal, 0 = censored)
    outcome_base <- gumbel.est(id, time, status)
    lambda_D <- outcome_base$lambda_D
    lambda_H <- outcome_base$lambda_H
    kappa <- outcome_base$kappa
    # Step 2: calculate zeta2 and delta from
    # (lambda_D, lambda_H, kappa, tau_b, tau, lambda_L)
    bparam <- base(lambda_D, lambda_H, kappa, tau_b, tau, lambda_L)
    ## a list of zeta2 and delta
    # Step 3: calculate sample size using bparam
    obj <- WRSS(xi, bparam, q = 0.5, alpha = 0.05, side = 2, power = 0.8)
    obj$n

A New Training Trial

  • Background
    • WR demonstrated beneficial effect of training on (death > hosp) in HF patients with CPX \(\leq 9\) min
  • Design of new trial
    • Purpose: test a new training program against the existing one as standard care
    • Design: \(\tau_b = 3\) yrs patient accrual, follow until \(\tau = 4\) yrs
      • Assume minimal LTFU: \(\lambda_L = 0.001\) per person-year
    • Baseline event rates/correlation: estimable from \(n=205\) patients in HF-ACTION training arm
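These design choices determine the censoring distribution. A minimal sketch, assuming entry times uniform on \([0,\tau_b]\), administrative censoring at calendar time \(\tau\), and exponential LTFU at rate \(\lambda_L\) (the value used in the sample-size code); `cens_surv` is a hypothetical helper. The analytic survival function of the censoring time is checked against a simulation, and its square gives \(1-G(t)\) for a pair of independent subjects.

```r
tau_b <- 3; tau <- 4; lambda_L <- 0.001   # accrual (yrs), study end (yrs), LTFU rate

# Survival function of C = min(tau - E, L), E ~ Unif(0, tau_b), L ~ Exp(lambda_L)
cens_surv <- function(t) {
  admin <- pmin(1, pmax(0, (tau - t) / tau_b))   # pr(tau - E > t)
  admin * exp(-lambda_L * t)                     # times pr(L > t)
}
pair_surv <- function(t) cens_surv(t)^2         # pr(C1 ^ C0 > t) = 1 - G(t)

set.seed(1)
E <- runif(1e5, 0, tau_b)            # entry times
L <- rexp(1e5, rate = lambda_L)      # loss-to-follow-up times
C <- pmin(tau - E, L)                # censoring times
c(empirical = mean(C > 2.5), theoretical = cens_surv(2.5))
```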

HF-ACTION: Historical Data

  • Extract data from hfaction
library(dplyr)
# get training arm data
pilot <- hfaction |> 
  filter(trt_ab == 1)
head(pilot)
#>     patid      time status trt_ab age60
#> HFACT00007  3.47541      2      1     1
#> HFACT00007 21.60656      2      1     1
#> HFACT00007 29.04918      2      1     1
#> HFACT00007 32.16393      2      1     1
#> HFACT00007 34.88525      1      1     1
#> HFACT00035 48.88525      0      1     1
# number of subjects
pilot |> distinct(patid) |> 
  count()
#>   n
#> 205

HF-ACTION: Baseline Outcome

  • Parameter estimates
    • \(\lambda_D=0.07\) year\(^{-1}\), \(\lambda_H=0.56\) year\(^{-1}\), Kendall’s corr \(=36.1\%\)
# Step 1: estimate (lambda_D, lambda_H, kappa) from HF-ACTION data
outcome_base <- gumbel.est(pilot$patid, pilot$time / 12, pilot$status) # months -> years
lambda_D <- outcome_base$lambda_D
lambda_H <- outcome_base$lambda_H
kappa <- outcome_base$kappa

lambda_D
#> [1] 0.07307293
lambda_H
#> [1] 0.5596186
kappa
#> [1] 1.564485
## Kendall's rank correlation
1 - 1/kappa
#> [1] 0.360812

Sample Size: Example

  • One scenario
    • HRs on death & hospitalization: 0.9, 0.8
    • Sample size needed for power 80%: \(n=1241\)
# set design parameters
tau_b <- 3
tau <- 4
lambda_L <- 0.001
# Step 2: use base() function to compute zeta2 and delta
set.seed(1234) # Monte-Carlo integration in base()
bparam <- base(lambda_D, lambda_H, kappa, tau_b, tau, lambda_L)
# Step 3: compute sample size under HRs 0.9 (death) and 0.8 (hospitalization)
obj <- WRSS(xi = log(c(0.9, 0.8)), bparam = bparam, q = 0.5, alpha = 0.05,
            power = 0.8)
obj$n
#> [1] 1240.958

A Range of Effect Sizes

  • Different HRs: \(\exp(\xi)\in [0.6, 0.95]^{\otimes 2}\)

Conclusion

Notes

Summary

  • Win ratio test
    • Standard: death > one nonfatal event
    • Recurrent events: death > frequency > time to last/first event
      • WR::WRrec(ID, time, status, trt, strata)
  • Sample size calculations
    • Gumbel-Hougaard copula for death & nonfatal event \[\pr(D^\a>s, T_1^\a>t) = \exp\left(-\left[\{\exp(a\xi_D)\lambda_Ds\}^\kappa + \{\exp(a\xi_H)\lambda_Ht\}^\kappa \right]^{1/\kappa}\right) \]
      • Step 1: estimate \((\lambda_D,\lambda_H,\kappa)\) using WR::gumbel.est(id, time, status)
      • Step 2: calculate \(\zeta^2_0(\lambda_D,\lambda_H,\kappa,\tau_b,\tau,\lambda_L)\) and \(\delta(\lambda_D,\lambda_H,\kappa,\tau_b,\tau,\lambda_L)\) using WR::base()
      • Step 3: \(n=\frac{\zeta_0^2(\lambda_D,\lambda_H,\kappa,\tau_b,\tau,\lambda_L)(z_{1-\alpha/2} + z_\gamma)^2}{q(1-q)\left\{\delta(\lambda_D,\lambda_H,\kappa,\tau_b,\tau,\lambda_L)^\T\xi\right\}^2}\) using WR::WRSS()

References

Bebu, I., & Lachin, J. M. (2016). Large sample inference for a win ratio analysis of a composite outcome based on prioritized components. Biostatistics, 17(1), 178–187. https://doi.org/10.1093/biostatistics/kxv032
Buyse, M. (2010). Generalized pairwise comparisons of prioritized outcomes in the two-sample problem. Statistics in Medicine, 29(30), 3245–3257. https://doi.org/10.1002/sim.3923
Dong, G., Hoaglin, D. C., Huang, B., Cui, Y., Wang, D., Cheng, Y., & Gamalo-Siebers, M. (2023). The stratified win statistics (win ratio, win odds, and net benefit). Pharmaceutical Statistics, 22(4), 748–756. https://doi.org/10.1002/pst.2293
Dong, G., Hoaglin, D. C., Qiu, J., Matsouaka, R. A., Chang, Y.-W., Wang, J., & Vandemeulebroecke, M. (2020). The Win Ratio: On Interpretation and Handling of Ties. Statistics in Biopharmaceutical Research, 12(1), 99–106. https://doi.org/10.1080/19466315.2019.1575279
Dong, G., Li, D., Ballerstedt, S., & Vandemeulebroecke, M. (2016). A generalized analytic solution to the win ratio to analyze a composite endpoint considering the clinical importance order among components. Pharmaceutical Statistics, 15(5), 430–437. https://doi.org/10.1002/pst.1763
Dong, G., Qiu, J., Wang, D., & Vandemeulebroecke, M. (2017). The stratified win ratio. Journal of Biopharmaceutical Statistics, 28(4), 778–796. https://doi.org/10.1080/10543406.2017.1397007
Finkelstein, D. M., & Schoenfeld, D. A. (1999). Combining mortality and longitudinal measures in clinical trials. Statistics in Medicine, 18(11), 1341–1354. https://doi.org/10.1002/(sici)1097-0258(19990615)18:11<1341::aid-sim129>3.0.co;2-7
Gasparyan, S. B., Folkvaljon, F., Bengtsson, O., Buenconsejo, J., & Koch, G. G. (2020). Adjusted win ratio with stratification: Calculation methods and interpretation. Statistical Methods in Medical Research, 30(2), 580–611. https://doi.org/10.1177/0962280220942558
Gasparyan, S. B., Kowalewski, E. K., Folkvaljon, F., Bengtsson, O., Buenconsejo, J., Adler, J., & Koch, G. G. (2021). Power and sample size calculation for the win odds test: application to an ordinal endpoint in COVID-19 trials. Journal of Biopharmaceutical Statistics, 31(6), 765–787. https://doi.org/10.1080/10543406.2021.1968893
Gehan, E. A. (1965). A generalized Wilcoxon test for comparing arbitrarily singly-censored samples. Biometrika, 52(1-2), 203–224. https://doi.org/10.1093/biomet/52.1-2.203
Luo, X., Qiu, J., Bai, S., & Tian, H. (2017). Weighted win loss approach for analyzing prioritized outcomes. Statistics in Medicine, 36(15), 2452–2465. https://doi.org/10.1002/sim.7284
Luo, X., Tian, H., Mohanty, S., & Tsai, W. Y. (2015). An Alternative Approach to Confidence Interval Estimation for the Win Ratio Statistic. Biometrics, 71(1), 139–145. https://doi.org/10.1111/biom.12225
Mao, L. (2019). On the Alternative Hypotheses for the Win Ratio. Biometrics, 75(1), 347–351. https://doi.org/10.1111/biom.12954
Mao, L., Kim, K., & Li, Y. (2022). On recurrent-event win ratio. Statistical Methods in Medical Research, 31(6), 1120–1134. https://doi.org/10.1177/09622802221084134
Oakes, D. (1989). Bivariate Survival Models Induced by Frailties. Journal of the American Statistical Association, 84(406), 487–493. https://doi.org/10.1080/01621459.1989.10478795
Oakes, D. (2016). On the win-ratio statistic in clinical trials with multiple types of event. Biometrika, 103(3), 742–745. https://doi.org/10.1093/biomet/asw026
Pocock, S. J., Ariti, C. A., Collier, T. J., & Wang, D. (2012). The win ratio: a new approach to the analysis of composite endpoints in clinical trials based on clinical priorities. European Heart Journal, 33(2), 176–182. https://doi.org/10.1093/eurheartj/ehr352
Seifu, Y., Mt-Isa, S., Duke, K., Gamalo-Siebers, M., Wang, W., Dong, G., & Kolassa, J. (2022). Design of paediatric trials with benefit-risk endpoints using a composite score of adverse events of interest (AEI) and win-statistics. Journal of Biopharmaceutical Statistics, 33(6), 696–707. https://doi.org/10.1080/10543406.2022.2153202
Wang, B., Zhou, D., Zhang, J., Kim, Y., Chen, L.-W., Dunnmon, P., Bai, S., Liu, Q., & Ishida, E. (2023). Statistical power considerations in the use of win ratio in cardiovascular outcome trials. Contemporary Clinical Trials, 124, 107040. https://doi.org/10.1016/j.cct.2022.107040
Yang, S., & Troendle, J. (2020). Event-specific win ratios and testing with terminal and non-terminal events. Clinical Trials, 18(2), 180–187. https://doi.org/10.1177/1740774520972408
Yang, S., Troendle, J., Pak, D., & Leifer, E. (2022). Event-specific win ratios for inference with terminal and non-terminal events. Statistics in Medicine, 41(7), 1225–1241. https://doi.org/10.1002/sim.9266
Yu, R. X., & Ganju, J. (2022). Sample size formula for a win ratio endpoint. Statistics in Medicine, 41(6), 950–963. https://doi.org/10.1002/sim.9297
Zhou, T. J., LaValley, M. P., Nelson, K. P., Cabral, H. J., & Massaro, J. M. (2022). Calculating power for the Finkelstein and Schoenfeld test statistic for a composite endpoint with two components. Statistics in Medicine, 41(17), 3321–3335. https://doi.org/10.1002/sim.9419