Statistical Methods for Composite Endpoints: Win Ratio and Beyond

Chapter 2 - Hypothesis Testing

Lu Mao

lmao@biostat.wisc.edu

Department of Biostatistics & Medical Informatics

University of Wisconsin-Madison

May 31, 2025

Outline

Win ratio basics and properties
Generalize to recurrent events
- HF-ACTION example (WR package)
Sample size calculations
- HF-ACTION example (WR package)

\[\newcommand{\d}{{\rm d}}\] \[\newcommand{\T}{{\rm T}}\] \[\newcommand{\dd}{{\rm d}}\] \[\newcommand{\cc}{{\rm c}}\] \[\newcommand{\pr}{{\rm pr}}\] \[\newcommand{\var}{{\rm var}}\] \[\newcommand{\se}{{\rm se}}\] \[\newcommand{\indep}{\perp \!\!\! \perp}\] \[\newcommand{\Pn}{n^{-1}\sum_{i=1}^n}\] \[ \newcommand\mymathop[1]{\mathop{\operatorname{#1}}} \] \[ \newcommand{\Ut}{{n \choose 2}^{-1}\sum_{i<j}\sum} \] \[ \def\a{{(a)}} \def\b{{(1-a)}} \def\t{{(1)}} \def\c{{(0)}} \def\d{{\rm d}} \def\T{{\rm T}} \def\bs{\boldsymbol} \]

Win Ratio Basics & Properties

Standard Two-Sample

Two-sample comparison (Pocock et al., 2012)
- Data: $D_i^{(a)}, T_i^{(a)}, C_i^{(a)}$: survival, hospitalization, censoring times on $i$th subject in group $a$ $(i=1,\ldots, N_a; a= 1, 0)$
- Pairwise comparisons: $i$th in group $a$ vs $j$th in group $1-a$
  - Hierarchical composite: Death > hospitalization in $\left[0, C_i^{(a)}\wedge C_j^{(1-a)}\right]$ \[\begin{align} \hat w^{(a, 1-a)}_{ij}&= \underbrace{I(D_j^{(1-a)}< D_i^{(a)}\wedge C_i^{(a)}\wedge C_j^{(1-a)})}_{\mbox{win on survival}}\\ & + \underbrace{I(\min(D_i^{(a)}, D_j^{(1-a)}) > C_i^{(a)}\wedge C_j^{(1-a)}, T_j^{(1-a)}< T_i^{(a)}\wedge C_i^{(a)}\wedge C_j^{(1-a)})}_{\mbox{tie on survival, win on hospitalization}} \end{align}\]
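
The pairwise rule above can be sketched in R. This is a minimal illustration, not the WR package's implementation: `pocock_win()` is a hypothetical helper that takes the death, hospitalization, and censoring times of subject $i$ and of subject $j$ (with `Inf` standing for an event that never occurs) and returns the indicator that $i$ wins over $j$.

```r
# Hypothetical helper: does subject i win over subject j under Pocock's rule?
# D = death time, T = first hospitalization time, C = censoring time.
pocock_win <- function(Di, Ti, Ci, Dj, Tj, Cj) {
  cmin <- min(Ci, Cj)                           # shared follow-up window [0, cmin]
  win_surv <- Dj < min(Di, cmin)                # j dies first, within the window
  tie_surv <- min(Di, Dj) > cmin                # no death observed in the window
  win_hosp <- tie_surv && (Tj < min(Ti, cmin))  # tie on survival, j hospitalized first
  as.integer(win_surv || win_hosp)
}

# i is hospitalized at 5 and censored at 10; j dies at 8 (censoring at 12)
pocock_win(Di = Inf, Ti = 5, Ci = 10, Dj = 8, Tj = Inf, Cj = 12)    # 1: win on survival
# same pair with roles swapped
pocock_win(Di = 8, Ti = Inf, Ci = 12, Dj = Inf, Tj = 5, Cj = 10)    # 0
# nobody dies; j is hospitalized at 4, i never
pocock_win(Di = Inf, Ti = Inf, Ci = 10, Dj = Inf, Tj = 4, Cj = 12)  # 1: win on hospitalization
```

Note that the earlier hospitalization of $i$ in the first pair is irrelevant: the comparison is decided on survival before hospitalization is ever consulted.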

Pocock’s Rule

Win, lose, or tie?

Calculation of Win Ratio

Two-sample statistics
- Win (loss) fraction for group $a$ ($1-a$) \[ \hat w^{(a, 1-a)}=(N_0N_1)^{-1}\sum_{i=1}^{N_a}\sum_{j=1}^{N_{1-a}}\hat w^{(a, 1-a)}_{ij}\]
- Win ratio statistic \[ WR = \hat w^{(1, 0)} / \hat w^{(0, 1)} \]
- Other measures
  - Net benefit (proportion in favor): $\hat w^{(1, 0)} - \hat w^{(0, 1)}$ (Buyse, 2010)
  - Win odds: $(\hat w^{(1, 0)} - \hat w^{(0, 1)} + 1)/ (\hat w^{(0, 1)} - \hat w^{(1, 0)} + 1)$ (Dong et al., 2020)
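
As a quick sketch of how the three summary measures relate, the function below computes them from a win fraction `w` and a loss fraction `l`. The name `win_stats()` and the input values are illustrative assumptions, not part of the WR package.

```r
# Hypothetical helper: win ratio, net benefit, and win odds from
# the win fraction w (treatment wins) and loss fraction l (control wins).
win_stats <- function(w, l) {
  stopifnot(w >= 0, l > 0, w + l <= 1)
  c(
    win_ratio   = w / l,
    net_benefit = w - l,
    win_odds    = (w - l + 1) / (l - w + 1)  # ties split evenly between arms
  )
}

# illustrative fractions: 50% wins, 40% losses, 10% ties
win_stats(w = 0.5, l = 0.4)
```

The win odds equals $(w + \text{tie}/2)/(l + \text{tie}/2)$ with $\text{tie} = 1 - w - l$, which is why it is pulled toward 1 relative to the win ratio when ties are common.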

The Binary Case

Consider binary $Y^{(a)}= 1, 0$
- $\hat w^{(a, 1-a)}_{ij} = I(Y_i^{(a)}> Y_j^{(1-a)})=Y_i^{(a)}(1-Y_j^{(1-a)})$
- Win (loss) fraction \[ \hat w^{(a, 1-a)} = (N_1N_0)^{-1}\sum_{i=1}^{N_a}\sum_{j=1}^{N_{1-a}}Y_i^{(a)}(1-Y_j^{(1-a)}) = \hat p^{(a)}(1-\hat p^{(1-a)})\]
  - $\hat p^{(a)}= N_a^{-1}\sum_{i=1}^{N_a} Y_i^{(a)}$ (success probability)
- Equivalencies \[\begin{align} {\rm Win\,\, ratio}&= \frac{\hat w^{(1, 0)}}{\hat w^{(0, 1)}} = \frac{\hat p^{(1)}(1-\hat p^{(0)})}{\hat p^{(0)}(1-\hat p^{(1)})} = {\rm Odds \,\, ratio}\\ {\rm Net \,\, benefit}&=\hat w^{(1, 0)} - \hat w^{(0, 1)} = \hat p^{(1)}- \hat p^{(0)}= {\rm Risk \,\, difference} \end{align}\]
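
The two equivalencies can be checked numerically by brute-force pairwise comparison. The data below are made up for illustration.

```r
# Toy binary outcomes (1 = success) in treatment (y1) and control (y0)
y1 <- c(1, 1, 1, 0, 1, 0, 1, 1)
y0 <- c(0, 1, 0, 0, 1, 0)

# Win and loss fractions over all N1 x N0 pairs
w10 <- mean(outer(y1, y0, ">"))  # treatment wins
w01 <- mean(outer(y1, y0, "<"))  # control wins

p1 <- mean(y1)
p0 <- mean(y0)

# Win ratio equals odds ratio; net benefit equals risk difference
all.equal(w10 / w01, (p1 * (1 - p0)) / (p0 * (1 - p1)))
all.equal(w10 - w01, p1 - p0)
```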

Hypothesis Testing

Test statistic
- Log-transformed and normalized \[ S_n = \frac{n^{1/2}\log(\hat w^{(1, 0)}/\hat w^{(0, 1)})}{\hat{\rm SE}} \stackrel{H_0}{\sim} N (0, 1) \]
  - $\hat{\rm SE}$: standard error of numerator by $U$-statistic method (Bebu & Lachin, 2016; Dong et al., 2016; Luo et al., 2015); $n = N_1 + N_0$
- Null hypothesis \[ H_0: H^\t(s, t) = H^\c(s, t)\mbox{ for all } t\leq s \]
  - $H^\a(s, t)=\pr(D^\a > s, T^\a > t)$
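
A minimal sketch of the resulting Wald test on the log scale, assuming the log win ratio and its standard error are already available. Here `se_log_wr` plays the role of $\hat{\rm SE}/n^{1/2}$; the function name and the input values are illustrative assumptions.

```r
# Hypothetical helper: two-sided Wald test and confidence interval
# for the win ratio, computed on the log scale.
wr_test <- function(log_wr, se_log_wr, alpha = 0.05) {
  z <- log_wr / se_log_wr            # S_n in the notation above
  crit <- qnorm(1 - alpha / 2)
  list(
    z       = z,
    p_value = 2 * pnorm(-abs(z)),
    wr      = exp(log_wr),
    ci      = exp(log_wr + c(-1, 1) * crit * se_log_wr)
  )
}

# illustrative inputs only
res <- wr_test(log_wr = log(1.3), se_log_wr = 0.12)
res$z
res$p_value
res$ci
```

Working on the log scale keeps the confidence interval inside $(0, \infty)$ and makes it symmetric around the estimate in ratio terms.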

Alternative Hypothesis

What is estimand of WR?
- Censoring-weighted average of time-dependent WRs (Oakes, 2016) \[ \frac{\hat w^{(1, 0)}}{\hat w^{(0, 1)}}\to \frac{\int_0^\infty\pr(\mbox{Treatment wins by } t)\dd G(t)} {\int_0^\infty\pr(\mbox{Control wins by } t)\dd G(t)} =\text{Non-centrality parameter} \]
  - $G(t)$: Distribution function of $C^\t\wedge C^\c$
- Treatment wins consistently against control over time (Mao, 2019) \[ H_A: \pr(\mbox{Treatment wins by } t)\geq \pr(\mbox{Control wins by } t) \mbox{ for all } t \]
- Sufficient condition: joint stochastic order of death and nonfatal event \[ H_A: H^\t(s, t) \geq H^\c(s, t)\mbox{ for all } t\leq s \]

Variations

Weighting
- Unweighted pairwise comparisons $\to$ Gehan (1965) test
- Weight win/loss by time of follow-up $\to$ log-rank (more efficient) (Luo et al., 2017)

Stratification
- Stratified WR: within-stratum comparisons (Dong et al., 2017, 2023; Gasparyan et al., 2020)
  - Sum of stratum-specific wins / sum of stratum-specific losses
- Adjust for confounding; increase efficiency
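
The pooling step can be sketched as below. The win/loss counts are invented, and the optional `weights` argument is an assumption added to allow stratum weights; with the default unit weights the function reduces to the plain sum of wins over sum of losses.

```r
# Hypothetical win/loss counts from within-stratum pairwise comparisons
strata <- data.frame(
  stratum = c("age < 60", "age >= 60"),
  wins    = c(5200, 3100),
  losses  = c(4100, 2300)
)

# Stratified win ratio: (weighted) sum of wins over (weighted) sum of losses
stratified_wr <- function(wins, losses, weights = rep(1, length(wins))) {
  sum(weights * wins) / sum(weights * losses)
}

stratified_wr(strata$wins, strata$losses)
# stratum-specific win ratios, for comparison
strata$wins / strata$losses
```

Because pairs are only formed within strata, a patient is never compared with someone from a different stratum.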

Handling Recurrent Events

General Data

Full outcomes
- A subject in group $a$ $(a=1, 0)$ \[\mathcal H^{*{(a)}}(t)=\left\{N^{*{(a)}}_D(u), N^{*{(a)}}_1(u), \ldots, N^{*{(a)}}_K(u):0\leq u\leq t\right\}\]
- $N^{*{(a)}}_D(u), N^{*{(a)}}_1(u), \ldots, N^{*{(a)}}_K(u)$: counting processes for death and $K$ different types of nonfatal events

Observed data
- $\mathcal H^{*{(a)}}(X^{(a)})$: life history up to $X^{(a)}= D^{(a)}\wedge C^{(a)}$

General Rule of Comparison

Win function
- Time frame of comparison: $[0, t]$ \[\mathcal W(\mathcal H^{*{(a)}}, \mathcal H^{*{(1-a)}})(t) =I\left\{\mathcal H^{*{(a)}}(t) \mbox{ is more favorable than } \mathcal H^{*{(1-a)}}(t)\right\}\]
- Basic requirements
  - (W1) $\mathcal W(\mathcal H^{*{(a)}}, \mathcal H^{*{(1-a)}})(t)$ is a function only of $\mathcal H^{*{(a)}}(t)$ and $\mathcal H^{*{(1-a)}}(t)$
  - (W2) $\mathcal W(\mathcal H^{*{(a)}}, \mathcal H^{*{(1-a)}})(t)+\mathcal W(\mathcal H^{*{(1-a)}}, \mathcal H^{*{(a)}})(t) \in \{0, 1\}$
  - (W3) $\mathcal W(\mathcal H^{*{(a)}}, \mathcal H^{*{(1-a)}})(t)=\mathcal W(\mathcal H^{*{(a)}}, \mathcal H^{*{(1-a)}})(D^{(a)}\wedge D^{(1-a)}\wedge t)$
- Interpretations
  - (W1) Consistency of time frame
  - (W2) Either win, loss, or tie
  - (W3) No change of win-loss status after death (satisfied if death is prioritized)

Generalized Win Ratio

Under general win function $\mathcal W(\cdot,\cdot)$
- Win ratio statistic \[\begin{equation}\label{eq:wr:gen_WR} \hat{\mathcal E}_n(\mathcal W)=\frac{(N_1N_0)^{-1}\sum_{i=1}^{N_1}\sum_{j=1}^{N_0}\mathcal W(\mathcal H^{*{(1)}}_{i}, \mathcal H^{*{(0)}}_{j})(X^{{(1)}}_{i}\wedge X^{{(0)}}_{j})} {(N_1N_0)^{-1}\sum_{i=1}^{N_1}\sum_{j=1}^{N_0}\mathcal W(\mathcal H^{*{(0)}}_{j}, \mathcal H^{*{(1)}}_{i})(X^{{(1)}}_{i}\wedge X^{{(0)}}_{j})} \end{equation}\]
- Still each pair is compared over $\left[0, X^{{(1)}}_{i}\wedge X^{{(0)}}_{j}\right]$, but by a general rule $\mathcal W$
- Stratified win ratio: ratio between weighted sum of within-stratum win/loss fractions

Examples

Pocock’s WR
- $T^{(a)}_1$: time of first event in $N^{*{(a)}}(t)=\sum_{k=1}^K N^{*{(a)}}_k(t)$ \[\begin{align}\label{eq:wr:PWR} \mathcal W_{\rm P}(\mathcal H^{*{(a)}}, \mathcal H^{*{(1-a)}})(t)&=I\{D^{(1-a)}<D^{(a)}\wedge t\}\notag\\ &\hspace{2mm}+I\{D^{(a)}\wedge D^{(1-a)}>t, T_{1}^{(1-a)}<T_{1}^{(a)}\wedge t\} \end{align}\]
- $\hat{\mathcal E}_n(\mathcal W_{\rm P})$

TFE WR
- $\tilde T^{(a)}=\min(D^{(a)}, T_1^{(a)})$ \[ \mathcal W_{\rm TFE}(\mathcal H^{*{(a)}},\mathcal H^{*{(1-a)}})(t)=I(\tilde T^{(1-a)}<\tilde T^{(a)}\wedge t) \]
- $\hat{\mathcal E}_n(\mathcal W_{\rm TFE})$: allowable but not desirable
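
To see why the time-to-first-event rule is allowable but undesirable, the two win functions can be coded side by side. This is a sketch with hypothetical function names; each subject is summarized by a death time `D` and first nonfatal event time `T` (`Inf` if the event never occurs), and `t` is the comparison horizon.

```r
# W_P: does subject a win over subject b on [0, t] under Pocock's rule?
win_pocock <- function(Da, Ta, Db, Tb, t) {
  as.integer(Db < min(Da, t) || (min(Da, Db) > t && Tb < min(Ta, t)))
}

# W_TFE: compare times to first event of any kind
win_tfe <- function(Da, Ta, Db, Tb, t) {
  as.integer(min(Db, Tb) < min(min(Da, Ta), t))
}

# Subject a: hospitalized at 2, survives; subject b: dies at 5, never hospitalized
win_pocock(Da = Inf, Ta = 2, Db = 5, Tb = Inf, t = 10)  # 1: a wins (b died)
win_tfe(Da = Inf, Ta = 2, Db = 5, Tb = Inf, t = 10)     # 0: a does not win
win_tfe(Da = 5, Ta = Inf, Db = Inf, Tb = 2, t = 10)     # 1: b "wins" despite dying
```

Under the time-to-first-event rule the subject who died is declared the winner because the other subject's hospitalization came first, which ignores the clinical priority of death.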

Options for Recurrent Events

Three variations (Mao et al., 2022)
- Naive: Death > number of events (Finkelstein & Schoenfeld, 1999)
- First-event: Death > number of events > time to first event
- Last-event: Death > number of events > time to last event

Properties
- First/Last-event fewer ties than standard WR
- First/Last-event $\to$ Pocock’s WR with nonrecurrent event
  
  Exercise
  
  Write out the win function $\mathcal W$ for the three versions of recurrent-event WR.

Comparison with Pocock’s

Last-event WR (LWR)
- vs Pocock’s WR (PWR)

Alternative Hypothesis for LWR

LWR
- Tests joint stochastic order of all events \[ H_A: H^\t(s, t_1, t_2, \ldots) \geq H^\c(s, t_1, t_2, \ldots)\mbox{ for all } t_1\leq t_2\leq\cdots\leq s \]
  - $H^\a(s, t_1, t_2, \ldots)=\pr(D^\a > s, T_1^\a > t_1, T_2^\a > t_2, \ldots)$
  - $T_k^\a$: $k$th recurrent event in $N^{*{(a)}}(t)$ $(k=1, 2, \ldots)$
- Treatment stochastically delays all events
- All variations of WR implemented in WR package
  - Simulations show LWR more powerful than rest (Mao et al., 2022)

Software: `WR::WRrec()`

Basic syntax
- Long format
  - ID: unique patient identifier
  - time: event times
  - status: event types (1: death; 2: recurrent events; 0: censoring)
  - trt: binary treatment
  - strata: strata variable
- naive = TRUE: calculates naive/FWR as well as LWR

library(WR)
obj <- WRrec(ID, time, status, trt, strata = NULL, naive = FALSE)

Output: a list of class WRrec
- obj$log.WR: log-LWR; obj$se: $\hat\se(\mbox{log-LWR})$
- print(obj) to print summary results
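
Before calling `WRrec()`, it can help to confirm that the data really are in the long format described above. The check below is a sketch in base R. `check_long_format()` is a hypothetical helper, and the rule it enforces (exactly one terminal record per subject, coded 0 or 1 and occurring last, with a constant treatment label) is an assumption inferred from the status coding rather than a documented requirement of the package.

```r
# Toy long-format data: one row per event, one terminal row per subject
toy <- data.frame(
  ID     = c(1, 1, 2, 2, 2, 3),
  time   = c(7.2, 12.6, 0.8, 4.3, 45.9, 30.1),
  status = c(2, 0, 2, 2, 1, 0),   # 1: death; 2: recurrent event; 0: censoring
  trt    = c(0, 0, 1, 1, 1, 0)
)

# Hypothetical helper: TRUE if every subject has exactly one terminal record
# (status 0 or 1), placed last in time, and a single treatment label.
check_long_format <- function(d) {
  ok <- vapply(split(d, d$ID), function(s) {
    s <- s[order(s$time), ]
    n <- nrow(s)
    sum(s$status %in% c(0, 1)) == 1 &&
      s$status[n] %in% c(0, 1) &&
      length(unique(s$trt)) == 1
  }, logical(1))
  all(ok)
}

check_long_format(toy)
```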

HF-ACTION: Data

High-risk subset $(n=426)$
- age60: indicator of age $\geq$ 60 yrs

library(WR)
##### Read in HF-ACTION DATA########
# same as rmt::hfaction used in chap 1 
#  (except for status coding)
data(hfaction_cpx9)
hfaction <- hfaction_cpx9
head(hfaction)
#>        patid       time status trt_ab age60
#> 1 HFACT00001  7.2459016      2      0     1
#> 2 HFACT00001 12.5573770      0      0     1
#> 3 HFACT00002  0.7540984      2      0     1
#> 4 HFACT00002  4.2950820      2      0     1
#> 5 HFACT00002  4.7540984      2      0     1
#> 6 HFACT00002 45.9016393      0      0     1

HF-ACTION: Summary

Descriptive

Table 1: Summary statistics for a high-risk subgroup (n=426) in HF-ACTION trial.

| | | Usual care (N = 221) | Exercise training (N = 205) |
|---|---|---|---|
| Age | ≤ 60 years | 122 (55.2%) | 128 (62.4%) |
| | > 60 years | 99 (44.8%) | 77 (37.6%) |
| Follow-up | (months) | 28.6 (18.4, 39.3) | 27.6 (19, 40.2) |
| Death | | 57 (25.8%) | 36 (17.6%) |
| Hospitalizations | 0 | 51 (23.1%) | 60 (29.3%) |
| | 1-3 | 114 (51.6%) | 102 (49.8%) |
| | 4-10 | 49 (22.2%) | 39 (19%) |
| | >10 | 7 (3.2%) | 4 (2%) |

HF-ACTION: WR Analyses

Naive (NWR), first-event (FWR), LWR

Stratified by age $<$ or $\geq 60$

obj <- WRrec(ID = hfaction$patid, time = hfaction$time, 
             status = hfaction$status, trt = hfaction$trt_ab,
             strata = hfaction$age60, naive = TRUE)

obj
#>             N Rec. Event Death Med. Follow-up
#> Control   221        571    57       28.62295
#> Treatment 205        451    36       27.57377
#> 
#> WR analyses:
#>     Win prob Loss prob WR (95% CI)*      p-value
#> LWR 50.4%    38.2%     1.32 (1.05, 1.66) 0.0189 
#> FWR 50.4%    38.3%     1.32 (1.04, 1.66) 0.0202 
#> NWR 47%      35%       1.34 (1.05, 1.72) 0.0193 
#> -----
#> *Note: The scale of WR depends on censoring distribution.

HF-ACTION: Overall

Recurrent-event WRs more powerful than PWR
- NWR/FWR/LWR give similar results because the number of hospitalizations per patient is highly variable (0 - 26)

Sample Size Calculations

Special Case: PWR

Simplified outcome model
- Gumbel-Hougaard copula (Oakes, 1989) \[ \pr(D^\a>s, T_1^\a>t) = \exp\left(-\left[\{\exp(a\xi_D)\lambda_Ds\}^\kappa + \{\exp(a\xi_H)\lambda_Ht\}^\kappa \right]^{1/\kappa}\right) \]
  - $\lambda_D, \lambda_H$: baseline hazard rates for death/nonfatal event
  - $\exp(\xi_D), \exp(\xi_H)$: treatment HR on death/nonfatal event (effect sizes)
  - $\kappa\geq 1$: association parameter (Kendall’s rank correlation $1-\kappa^{-1}$)
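
The joint survival function is simple to code, which makes it easy to verify its marginals. The sketch below uses a hypothetical function name, and the parameter values are illustrative.

```r
# Joint survival pr(D > s, T_1 > t) in arm a (a = 1 treatment, 0 control)
# under the Gumbel-Hougaard copula model above.
gh_surv <- function(s, t, a, lambda_D, lambda_H, kappa, xi_D = 0, xi_H = 0) {
  stopifnot(kappa >= 1)
  cum_D <- exp(a * xi_D) * lambda_D * s   # cumulative hazard for death
  cum_H <- exp(a * xi_H) * lambda_H * t   # cumulative hazard for nonfatal event
  exp(-(cum_D^kappa + cum_H^kappa)^(1 / kappa))
}

# Setting t = 0 recovers the exponential marginal for death
gh_surv(s = 2, t = 0, a = 0, lambda_D = 0.07, lambda_H = 0.56, kappa = 1.5)
exp(-0.07 * 2)

# kappa = 1 corresponds to independence (Kendall's correlation 0)
gh_surv(s = 2, t = 1, a = 0, lambda_D = 0.07, lambda_H = 0.56, kappa = 1)
exp(-0.07 * 2) * exp(-0.56 * 1)
```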

Study design
- Uniform patient accrual over $[0, \tau_b]$; follow all until $\tau>\tau_b$
- Random loss-to-follow-up (LTFU) rate $\lambda_L$
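
The censoring mechanism implied by this design can be simulated directly. The sketch below assumes that LTFU times are exponential with rate $\lambda_L$ and independent of entry time. The function name and parameter values are illustrative.

```r
set.seed(2025)

# Hypothetical helper: censoring times under uniform accrual over [0, tau_b],
# administrative censoring at calendar time tau, and exponential LTFU.
sim_censoring <- function(n, tau_b, tau, lambda_L) {
  stopifnot(tau > tau_b, lambda_L >= 0)
  entry <- runif(n, 0, tau_b)   # calendar time of enrollment
  admin <- tau - entry          # follow-up available until study end
  ltfu  <- if (lambda_L > 0) rexp(n, rate = lambda_L) else rep(Inf, n)
  pmin(admin, ltfu)
}

C <- sim_censoring(1e4, tau_b = 3, tau = 4, lambda_L = 0.05)
range(C)
mean(C)
```

Without LTFU, every patient is followed for at least $\tau - \tau_b$ and at most $\tau$.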

Sample Size Formula

Total sample size needed \[ n = \frac{\zeta_0^2(\lambda_D,\lambda_H,\kappa,\tau_b,\tau,\lambda_L)(z_{1-\alpha/2} + z_\gamma)^2} {q(1-q)\left\{\delta(\lambda_D,\lambda_H,\kappa,\tau_b,\tau,\lambda_L)^\T\xi\right\}^2} \]
- $\alpha =0.05$: type I error; $\gamma = 0.8, 0.9$: desired power ($z_\gamma=\Phi^{-1}(\gamma)$)
- $\xi=(\xi_D,\xi_H)^\T$: component-wise log-HRs (effect sizes)
- $q=N_1/n$: proportion assigned to treatment
- Nuisance parameters
  - $\zeta_0(\lambda_D,\lambda_H,\kappa,\tau_b,\tau,\lambda_L)$: individual-level noise parameter (cf. SD) in WR
  - $\delta(\lambda_D,\lambda_H,\kappa,\tau_b,\tau,\lambda_L)$: differential vector for log-WR $\to$ log-HRs
  - Calculable by WR::base(lambda_D,lambda_H,kappa,tau_b,tau,lambda_L)
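
Given the nuisance parameters, the formula is a one-liner, and inverting it gives approximate power at a fixed $n$. The sketch below is not `WR::WRSS()`. The function names are hypothetical, the values of `zeta2` and `delta` are made up, and it takes the drift $\delta^\T\xi$ (the approximate log-WR) to enter the denominator squared, as in standard normal-approximation sample size formulas.

```r
# Hypothetical helper: total sample size from the normal approximation.
# zeta2 = zeta_0^2, delta = differential vector, xi = log-HRs (death, nonfatal).
wr_sample_size <- function(xi, zeta2, delta, q = 0.5, alpha = 0.05, power = 0.8) {
  drift <- sum(delta * xi)   # delta^T xi, approximate log win ratio
  zeta2 * (qnorm(1 - alpha / 2) + qnorm(power))^2 / (q * (1 - q) * drift^2)
}

# Hypothetical helper: approximate power at total sample size n
wr_power <- function(n, xi, zeta2, delta, q = 0.5, alpha = 0.05) {
  drift <- sum(delta * xi)
  pnorm(sqrt(n * q * (1 - q) / zeta2) * abs(drift) - qnorm(1 - alpha / 2))
}

# made-up nuisance parameters, for illustration only
zeta2 <- 0.25
delta <- c(0.4, 0.6)
xi    <- log(c(0.9, 0.8))

n80 <- wr_sample_size(xi, zeta2, delta, power = 0.8)
n90 <- wr_sample_size(xi, zeta2, delta, power = 0.9)
ceiling(c(n80, n90))
wr_power(n80, xi, zeta2, delta)   # recovers the target power
```

Halving the drift quadruples the required sample size, so $n$ is very sensitive to the assumed hazard ratios.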

Parameter Specification

Baseline outcome parameters $(\lambda_D,\lambda_H,\kappa)$
- Estimable from pilot/historical data
  - WR::gumbel.est(id, time, status)
Exercise: Under Gumbel-Hougaard copula
- $D^\c\sim\mbox{exponential}(\lambda_D)$
- $\tilde T^\c = D^\c\wedge T_1^\c\sim\mbox{exponential}\left(\lambda_{CE}\right)$, where $\lambda_{CE} = (\lambda_D^\kappa + \lambda_H^\kappa)^{1/\kappa}$
- Cause-specific hazard for $T_1^\c$: $\lambda_H^\#=\lambda_H^\kappa\lambda_{CE}^{1-\kappa}$
  
  Three parameters $\to$ three estimable quantities
Design parameters $(\tau_b,\tau,\lambda_L)$
- Specified by the investigator
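
The "three parameters, three estimable quantities" remark in the exercise can be made concrete. The map from $(\lambda_D,\lambda_H,\kappa)$ to the death rate, the composite-event rate $\lambda_{CE}$, and the cause-specific rate $\lambda_H^\#$ is invertible in closed form, since $\lambda_H^\#/\lambda_{CE} = 1-(\lambda_D/\lambda_{CE})^\kappa$. The sketch below uses hypothetical function names and is not claimed to be how `WR::gumbel.est()` works internally.

```r
# (lambda_D, lambda_H, kappa) -> rates estimable from standard survival data
to_estimable <- function(lambda_D, lambda_H, kappa) {
  lambda_CE <- (lambda_D^kappa + lambda_H^kappa)^(1 / kappa)
  c(lambda_D    = lambda_D,
    lambda_CE   = lambda_CE,                              # rate of min(D, T_1)
    lambda_H_cs = lambda_H^kappa * lambda_CE^(1 - kappa)) # cause-specific rate
}

# Inverse map: solve (lambda_D / lambda_CE)^kappa = 1 - lambda_H_cs / lambda_CE
from_estimable <- function(lambda_D, lambda_CE, lambda_H_cs) {
  kappa <- log(1 - lambda_H_cs / lambda_CE) / log(lambda_D / lambda_CE)
  lambda_H <- (lambda_CE^kappa - lambda_D^kappa)^(1 / kappa)
  c(lambda_D = lambda_D, lambda_H = lambda_H, kappa = kappa)
}

est <- to_estimable(lambda_D = 0.07, lambda_H = 0.56, kappa = 1.56)
est
do.call(from_estimable, as.list(est))   # recovers the original parameters
```

In practice each of the three rates could be estimated as an event count divided by person-time, which is what makes the copula parameters identifiable from pilot data.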

Software: `WR::WRSS()`

Basic steps

xi: log-HRs $\xi=(\xi_D, \xi_H)^T$ (e.g., $\log (0.8, 0.9)^\T$)

# Step 1: estimate (lambda_D, lambda_H, kappa) from pilot data
outcome_base <- gumbel.est(id, time, status)
lambda_D <- outcome_base$lambda_D
lambda_H <- outcome_base$lambda_H
kappa <- outcome_base$kappa

# Step 2: calculate zeta2 and delta from
# (lambda_D, lambda_H, kappa, tau_b, tau, lambda_L)
bparam <- base(lambda_D,lambda_H,kappa,tau_b,tau,lambda_L)
## a list of zeta2 and delta

# Step 3: calculate sample size using bparam
obj <- WRSS(xi, bparam, q = 0.5, alpha = 0.05, side = 2, power = 0.8)
obj$n

A New Training Trial

Background
- WR demonstrated beneficial effect of training on (death > hosp) in HF patients with CPX $\leq 9$ min

Design of new trial
- Purpose: test a new training program with existing one as standard care
- Design: $\tau_b = 3$ yrs patient accrual, follow until $\tau = 4$ yrs
  - Assume minimal LTFU $\lambda_L = 0.001$ per person-year
- Baseline event rates/correlation: estimable from $n=205$ patients in HF-ACTION training arm

HF-ACTION: Historical Data

Extract data from hfaction

# get training arm data (filter() and distinct() come from dplyr)
library(dplyr)
pilot <- hfaction |> 
  filter(trt_ab == 1)
head(pilot)
#>     patid      time status trt_ab age60
#> HFACT00007  3.47541      2      1     1
#> HFACT00007 21.60656      2      1     1
#> HFACT00007 29.04918      2      1     1
#> HFACT00007 32.16393      2      1     1
#> HFACT00007 34.88525      1      1     1
#> HFACT00035 48.88525      0      1     1
# number of subjects
pilot |> distinct(patid) |> 
  count()
#>   n
#> 205

HF-ACTION: Baseline Outcome

Parameter estimates
- $\lambda_D=0.07$ year$^{-1}$, $\lambda_H=0.56$ year$^{-1}$, Kendall’s corr $=36.1\%$

# Step 1: estimate (lambda_D, lambda_H, kappa) from HF-ACTION data
outcome_base <- gumbel.est(pilot$patid, pilot$time / 12, pilot$status)
lambda_D <- outcome_base$lambda_D
lambda_H <- outcome_base$lambda_H
kappa <- outcome_base$kappa
lambda_D
#> [1] 0.07307293
lambda_H
#> [1] 0.5596186
kappa
#> [1] 1.564485
## Kendall's rank correlation
1 - 1/kappa
#> [1] 0.360812

Sample Size: Example

One scenario
- HRs on death & hospitalization: 0.9, 0.8
- Sample size needed for power 80%: $n=1241$

# set design parameters
tau_b <- 3
tau <- 4
lambda_L <- 0.001
# Step 2: use base() function to compute zeta2 and delta
set.seed(1234) # Monte-Carlo integration in base()
bparam <- base(lambda_D, lambda_H, kappa, tau_b, tau, lambda_L)
# Step 3: compute sample size under HRs 0.9 (death) and 0.8 (hospitalization)
obj <- WRSS(xi = log(c(0.9, 0.8)), bparam = bparam, q =  0.5, alpha = 0.05,
          power = 0.8)
obj$n
#> [1] 1240.958

A Range of Effect Sizes

Different HRs: $\exp(\xi)\in [0.6, 0.95]^{\otimes 2}$

Conclusion

Notes

Event-specific win ratio
- Win/loss on which component (Yang et al., 2022; Yang & Troendle, 2020)
More on sample size
- Calculation based on win/loss proportions (Yu & Ganju, 2022)
- Simplified/approximate approaches in various scenarios (Gasparyan et al., 2021; Seifu et al., 2022; Wang et al., 2023; Zhou et al., 2022, etc.)

Summary

Win ratio test
- Standard: death > one nonfatal event
- Recurrent events: death > frequency > time to last/first event
  - WR::WRrec(ID, time, status, trt, strata)
Sample size calculations
- Gumbel-Hougaard copula for death & nonfatal event \[\pr(D^\a>s, T_1^\a>t) = \exp\left(-\left[\{\exp(a\xi_D)\lambda_Ds\}^\kappa + \{\exp(a\xi_H)\lambda_Ht\}^\kappa \right]^{1/\kappa}\right) \]
  - Step 1: estimate $(\lambda_D,\lambda_H,\kappa)$ WR::gumbel.est(id, time, status)
  - Step 2: calculate $\zeta^2_0(\lambda_D,\lambda_H,\kappa,\tau_b,\tau,\lambda_L)$ and $\delta(\lambda_D,\lambda_H,\kappa,\tau_b,\tau,\lambda_L)$ WR::base()
  - Step 3: $n=\frac{\zeta_0^2(\lambda_D,\lambda_H,\kappa,\tau_b,\tau,\lambda_L)(z_{1-\alpha/2} + z_\gamma)^2}{q(1-q)\left\{\delta(\lambda_D,\lambda_H,\kappa,\tau_b,\tau,\lambda_L)^\T\xi\right\}^2}$ WR::WRSS()

References

Bebu, I., & Lachin, J. M. (2016). Large sample inference for a win ratio analysis of a composite outcome based on prioritized components. Biostatistics, 17(1), 178–187. https://doi.org/10.1093/biostatistics/kxv032

Buyse, M. (2010). Generalized pairwise comparisons of prioritized outcomes in the two-sample problem. Statistics in Medicine, 29(30), 3245–3257. https://doi.org/10.1002/sim.3923

Dong, G., Hoaglin, D. C., Huang, B., Cui, Y., Wang, D., Cheng, Y., & Gamalo-Siebers, M. (2023). The stratified win statistics (win ratio, win odds, and net benefit). Pharmaceutical Statistics, 22(4), 748–756. https://doi.org/10.1002/pst.2293

Dong, G., Hoaglin, D. C., Qiu, J., Matsouaka, R. A., Chang, Y.-W., Wang, J., & Vandemeulebroecke, M. (2020). The Win Ratio: On Interpretation and Handling of Ties. Statistics in Biopharmaceutical Research, 12(1), 99–106. https://doi.org/10.1080/19466315.2019.1575279

Dong, G., Li, D., Ballerstedt, S., & Vandemeulebroecke, M. (2016). A generalized analytic solution to the win ratio to analyze a composite endpoint considering the clinical importance order among components. Pharmaceutical Statistics, 15(5), 430–437. https://doi.org/10.1002/pst.1763

Dong, G., Qiu, J., Wang, D., & Vandemeulebroecke, M. (2017). The stratified win ratio. Journal of Biopharmaceutical Statistics, 28(4), 778–796. https://doi.org/10.1080/10543406.2017.1397007

Finkelstein, D. M., & Schoenfeld, D. A. (1999). Combining mortality and longitudinal measures in clinical trials. Statistics in Medicine, 18(11), 1341–1354. https://doi.org/10.1002/(sici)1097-0258(19990615)18:11<1341::aid-sim129>3.0.co;2-7

Gasparyan, S. B., Folkvaljon, F., Bengtsson, O., Buenconsejo, J., & Koch, G. G. (2020). Adjusted win ratio with stratification: Calculation methods and interpretation. Statistical Methods in Medical Research, 30(2), 580–611. https://doi.org/10.1177/0962280220942558

Gasparyan, S. B., Kowalewski, E. K., Folkvaljon, F., Bengtsson, O., Buenconsejo, J., Adler, J., & Koch, G. G. (2021). Power and sample size calculation for the win odds test: application to an ordinal endpoint in COVID-19 trials. Journal of Biopharmaceutical Statistics, 31(6), 765–787. https://doi.org/10.1080/10543406.2021.1968893

Gehan, E. A. (1965). A generalized Wilcoxon test for comparing arbitrarily singly-censored samples. Biometrika, 52(1-2), 203–224. https://doi.org/10.1093/biomet/52.1-2.203

Luo, X., Qiu, J., Bai, S., & Tian, H. (2017). Weighted win loss approach for analyzing prioritized outcomes. Statistics in Medicine, 36(15), 2452–2465. https://doi.org/10.1002/sim.7284

Luo, X., Tian, H., Mohanty, S., & Tsai, W. Y. (2015). An Alternative Approach to Confidence Interval Estimation for the Win Ratio Statistic. Biometrics, 71(1), 139–145. https://doi.org/10.1111/biom.12225

Mao, L. (2019). On the Alternative Hypotheses for the Win Ratio. Biometrics, 75(1), 347–351. https://doi.org/10.1111/biom.12954

Mao, L., Kim, K., & Li, Y. (2022). On recurrent-event win ratio. Statistical Methods in Medical Research, 31(6), 1120–1134. https://doi.org/10.1177/09622802221084134

Oakes, D. (1989). Bivariate Survival Models Induced by Frailties. Journal of the American Statistical Association, 84(406), 487–493. https://doi.org/10.1080/01621459.1989.10478795

Oakes, D. (2016). On the win-ratio statistic in clinical trials with multiple types of event. Biometrika, 103(3), 742–745. https://doi.org/10.1093/biomet/asw026

Pocock, S. J., Ariti, C. A., Collier, T. J., & Wang, D. (2012). The win ratio: a new approach to the analysis of composite endpoints in clinical trials based on clinical priorities. European Heart Journal, 33(2), 176–182. https://doi.org/10.1093/eurheartj/ehr352

Seifu, Y., Mt-Isa, S., Duke, K., Gamalo-Siebers, M., Wang, W., Dong, G., & Kolassa, J. (2022). Design of paediatric trials with benefit-risk endpoints using a composite score of adverse events of interest (AEI) and win-statistics. Journal of Biopharmaceutical Statistics, 33(6), 696–707. https://doi.org/10.1080/10543406.2022.2153202

Wang, B., Zhou, D., Zhang, J., Kim, Y., Chen, L.-W., Dunnmon, P., Bai, S., Liu, Q., & Ishida, E. (2023). Statistical power considerations in the use of win ratio in cardiovascular outcome trials. Contemporary Clinical Trials, 124, 107040. https://doi.org/10.1016/j.cct.2022.107040

Yang, S., & Troendle, J. (2020). Event-specific win ratios and testing with terminal and non-terminal events. Clinical Trials, 18(2), 180–187. https://doi.org/10.1177/1740774520972408

Yang, S., Troendle, J., Pak, D., & Leifer, E. (2022). Event-specific win ratios for inference with terminal and non-terminal events. Statistics in Medicine, 41(7), 1225–1241. https://doi.org/10.1002/sim.9266

Yu, R. X., & Ganju, J. (2022). Sample size formula for a win ratio endpoint. Statistics in Medicine, 41(6), 950–963. https://doi.org/10.1002/sim.9297

Zhou, T. J., LaValley, M. P., Nelson, K. P., Cabral, H. J., & Massaro, J. M. (2022). Calculating power for the Finkelstein and Schoenfeld test statistic for a composite endpoint with two components. Statistics in Medicine, 41(17), 3321–3335. https://doi.org/10.1002/sim.9419
