WWR: An R package for analyzing prioritized outcomes

Background : In analyzing survival trials, time-to-first-event analysis is commonly used without considering clinical importance of different events. The recently proposed win loss approach can take into account the order of clinical importance of clinical events. The weighted win loss approach improves the efficiency of the original win loss approach with optimal weights. The general win loss approach can be used to handle all types of outcomes. A publically available tool to implement these approaches is on demand. Methods : We developed an R package, WWR, to implement the win loss approaches in analyzing prioritized outcomes. Results : We illustrate the use of this package with a real data example. Conclusions : The WWR package provides an efficient way to calculate the win loss statistics (ratio, difference and product) and their asymptotic variances along with the contribution indexes.


Introduction
There are often multiple endpoints in clinical trials. Other than using a multiple comparison procedure to assess these endpoints in order to control the family-wise error rate, a popular approach is to analyze some composites of the endpoints. Composite endpoints, if suitably constructed and carefully interpreted, can increase effect sizes and shorten study timeline to give sponsors better probability of success under time and resource constraints.
The traditional way to construct composite endpoints ignores the clinical importance of the endpoints. For example, in many survival trials, time to the first occurrence of relevant clinical events is the composite endpoint. This approach treats all endpoints equally and does not distinguish clinical priorities of the events. In cardiovascular (CV) trials, for example, CV death may bear a higher priority than other non-fatal events, the usual MACE (Major Adverse Cardiac Events) composite endpoint cannot differentiate the event priorities.
Considering priorities of different endpoints, Buyse [1] and Pocock et al. [2] proposed the win loss approach. This approach uses a layered comparison procedure that aligns the order of outcome comparisons with the order of outcome importance, which results in test statistics that can take into consideration of the pre-defined outcome hierarchy. This approach can be used on different types of outcomes including continuous, discrete and survival, even though the term "win ratio" was first used by Pocock et al. [2] when analyzing survival outcomes. The idea is to first form a pair with subjects each from the treatment group and the control group. Then a layered comparison procedure is implemented such that the most important outcome is compared to determine the winner and the loser; if a tie is resulted, the second important outcome is used to break the tie; if still tied, the third important outcome kicks in; and so forth, until the ultimate tie is resulted. Because this comparison procedure aligns the order of comparisons with the order of outcome priorities, it can take into consideration of the predefined outcome hierarchy. With this appealing feature, the win loss approach has become an increasingly popular approach in analyzing outcomes of different priorities.
There have been very interesting developments on this approach since its inception. Luo et al. [3] established a statistical framework for this approach under the semi-competing risks doi: 10.7243/2053-7662-5-4 data setting. Later on, Luo et al. [4] proposed a weighted win loss approach to improve its interpretability and efficiency under the same setting. Contribution index can provide an easy interpretation of the result as the contribution of each event to the win/loss. In addition, weights specified can be differentiated based on the clinical importance of each event. Bebu and Lachin [5] used a partial order to characterize the win loss relationship thus the win loss approach can be generalized to other types of outcomes. Wang and Pocock [6] specifically discussed its use for continuous outcomes. The win loss approach can be coupled with matched [2] and stratified [7] analyses to achieve better homogeneity in the pairwise comparison step. To minimize the impact of censoring, Oakes [8] proposed to use a common time horizon for the calculation of win ratio. Interested readers are encouraged to go through the aforementioned papers for a better understanding of the win loss approach.
Several packages for survival analysis have been developed in the R software environment for statistical computing and graphics [10]. Among these packages, survival (Therneau and Lumley 2016) is the most popular one to handle time-tofirst-event analysis.
In this paper, we present an R package [9], WWR [11], that implements the win loss approach in analyzing prioritized outcomes. This package provides the calculation of the win loss statistics including win ratio, win difference and win product along with their asymptotic variances. Note that, the win difference (i.e. the difference of total number of wins and the total number of losses) is the un-scaled version of the "proportion difference in favor of treatment" defined by Buyse [1]. The win product, as its name suggests, is the product of the ratios of win over loss for each outcome. The paper is organized as follows. We begin with a detailed description of the weighted win loss approach. We then describe the structure of WWR [11] and illustrate the use of this package with a real data example.

Definition of the win loss statistics
Definition under the semi-competing risks data setting Let T 1 and T 2 be two random variables denoting the times to the non-terminal and terminal events respectively. In the following, we assume that the joint distribution of T 1 and T 2 is continuous unless otherwise stated. These two variables are usually correlated with each other. T 2 can right censor T 1 but not vice versa. Data consisting of a non-terminal event and a terminal event are called semi-competing risks data [12]. Analysis of semi-competing risks data is complicated because terminal events can censor non-terminal events, potentially resulting in informative censoring. The traditional time-tofirst-event analysis only considers the first event, however non-terminal events always occur before terminal events if both are observed. Consequently, in the traditional analysis, the non-terminal events are given higher priority than the far more serious event such as death. The win ratio approach as proposed by Pocock et al. [2] is to consider the terminal event of higher priority. Let Z = 1 denote the treatment group and Z = 0 the control group. In addition, C is the time to censoring, which is assumed to be independent of (T 1 , T 2 ) given Z. Let the observed times Y 1 = min(T 1 , T 2 , C) and Y 2 = min(T 2 ,C) and the event indicators δ 1 = I(Y 1 = T 1 ) and δ 2 In some experimental design setting, one may consider the observed data {(Y 1i , Y 2i , δ 1i , δ 2i ) : i =1......,n} are iid samples given the group assignment (Z 1 .....,Z n ). The difference between these two considerations is that the former treats the group assignments (Z 1 ......,Z n ) as random variables and the latter treats them as fixed.
For i, j = 1.....,n, the win (i over j) indicators for the terminal and non-terminal events are respectively W 2ij = δ 2j I(Y 2i ≥Y 2j ) and W 1ij = δ 1j I(Y 1i ≥Y 1j ), correspondingly, the loss (i against j) indicators are L 2ij = δ 2i I(Y 2j ≥Y 2i ) and L 1ij = δ 1i I(Y 1j ≥Y 1i ), and the tie indicators for the terminal event Ω 2ij = (1 -W 2ij ) (1-L 2ij ) and for the non-terminal event Ω 1ij = (1 -W 1ij ) (1 -L 1ij ). Because the distribution of T 1 and T 2 is continuous, any of the above comparisons can only fall into one category (win, loss or tie). If the distribution has jump point(s), there could be some cases that W kij = L kij = 1, k = 1, 2, i ≠ j. These cases happen when δ ki δ kj I(Y ki = Y kj ) = 1, i.e. the two observed event times are identical. In our R functions for calculating win loss statistics under semi-competing risks data setting, to be fair, these cases will be counted as both wins and losses. This will affect the total numbers of wins and losses but we believe the impact is minimal as it is uncommon to have observed event times being tied with each other (the censoring times often have ties). Moreover, according to the definitions below, counting the "tied" event times as both wins and losses does not change the value of win difference, as these wins and losses will be canceled out by subtraction.
Suppose we have two possibly data dependent weight functions G 2 (.) and G 1 (., .). The weight function G 2 is for the terminal event and could possibly depend on the terminal event time. G 2 =1 is a Gehan weight; if G 2 is the at-risk probability of the terminal event, it is a log-rank weight. The weight function G 1 is for the non-terminal event. Since the non-terminal event times are compared after the terminal event information is not suffcient to break a tie, G 1 may depend on both the terminal event time and the non-terminal event time. As described in Luo et al. (2017) [4], if G 1 is a function of the terminal event time or non-terminal event time, it yields a log-rank test either for the terminal or non-terminal event corresponding to a terminal log-rank weight or a non-termial log-rank weight. If G 1 is a function of both the terminal event time and non-terminal event time, it provides a test which is similar to the log-rank test for the terminal event corresponding to a mixed log-rank weight. The choices of the weight functions G 1 and G 2 are discussed in Luo et al. (2017) [4]. We can define the weighted total number of wins and losses based on the terminal event doi: 10.7243/2053-7662-5-4 and the weighted total number of wins and losses based on the non-terminal event here and in the sequel, for any real numbers a and b, a^ b = min(a, b). Clearly, W 2 (G 2 ) and L 2 (G 2 ) are the weighted total number of wins and losses of the treatment group over the control group for the terminal event, and W 1 (G 1 ) and L 1 (G 1 ) are the weighted total number of wins and losses of the treatment group over the control group for the non-terminal event among pairings of the observations that are "tied" for the terminal event.
The weighted win ratio, win difference and win product are respectively Note that the (un-weighted) win loss statistics are the special cases with G 1 =1 and G 2 =1. As shown in Luo et al. (2015Luo et al. ( , 2017 [3,4], the win loss statistics can be used to test the hypothesis Here, for k = 0, 1, λ 2k (·) is the hazard function of T 2 in group Z = k; λ 1k (y 1 | y 2 ) = pr(T 1 =y 1 |T 1 ≥y 1 , T 2 ≥y 2 , Z = k) is the conditional hazard of T 1 given the terminal event occurring sometime later. In fact, W k (G k )-L k (G k ) and W k (G k )/L k (G k ) are valid tests for H 0 k , k = 0,1. Here and in the sequel, we call H 0 the global null hypothesis.
The functions "winratio" and "wwratio" in the WWR package calculate the above un-weighted and weighted win loss statistics and their corresponding asymptotic variances [3,4]. Note that the weighted win loss function "wwratio" only provides variances under the global null hypothesis as the calculation of the variances under alternative hypotheses is prohibitively time-consuming. The un-weighted win loss function, "winratio", on the other hand, provides variances under both null and alternative hypotheses. Both of these functions also provide the contribution index which quantifies the proportion of each outcome contributing to the overall win and loss determination. For example, the contribution index for wins on the terminal event is defined as and the contribution index for losses on the non-terminal event is

Definition under general setting
The definition of win and loss can be applied to other outcomes and can be very flexible. Suppose that we observed outcome Y i for patient i in the treatment group and outcome X j for patient j in the control group, i = 1,.....,n 1 and j = 1,.....,n 0 . Note that Y i and X j can be vectors. Suppose that there are K outcomes of interest and they are aligned according to the clinical priorities (from high to low) as 1 to K. Suppose that for the k-th outcome (k = 1,.....,K), we have a comparison function C k (Y i , X j ) defined as Then the total number of wins for the first outcome is and the total number of wins for the k-th outcome k > 1 is Similarly, one may define the total number of losses TL k = 1,......,K. The general win ratio, win difference and win product are respectively When the three categories (win, loss and tie) are mutually exclusive for each outcome (which should be the case in most applications), a more concise way of data storage for the win loss indicators is to use one single n 1 x n 0 matrix A with its element A ij defined as U   2  2  2  2  2  2  1  1  1  1  2  2  2  2  2   One may easily verify that and then find that and k = 1,....,K. Therefore this concise way of storing the win loss indicators leads to a simpler definition of the total number of wins and losses. Even though the definition of the win loss statistics under the general setting is very broad, the interpretation of the statistics needs to be considered carefully on a case-by-case base. Often times, we will need to base on the definition of the customized wins and losses to find out the null hypothesis that the resulting statistics test is against. The function "genwr" in the WWR package calculates the above general win loss statistics and their corresponding asymptotic variances, provided the win loss matrix A is given. Note that, once the win loss matrix A is given, the rest of the inference and calculation of the win loss statistics does not depend on the original outcome data (Y 1 , ....., Y n1 ) and (X 1 ,...., X n 0 ). This separation in inference procedure allows maximum flexibility for users to define their own comparison functions. The contribution index for wins on the outcome k , k = 1,....K is and the contribution indexes for losses are similarly defined.

Package overview and usage
WWR provides tools for users to analyze prioritized outcomes. Currently, in Version 1.2.2 [11], there are three main functions ("winratio", "wwratio" and "genwr") in the package. The function "winratio", based on Luo et al. [3], provides calculation of the win loss statistics and their corresponding asymptotic variances under the semi-competing risks data setting. The function "wwratio", based on Luo et al. [4], provides calculation of the weighted win loss statistics and their corresponding asymptotic variances under the same setting. And the function "genwr", based on the idea of Bebu and Lachin [5], provides calculation of the (un-weighted) win loss statistics and their corresponding asymptotic variances for general types of outcomes. Note that, instead of using n 1 -1 and n 0 -1 in the calculation of estimated variances as Bebu and Lachin [5], we use n 1 and n 0 in the calculation which is consistent with the variance estimation in "winratio" and "wwratio". We expect the difference is small between these two versions of variance calculation when the sample size is moderate to large. In the following, we will illustrate the use of these three functions via a real data example.
We use a data set on transplant treatment of patients suffering from a blood cancer to illustrate the WWR package. This data set is available in an R package mstate as ebmt4. This data set includes one terminal event, death, and four intermediate events: Recovery (Rec), and Adverse Event (AE), a combination of the two (AE and Rec) and Relapse (rel). Baseline covariates including donor-recipient match, prophylaxis, year of transplant and age at transplant in years are also provided. We consider the terminal event of death as most important. The non-terminal event of Relapse is ranked next. The clinical question is whether the transplant treatment is different between patients who are prophylaxis positive and patients who are prophylaxis negative. The ebmt4 data are in wide format. Each row of the data corresponds to a unique patient. First, we use the following R commands to get the data.
The data structure looks like the following.
Then we assign the values to the arguments in "winratio" and "wwratio": We first calculated the un-weighted win loss statistics using the "winratio" function.
In this function, y1 is the observed time for the non-terminal , wins on outcome after ties with on outcomes 1 to 1; 0, and tie on all outcomes. event, y2 is the observed time for the terminal event, d1 is the non-terminal event indicator, d2 is the terminal event indicator, z is the treatment indicator that only takes values 0 and 1. We then calculated the weighted win loss statistics using the "wwratio" function which has the same arguments as the (unweighted) function "winratio" with additional weight indicators wty1 and wty2. For the terminal event, the choices of weight could be a Gehan type (wty2=1) or a log-rank type (wty2=2). For the non-terminal event, the choices of weight could be a Gehan type (wty1=1) or a mixed log-rank type (wty1=2) or a terminal log-rank type (wty1=3) or a non-termianl log-rank type (wty1=4). Note that user-defined weight functions are currently not supported by "wwratio". In this example, we used the log-rank weight for the terminal event (wty2=2) and the Gehan weight for the non-terminal event (wty1=1).
We then outputted the summary results from the un-weighted and weighted win loss statistics.
The summary output includes a summary of study design in terms of sample size per treatment group; total number of win and loss in group Z=1 and total number of win and loss in group Z=1 by importance from most to least. Win/loss contribution indexes are provided to the outcomes ordered by importance from most to least.
The un-weighted win difference and win ratio statistics show that there is a statistically significant difference between the prophylaxis positive patients (group 1) and the prophylaxis negative patients (group 0) in terms of survival time and relapse.
Patients without prophylaxis tend to survive longer or have a longer period without relapse. The test results are driven by the death events based on the corresponding win contribution indexes of 40.93% for group 1 and 52.80% for group 0; while the relapse events have less impact because the corresponding win contribution indexes of 2.71% for group 1 and 3.56% for group 0. Note that the win ratio and win product have the n -1/2 convergence rate, so the z value is defined as, where n is the total sample size from both groups and sd is the standard deviation of the test statistics. Since we are using the difference of total wins and total losses, the win difference has n 3/2 convergence rate, so the z value for win difference is defined as, The weighted win loss statistics provide similar results to those of the un-weighted ones, except that, using the weighted version, the win product also shows the prophylaxis positive patients (group 1) are significantly worse than the prophylaxis negative patients (group 0).
In the real world, a patient surviving one day longer than another patient may not be considered as a clinically meaningful win. Similarly, one day longer in relapse time may not be clinically meaningful. One may consider, for example, 90 days more in survival time and 30 days more relapse time as clinically meaningful wins. If this is the case, we can use the following function to incorporate any threshold specification for a pairwise comparison. With this function, we can define the win loss matrix and then use the function "genwr" to calculate the win loss statistics based on the above win definition. This function can provide win loss calculation once the user-defined win loss matrix is provided.
The following results show that the prophylaxis positive patients (group 1) are significantly worse than the prophylaxis negative patients (group 0) under the new win loss definition.
When the sample size is less than 2,000, calculation of the win loss indicator matrix in R can be done with a reasonable speed. When the sample size increases, it requires long program runtime. If this is intolerable, one may use a compiled code for this step. In the help file for the function "genwr", we provided examples on how to use Fortran code to perform the calculation of the win loss indicator matrix, and then use the inline package to port the code in R (so that R can understand and execute it). Other programming languages (C, C++, Python) may be used to accomplish this step. We conducted a simulation study to compare the run times between using the inline package and using the R function only in defining the win loss matrix. The following table shows that the run time using the inline is at least 330 times shorter on average. When n ≥2,000, one may need to consider using the compiled code to improve performance. The detail of the simulation can be found in the online supplementary materials.