Model choices to obtain adjusted risk difference estimates from a binomial regression model with convergence problems: An assessment of methods of adjusted risk difference estimation

Background : Risk Difference (RD) is becoming the measure of choice for estimating effect size in antimalarial drug efficacy trials. Calculating RD using binomial regression is prone to model non-convergence. Cheung’s modified ordinary least squares (OLS) method is a proven technique for handling non-convergence when estimating RD. Other promising methods include the Poison, Additive Binomial Regression and binary regression models fitted using the statistical package R. (Deddens’) Copy method that was primarily developed to overcome non-convergence of log-binomial regression models when estimating risk ratios is another potential method. Simulations were conducted to compare the performance of the Copy method against four alternatives (Cheung’s modified OLS method, the Additive Binomial Regression Model fitted with the blm algorithm, the binary regression model fitted with the glm2 algorithm, and the Poisson model with identity link and robust standard errors fitted with the glm algorithm) for obtaining RD estimates when a binomial model fails to converge. Methods : We computed estimates of efficiency and bias with treatment arm efficacies of (a) 60% vs. 85%, (b) 95% vs. 90%, (iii) 95% vs. 98% using simulation studies. A total of 5,000 datasets were simulated under each of these three scenarios. Results : The modified OLS method and the binary regression model fitted using the glm2 algorithm in R provided unbiased, efficient estimates of RD across all assessed scenarios. In contrast, the Copy method yielded biased estimates of RD even when 100% convergence was achieved. The Poisson and Additive Binomial Regression models had 100% and almost 100% convergence rates respectively, but both produced very slightly biased RD estimates. Conclusion : The Copy method is not suitable for estimating RD when binomial regression model fitting fails to converge. Cheung’s modified OLS or the binary regression model fitted using the glm2 algorithm in R should be the method of choice to overcome non-convergence with binomial models for calculating adjusted RD estimates.


Introduction
The most common measures for comparing binary outcomes between two groups include: a risk ratio (RR), an Odds Ratio (OR) and a risk difference (RD). The risk ratio (RR) is an appropriate summary statistic in cohort studies where study participants are selected on the basis of their exposure to the risk factor of interest. However, the risk difference provides an appealing, informative alternative effect estimate and increasingly is being used in randomized controlled trials.

Journal of Medical Statistics and Informatics
One pitfall of the RR is the possibility of incorrect interpretation. For example, the risk ratio for the outcome being a success is not the inverse of the risk ratio for the (same) outcome being a failure [1], although the two ratios are complementary and should ideally lead to the same conclusion. Of particular concern is that, when risk ratios are used in Randomized Controlled Equivalence Trials, evidence of equivalence in the failure rates of two treatments does not necessarily imply evidence of equivalence in the success rates of the same two treatments [2]. The same may be true for non-inferiority studies also. A risk difference, on the other hand, has the same value irrespective of whether success or failure is modelled, so is a preferable headline summary statistic if only on the grounds of ease, and consistency, of interpretation.
Furthermore, when the failure levels are very low in both treatment arms, modeling treatment failure may result in a large risk ratio that represents only a small failure risk difference. For example, consider a situation where the treatment failure rates for treatment A and B are 1% and 4% respectively. The treatment failure risk ratio (B to A) is 4; in simple terms, individuals receiving treatment B are 4 times more likely to experience a treatment failure than those receiving treatment A. Even if the sample size is sufficient for this RR to be statistically significantly different from the null hypothesis value of 1, it only reflects a RD of 3%. Of course, if "failure" is death, this may be a fair and sensible indication of the relative efficacy of the treatments -but in many contexts, for example if failure is incomplete resolution of the symptoms associated with the common cold within a pre-specified time period, the real clinical significance of this difference may be considered to be small or even negligible. In this latter scenario, a RD is probably a more informative measure of relative treatment efficacy.
Two final considerations that may be important when selecting the most appropriate summary statistic for indicating the relative sizes of treatment effects: the confidence intervals (CIs) for RRs are not symmetric about the estimate (except on the log scale), which may serve to 'exaggerate' true relative effect; while many researchers continue to favor reporting odds ratios (ORs) in prevalence studies on the basis that many previous studies of a similar nature to their own have reported this statistic, it is a simple matter of fact that the interpretation of an OR is difficult for many researchers [3].
In summary, risk difference (RD) is often the effect measure of choice, particularly for malaria treatment efficacy studies, the focus of this paper. The RD has the considerable advantage of being much easier to interpret than the most commonly used alternatives, RR and OR, and produces the same value irrespective of whether success or failure is measured so is flexible to use in practice. Unfortunately, however, the serious practical problems that can be experienced when calculating RD values can exceed these advantages and either RR or (most usually) OR values are calculated.
Technically, the most straightforward way of calculating risk differences is by using standard binomial regression methods, but these are prone to convergence failure -and these model non-convergence problems increase as one or both efficacy levels approach a boundary value (i.e. as these approach either 0% or 100%), for the mathematical reasons described below.

Estimation of probabilities in a risk difference model using binomial regression
Consider the following generalized linear model (GLM): g(u)=α+β 1 X 1 +β 2 X 2 + ……….. +β k X k (1) where g(u) is a link function identifying a function of the mean that is a linear function of the covariates; X 1, X 2, ………. , X k form a set of k explanatory variables such as age, sex, location, etc.
Using an identity link with a response on a continuous scale in the above GLM model would yield a multiple linear regression model.
When the outcome is binary, u (often alternatively represented as π) in this model is the probability of observing a specific category of the binary outcome. The GLM with binary outcome and identity link reduces to: g(u) = π=α+β 1 X 1 +β 2 X 2 + ……….. +β k X k (2) Because the expression α+β 1 X 1 +β 2 X 2 + ……….. +β k X k is unbounded, the estimate of π is a linear function of the explanatory variables and can thus yield estimates of outcome probabilities that are outside the valid range of 0 to 1.
Let us now reconsider equation (2). If X 1 is a binary exposure (0 or 1) denoting the treatment / intervention to which an individual is randomized in a clinical trial, the estimate of the adjusted risk difference (RD) becomes: Substituting from equation (1) gives: RD=[(a b 1 *1+b 2 X 2 + …… +b k X k )-(a+b 1 *0+b 2 X 2 + …….. +b k X k )] (4a) where a and b i are estimates of α and β i respectively. When, as occurs in most conventional clinical trials, there are only two treatments this reduces to: For a linear regression of y i on x i , b 1 is given by: For a binary outcome and binary exposure, equation (5a) reduces to which is exactly the same as the RD estimated by a binomial regression [2].
Cheung shows that using the Huber-White robust estimate of variance for the OLS RD gives the same variance as that provided by the maximum likelihood binomial regression. The Huber-White estimate of variance in matrix form is given as: Where X is the design matrix with x i as the i th row of the matrix; is the vector of OLS estimates.
The OLS estimate of the risk difference is just b 1 . In contrast to the estimation of probabilities, this has less stringent boundary constraints suggesting that, if the interest is in estimating the risk difference rather than the individual risks (probabilities) themselves, estimates of the risk difference based on the above linear model would be valid.
Cheung demonstrated both algebraically and using simulations that Ordinary Least Squares estimation methods with Huber White standard errors are valid for the estimation of risk differences. This method also avoids the non-convergence problems that can be experienced when using a binomial regression model with an identity link function [2]. Cheung's modified least squares method is a simple method with appealing properties that can be applied in many standard software packages.
Deddens and Petersen [4] proposed the Copy method to address the problem of non-convergence when estimating risk ratios with the log-binomial model using Maximum Likelihood Estimation (MLE). Non-convergence usually occurs when the effect estimates are near the boundary of the parameter space (when either or both of the risk estimates is close to 0% or 100%, the risk ratio itself is either close to zero or approaching infinity). As its name suggests, in this approach multiple copies of the dataset are added to the original set. For a risk ratio, the Copy method involves calculating MLEs using a log-binomial model on an expanded version of the data set that contains K-1 copies of the original dataset plus one copy of the original dataset in which the values of the binary outcome variable are reversed (the 1's (successes) are all changed to 0's (fails) and the 0's (fails) are all changed to 1's (successes)). For a log-binomial model, if the total number of dataset copies, K, is finite, the estimate is an MLE for the "copied" dataset [5]. When the binomial regression model is applied to this modified data set, the model converges and approximate maximum likelihood estimates of the risk ratio are obtained [4][5][6].
Petersen and Deddens [5,6] recommend that K should be at least 100 (in their paper they used a value of K=1,000). The standard error estimates for the MLEs obtained with the Copy method are based on K copies so have to be multiplied by √(K) to convert them to estimates for the original (single) dataset. Mathematically, expanding the original data set in the manner required for the copy method is simply equivalent to creating a new data set consisting of one copy of the original data set, having a weight of K-1, and one copy of the original data set with the outcome values reversed, having a weight of one. Lumley [7] showed that use of the weights (K-1)/K and 1/K for the original outcome and the reversed outcome datasets respectively eliminates the need to adjust the standard error [7].
A number of methods have been suggested for estimating (adjusted) risk differences. Early methods were based either on simple formulas derived from odds ratios (Greenland and Holland [8]) or on generalized linear models (Wacholder [9]). Later methods include a modified Poisson regression model fitted using the software package SAS [10]; this involves some very simple and straightforward coding but as SAS is expensive it is often not accessible by many researchers, and there are well documented situations in which this model fails to converge [2,11].
Another approach proposed for calculating model-adjusted risk difference estimates uses mean marginal predictions from a logistic regression model ( [11]). This model can be fitted with the SUDAAN software package which is capable of handling complex data; algorithms for fitting this model are also available for software packages such as SPSS and Stata but again the cost of these may be prohibitive for some researchers. Almost certainly, this model can be fitted using the free software package R;, however, although R is perfectly capable of handling complex statistical analyses, we note that Stata/SPSS users frequently fail to migrate to R even after several attempts due to the steep learning curve involved. In addition, Williamson, Eliasziw and Fick [12] experienced convergence problems with this and less standard models in all of the software packages they used, including both R and Stata. Even the powerful SAS software package sometimes failed to converge for the log-binomial model [12,13], and convergence failed with Stata due to the iterations going to a wrong place and then being unable to return to the parameter space.
It is acknowledged that several methods have been proposed to tackle the problem of non-convergence. As most of these focus on risk ratio estimation with only a few attempting to tackle this problem in the context of risk difference estimation, however, we agree with Williamson [12] that, despite the increasing volume of research being published attempting to deal with non-convergence in log-binomial models, this problem still exists, especially when estimating risk differences. There have been recent extensive developments in software packages such as SAS, Stata and SUDAAN attempting to resolve problems of non-convergence. The most attractive and promising of these appear to be in R, for example fitting Additive Binomial Regression Models with the blm algorithm, fitting the binary regression model using the glm2 package, and fitting the Poisson model with identity link and robust standard errors through the glm algorithm. The Binomial linear model (BLM) is defined as follows: Let Y be a binary random variable. Under the binomial linear model (BLM), the probability of an outcome is a linear function of a covariates x, is: where Y=1 if the outcome is observed and 0 otherwise [14]. Each of the estimated coefficients of β is the adjusted risk difference associated with a unit increase in the corresponding covariate [14].
Although Cheung's modified OLS and BLM appeared to be possible solutions to the problem of model non-convergence, we could find no documented evidence in the literature on whether or not the copy method can be applied to overcome non-convergence when fitting a binomial regression model using the identity link function to obtain risk differences. Thus, to address this, we used simulation methods to assess the performance of the copy method when applied to risk difference modeling in situations where the original binomial model fails to converge, comparing its performance to four alternatives: the established Cheung's modified OLS method, the Additive Binomial Regression Model output with the blm package, the binary regression model fitted via glm2 package in R, and the Poisson model with identity link and robust standard errors fitted via glm in R.

Methods
In order to simulate covariate data that reflected a real life situation, parameter values were derived from a malaria drug efficacy study conducted in Malawi between 2003 and 2006 [15]. The covariates of interest were hemoglobin (Hb), age and weight (wt) because these are likely to be associated with both treatment and outcome, consequently there may be need to adjust for them when estimating treatment effect. That is, they are potential confounders. Outcome data were simulated for three different scenarios using Bernoulli distributions: (i) with an efficacy level of 0.85 (85%) for group A and 0.60 (60%) for group B, a true absolute efficacy difference of 0.25 (25%); (ii) an efficacy level of 0.90 (90%) for group A and 0.95 (95%) for group B; and (iii) an efficacy level of 0.98 (98%) for group A and 0.95 (95%) for group B. The aim of considering these different efficacy levels was to allow for generalizability of the findings across a range of outcome scenarios.

Parameter values for simulation of covariates and the assessment criteria
The simulations were performed using the Stata 13 software. For each scenario 5,000 simulated datasets were used. The covariates in each model were hemoglobin (Hb), age and weight. Treatment group was included as a factor.
The matrices of parameters for simulating the baseline covariates were as follows: where: X is a vector of the three covariates (measured on a logarithmic scale for age and wt but using the original scale units for hemoglobin (Hb)); μ is a vector of the mean values for log(age), Hb; log(weight) respectively.
The outcome was simulated as a binary variable with the desired success rates in each group. Models involving treatment group as a factor and age, Hb and weight as covariates were fitted under the following strategies: 1. Binomial regression on the original dataset. 2. The copy method with K values ranging between 10 and 100,000. 3. Cheung's modified OLS method. 4. The binary regression model fitted via glm2 package in R. 5. The Poisson model with identity link and robust standard errors fitted via glm package in R software. 6. The Additive Binomial Regression Model fitted via the blm package in R software. The following statistics were obtained from each of these analyses: the percentage of simulated datasets for which the model fitting converged, the degree of bias in the MLEs of the true efficacy difference and the average (mean) 95% lower and upper limits of the confidence intervals statistical coverage.

Standard binomial regression model
In all three of the simulated scenarios, convergence problems were encountered when the binomial regression model was fitted ( Table 1). The problem worsened as efficacy rates approached a boundary value. Even when neither rate was near a boundary, approximately 5% of the models failed to converge. When both efficacy rates were at least 95%, however, model fitting failed to converge for over 60% of the simulated datasets. When convergence did occur, the estimates were biased for each of the scenarios investigated. Coverage was unacceptably low, being just under 90% for the scenario involving 60% and 85%efficacy rates and below 40% for the scenario where both efficacy levels were close to 100%.

Cheung's modified OLS method and binary regression model fitted via glm2 in R
The Cheung's modified OLS approach and the binary regression model fitted using the glm2 algorithm in R performed consistently well across the range of scenarios considered in terms of convergence rates and bias (Tables 2 and 3). There was 100% convergence and the mean bias was zero for all three assessed scenarios. The coverage was high being around 95% for all assessed scenarios for the OLS approach while the binary regression model fitted using glm2 was conservatively too high being 100% for all the scenarios being investigated.

Poisson model with identity link and robust standard errors fitted via glm in R
The Poisson model with identity link and robust standard errors fitted using the glm algorithm in R also achieved 100% convergence but experience some, arguably negligible, bias in the estimated RD values ( Table 4). The coverage was too high being 100% for all the scenarios that we considered.

Additive Binomial Regression Model fitted via blm in R
The Additive Binomial Regression Model fitted using the blm algorithm in R software experienced some non-convergence but in only a few simulated sets. The convergence rates were above 98%. Some (negligible) bias was observed (     scenarios. The coverage was 91.8% for the 98% vs 95% efficacy rates; 98.5% for the 95% vs 90% efficacy scenarios and 100% for the 85% vs 60% efficacy scenarios.

Simulation results: Deddens' Copy method
For each simulated scenario, the percentage of non-convergent models was least when the number of copies (K) used was 10.
When 20 or more copies were used, the percentage of nonconvergent models increased (Figures 1-3) and continued to do so consistently as K was increased. However, for all three scenarios, the non-convergence rates were better than the rate obtained using no copies for up to at least 100 copies, although beyond 5,000 copies the non-convergence rates were worse for all the scenarios. In addition, some datasets that converged when analyzed conventionally failed to converge when the copy method was adopted.
Bias was observed irrespective of the number of copies used. With just 10 copies, risk difference was underestimated in all three scenarios. Tables of the parameter estimates are presented in detail in Supplementary Tables S1-S3.

Discussion
The Copy method did not provide a remedy to the non-conver-gence problem when modeling risk differences. These findings agree with those reported by Petersen and Deddens [5] and also by Chen, Shi and Qian [16] in a discussion of their finding that non-convergence occasionally occurs for log-binomial models even if the COPY method is applied. So, while the Copy method has been found to be effective in solving nonconvergence problems when using log-binomial models to estimate risk ratios [4][5][6], our findings demonstrate clearly that the method is not appropriate for modeling risk differences.
In all of the simulation scenarios we considered, 100% convergence was often achieved when around 10 copies were used but this was found to be the number of copies at which the estimates for the risk difference were most biased. Increasing the number of copies beyond 10 simply increased the likelihood of non-convergence. The most likely explanation for this is that, although close to 100% convergence can be achieved with K=10, the effect of the reversed copy is not diluted sufficiently to provide unbiased estimates of risk differences.
Perhaps surprisingly at first glance, convergence failure rates became more common when the number of copies was increased beyond 5000, but this is due to the fact that increasing K yields an outcome distribution similar to the original in which the binomial model failed, thus resulting  in the model failing again. Hence the finding that datasets that did not converge with the original binomial model also tended to be more likely to fail to converge with the Copy method, particularly when large numbers of copies were made. Some of the datasets that were convergent with the original binomial regression model become non-convergent with the Copy method. A possible reason for this is that the Copy method creates very large datasets. In general, very large and very small datasets are both susceptible to non-convergence problems [13]. In addition, bias patterns were found to be very irregular with increasing number of copies, probably because fitted models that converge do not do so at random; those models that fail to converge when just a few copies are used will also fail to converge when more copies are added. This was not explored in any more depth in this paper as our primary focus was merely on whether the non-convergence problem would be solved or not.
The Cheung's modified OLS method and the binary regression model fitted using the glm2 algorithm in R appear to be the optimum approaches for estimating adjusted risk   differences. Our findings confirm that both of these methods not only converge but also provide unbiased estimates of risk differences when standard methods of binomial regression fail to converge. Both methods remains robust even when several covariates are included in the regression model, making them useful for controlling for potential confounders and also for identifying independent predictors of outcome when modeling risk differences. In addition, the methods yield unbiased estimates of risk differences with robust standard errors, thus offering clear statistical advantages over the use of the binomial regression method when the binomial model fails to converge. Efficient standard errors for the modified OLS are obtained via Huber-White standard errors. However the glm2 binary regression model suffers from too high (conservative) statistical coverage that remained at 100% for all scenarios. This coverage property puts Cheung's OLS at a statistical property advantage over the glm2 model. The Poisson model with identity link and robust standard errors fitted using the glm algorithm in R also emerged in our simulations as a promising method for estimating risk differences. Model convergence is assured using this approach; while some bias can be expected with this method, it is likely to be very minimal. Minimal bias risk also occurred when the Additive Binomial Regression Model was fitted with the blm algorithm in R, but disappointingly a small chance of model non-convergence was also identified.

Conclusion
The standard binomial model with an identity link function is the method of first choice for estimating risk/efficacy differences, provided the model converges. When a model fails to converge, this problem can be addressed most efficiently using either Cheung's modified OLS approach or by fitting a binary regression model with the glm2 package in R; both of these methods converge and provide efficient unbiased estimates of adjusted risk differences. The Cheung's OLS method offers a statistical coverage property over the glm2 approach. The Poisson model with identity link and robust standard errors fitted using the glm algorithm in R and the Additive Binomial Regression Model fitted using the blm algorithm in R are possible alternatives as they have 100% and almost 100% convergence rates respectively, but both produce very slightly biased risk difference estimates. In addition the Poisson model has too high coverage while the blm approach produces variable statistical coverage depending on efficacy scenarios.
The Copy method is not a suitable approach for estimating adjusted risk differences and is now effectively rendered obsolete in the presence of methods such as Cheung's modified OLS and the binary regression model fitted using the glm2 package in R.