On comparative performance of multiple imputation methods for moderate to large proportions of missing data in clinical trials: a simulation study

Background: A longitudinal clinical trial takes measurements at successive occasions, and a patient's unavailability at a scheduled visit leaves gaps in the expected full sequence of measurements. Missing data are a major concern during the conduct of a clinical trial. Missing data are often not handled properly in the final analysis, which may considerably bias the results, reduce the power of the study and lead to invalid conclusions. A promising approach to this problem is to impute the missing values.
Methods: Multiple imputation (MI) methods provide a useful strategy for data sets with missing values: the missing values are filled in with estimates and the resulting data sets are analyzed by complete-data methods. Statistical methods to address missingness have been actively pursued in recent years. This paper describes missing data mechanisms and various imputation techniques for missing data analysis in longitudinal clinical trials. Further, the appropriateness of multiple imputation methods is discussed under moderate to large proportions of missingness in simulated clinical trial data, by comparing performance measures derived through an intensive simulation procedure.
Results: For moderate proportions (~20% and 30%) of missingness, the MI-regression method showed the lowest bias and MSE as the sample size increased, whereas the other methods did not improve much despite the increased sample size. For a large proportion (50%) of missing data, the MI-regression and MI-propensity score methods were close in performance, but the MI-regression method performed considerably better as the number of subjects in the dataset increased.
Conclusions: The present investigation showed that the MI-regression method is the most appropriate for the analysis of data with missingness under the sample sizes and missingness mechanism considered. Overall, the study findings will help researchers with limited knowledge of statistical methodology to choose a suitable multiple imputation method, so that the resulting estimates are more precise.


doi: 10.7243/2053-7662-2-9

Introduction
Missing data are a major concern during the conduct of a clinical trial. Dropout from the study before the protocol-specified completion time is one of the major reasons for missingness. Withdrawal may be related to adverse events, lack of drug effectiveness, unpleasant study procedures, etc. There may also be reasons not associated with the study, e.g., moving away from the study site or external engagements of patients. Missingness is more prevalent in clinical trials where a slow-acting or poorly tolerated treatment is being investigated [1]. Unfortunately, missing data are often not handled properly in the final analysis, which can considerably bias the results, reduce the power of the study and lead to invalid conclusions. Even a valid statistical analysis of a longitudinal study may not help if missingness is related to the outcome measure [2]. One obvious strategy to address missing data during analysis is to include only completers, i.e., the subjects for whom complete data are available on the variables of interest. However, interpretation from such an analysis may be biased. An attractive approach to this problem is to impute the missing values. Imputation methods provide a useful strategy for data sets with missing values: the missing values are filled in with imputed values and each resulting completed data set is analyzed by standard methods. Numerous methods for handling missing data have been proposed, examined and implemented [3]. In fact, there are so many methods that choosing a suitable one can be difficult. In the presence of missingness, there is no universally accepted best strategy to analyze the data. Hence, it is essential to study the missingness mechanism first and only then decide on the analysis strategy. A description of the missingness mechanisms is given in the next section.
The simplest imputation techniques replace a missing value with a value derived from the observed data, e.g., the baseline value, the average of the variable or the last observed value carried forward [4]. Multiple imputation is considered one of the most useful techniques for dealing with missing data and is readily available in modern statistical software [5,6]. Bias in the results may vary with the proportion of missingness in the dataset; hence, how imputation methods perform at different proportions of missingness is of interest. This article reviews the theory of missing data mechanisms and various multiple imputation methods applicable to longitudinal clinical trial data, and discusses the results of multiple imputation procedures applied to simulated diabetic clinical trial data.

Missing data mechanism
A longitudinal clinical trial takes measurements at successive occasions, and a patient's unavailability leaves gaps in the expected full sequence of measurements. Missing data may profoundly affect statistical estimates. Hence, it is important to assess the impact of missing data on subsequent statistical inference [7]. The probable impact of missing data on statistical analysis is understood through the underlying mechanism that causes the data to be missing. The nomenclature for missing data mechanisms was defined by Rubin [8] as follows. Data are missing completely at random (MCAR) if the probability that an observation is missing does not depend on either the observed or the unobserved values in the dataset. Data are missing at random (MAR) if the probability that an observation is missing depends on observed values in the dataset but not on unobserved values. Data are missing not at random (MNAR) if the probability of missingness depends on unobserved or missing values in the dataset.
The following examples illustrate the missing data mechanisms that may arise in practice. If a patient moves to another city far from the clinical investigation site and therefore drops out of the study, the dropout is in no way related to the outcome of interest, and the missingness is classified as MCAR.
In another situation, a patient whose condition is not improving on the investigational medicine decides to withdraw from the study. This dropout is related to the outcome and to the data observed so far, so the missingness mechanism in this case is MAR [2]. Finally, suppose a patient is lost to follow-up because, after the last hospital visit, the patient's condition worsened at home and the patient sought medical treatment elsewhere. This dropout is related to the outcome of interest, but the previously observed data do not explain it; the missingness depends on unobserved data and is therefore MNAR.
Most of the time, data are not MCAR, as missingness is related to the outcome of interest. In well-designed studies, such as clinical trials, the MAR assumption holds much more often than the MCAR assumption, because the observed data explain much of the missingness in most situations [9,10].
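The three mechanisms can also be stated compactly in symbols. The notation below is an assumption introduced here for illustration and does not appear in the original text: $R$ is the indicator of which values are missing, and $Y_{obs}$ and $Y_{mis}$ are the observed and missing parts of the data.

```latex
\begin{align*}
\text{MCAR:}\quad & P(R \mid Y_{obs}, Y_{mis}) = P(R) \\
\text{MAR:}\quad  & P(R \mid Y_{obs}, Y_{mis}) = P(R \mid Y_{obs}) \\
\text{MNAR:}\quad & P(R \mid Y_{obs}, Y_{mis}) \text{ depends on } Y_{mis}
                    \text{ even after conditioning on } Y_{obs}
\end{align*}
```

Read this way, MCAR is a special case of MAR, which is why an analysis that is valid under MAR is also valid under MCAR.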

Multiple imputation
Imputation refers to filling in the missing values of a data set with plausible values. These plausible values are sampled from the predictive distribution of the missing data. The procedure for selecting plausible values is what differentiates the various imputation methods. The objective of any imputation method is to produce a complete dataset that can then be analyzed using standard statistical methods. Imputation methods have been used frequently in the last few years to deal with missing data in clinical trials. They are classified as single or multiple imputation methods. Single imputation refers to imputing one plausible value for each missing value of a particular variable in a data set and then performing the analysis as if all data had originally been observed. The idea of multiple imputation was proposed by Rubin [11]. In multiple imputation (MI), instead of substituting a single value for each missing observation, we substitute multiple (m>1) plausible values to reflect the uncertainty about the right value to impute. Rubin (1987) showed that the efficiency of an estimate based on m imputations can be approximated by the following expression:

$$\left(1 + \frac{\gamma}{m}\right)^{-1}$$

where γ is the fraction of missing information for the variable being estimated.
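As a rough numerical illustration, Rubin's approximation is commonly written as (1 + γ/m)^(-1), the efficiency relative to an infinite number of imputations. The sketch below evaluates it for a few values of γ and m. The function name `relative_efficiency` and the chosen γ values are assumptions made for illustration; γ is the fraction of missing information, which is not the same as the proportion of missing values.

```python
def relative_efficiency(gamma: float, m: int) -> float:
    """Approximate relative efficiency (1 + gamma/m)^-1 of m imputations.

    gamma: fraction of missing information (between 0 and 1).
    m:     number of imputed data sets (at least 1).
    """
    if m < 1:
        raise ValueError("m must be at least 1")
    if not 0.0 <= gamma <= 1.0:
        raise ValueError("gamma must lie in [0, 1]")
    return 1.0 / (1.0 + gamma / m)


# Illustrative values only; they are not results from the study.
for gamma in (0.2, 0.3, 0.5):
    row = ", ".join(
        f"m={m}: {relative_efficiency(gamma, m):.3f}" for m in (3, 5, 10)
    )
    print(f"gamma={gamma}: {row}")
```

For example, with γ = 0.5 and m = 5 the expression gives 1/1.1, or about 91%, which suggests why a small number of imputations is often regarded as adequate.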
It is important to decide how many imputed data sets should be created. Rubin also stated that many may be surprised by the claim that only 3-10 imputations may be needed [11]. Unless the fraction of missing information is unusually high, there is no practical benefit in using more than five imputations [12]. Multiple imputation is motivated by the Bayesian framework. To illustrate, let Q denote the scalar quantity to be estimated, such as a mean or a regression coefficient, and let m be the number of imputed data sets. Statistical analysis is carried out on each imputed data set separately, using standard procedures for complete data. The analysis yields the imputed-data estimates $\hat{Q}_1, \hat{Q}_2, \ldots, \hat{Q}_m$. Once the analysis is completed for each imputed data set, the next step is to combine these results. As per Rubin [11], the point estimate for Q is simply the average:

$$\hat{Q}_{MI} = \frac{1}{m}\sum_{t=1}^{m}\hat{Q}_t$$

The total variance of $\hat{Q}_{MI}$ consists of two parts: the "between-imputation" variance and the "within-imputation" variance. The between-imputation variance $B$ and the within-imputation variance $W$ are given, respectively, by

$$B = \frac{1}{m-1}\sum_{t=1}^{m}\left(\hat{Q}_t - \hat{Q}_{MI}\right)^2, \qquad W = \frac{1}{m}\sum_{t=1}^{m}U_t$$

where $U_t$ is the estimated variance on imputed data set t, t = 1, 2, ..., m. The estimated total variance is as follows:

$$T = W + \left(1 + \frac{1}{m}\right)B$$

From the above, it is evident that multiple imputation techniques have an advantage over other methods, since the statistical estimates obtained from the imputed datasets reflect the uncertainty or variability due to imputation [13]. Further, we have explored the performance of multiple imputation methods under varying sample sizes and proportions of missingness. We considered the main multiple imputation methods that can be implemented through the MI procedure in SAS software: the regression method, the Markov chain Monte Carlo (MCMC) method and the propensity score method. We also considered a widely used single imputation method, last observation carried forward (LOCF), in which the last observed value is carried forward to replace missing values [14].
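The combining step attributed to Rubin can be sketched in a few lines of Python. This is a minimal illustration, not the SAS implementation used in the study: the function name `pool_rubin` and the numbers are hypothetical, and the formulas are the standard ones (average of the estimates, within-imputation variance as the mean of the per-dataset variances, between-imputation variance with divisor m-1, and total variance W + (1 + 1/m)B).

```python
from statistics import mean, variance


def pool_rubin(estimates, variances):
    """Combine m imputed-data estimates and their variances.

    Returns (pooled estimate, within variance W, between variance B,
    total variance T = W + (1 + 1/m) * B).
    """
    m = len(estimates)
    if m < 2 or len(variances) != m:
        raise ValueError("need m >= 2 estimates with matching variances")
    q_bar = mean(estimates)        # pooled point estimate
    within = mean(variances)       # within-imputation variance W
    between = variance(estimates)  # between-imputation variance B (divisor m-1)
    total = within + (1.0 + 1.0 / m) * between
    return q_bar, within, between, total


# Hypothetical last-visit HbA1c means and squared standard errors from
# m = 5 imputed data sets; these numbers are not taken from the study.
estimates = [8.1, 8.3, 8.0, 8.2, 8.4]
variances = [0.04, 0.05, 0.04, 0.05, 0.04]
q_bar, within, between, total = pool_rubin(estimates, variances)
print(f"estimate={q_bar:.3f}, W={within:.3f}, B={between:.3f}, T={total:.3f}")
```

The between-imputation term is what a single imputation method such as LOCF cannot supply: with only one filled-in data set there is no spread across imputations to add to the variance.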

Data simulation and procedure implementation
To assess the performance of the various imputation methods, we considered simulated diabetic clinical trial data under a MAR dropout mechanism. Since the observed data in clinical trials explain much of the missingness, the MAR assumption is relevant in many scenarios [2,9,15]. The simulated variable is glycosylated hemoglobin (HbA1c), an indicator of blood glucose level used in diabetes management. The data sets were simulated to resemble the time course of HbA1c in a diabetes clinical trial. Simulation was done using the Statistical Analysis System (SAS), version 9.1.3. If the intervention is a slow-acting treatment and a patient drops out because the drug is perceived as ineffective, then the observed data carry information about the missingness and the mechanism can be assumed to be missing at random. We assume that, under the MAR mechanism, missingness can occur at any visit for any subject, but with different probabilities. At a given visit, the probability of missingness is three times higher for subjects whose HbA1c measurement falls below the visit mean than for subjects whose measurement is equal to or higher than the visit mean. For example, the mean HbA1c at visit 3 is 9. Therefore,

$$P(\text{observation missing at visit 3} \mid \text{observed value} < 9) = 3 \times P(\text{observation missing at visit 3} \mid \text{observed value} \geq 9)$$
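The 3:1 rule can be sketched for a single visit as follows. This is a hedged Python illustration rather than the authors' SAS code. The function name `mar_mask`, the standard deviation of 1 for HbA1c, the seed, and the way the two probabilities are scaled to reach an overall target proportion are all assumptions introduced here; only the visit mean of 9 and the 3:1 ratio come from the text.

```python
import random
from statistics import mean


def mar_mask(values, target_prop, ratio=3.0, rng=None):
    """Flag values to be set missing (True) under the 3:1 rule.

    Values below the visit mean are `ratio` times as likely to be flagged
    as values at or above it. The two probabilities are scaled so that
    the expected overall proportion flagged equals `target_prop`.
    """
    rng = rng or random.Random()
    visit_mean = mean(values)
    below = [v < visit_mean for v in values]
    q = sum(below) / len(values)  # share of values below the mean
    p_high = target_prop / (1.0 + (ratio - 1.0) * q)
    p_low = ratio * p_high
    if p_low > 1.0:
        raise ValueError("target proportion too large for this ratio")
    return [rng.random() < (p_low if b else p_high) for b in below]


# Illustrative visit-3 data: mean 9 as in the text, SD 1 assumed.
rng = random.Random(2024)
hba1c_visit3 = [rng.gauss(9.0, 1.0) for _ in range(120)]
mask = mar_mask(hba1c_visit3, target_prop=0.30, rng=rng)
print(f"{sum(mask)} of {len(mask)} values flagged as missing")
```

Under this scaling, a 50% target with about half the values below the mean gives probabilities of roughly 0.75 and 0.25, so the 3:1 ratio remains feasible for the largest proportion considered.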
To check the performance of the multiple imputation methods in different scenarios, we created missingness in the complete datasets at moderate (~20%, 30%) to high (~50%) proportions for varied sample sizes (n=50, 75, 120). If missingness occurred at any post-baseline visit for a subject, all subsequent post-baseline visits were assumed to be missing as well. Next, to impute the missing values through multiple imputation methods, the SAS procedure PROC MI was employed. For each imputation procedure, a SAS macro was developed to impute the missing values in the data set. The computational steps were as follows. First, a complete data set was generated through the simulation procedure and its mean and standard deviation were calculated. Thereafter, missingness was created in the dataset under the MAR mechanism at the assumed proportion. This incomplete data set was then imputed using each of the studied imputation methods, and descriptive statistics were calculated for the last visit; these are referred to below as the estimated mean and standard deviation. Incomplete datasets were imputed five times [12]. The statistical measures used to assess the performance of the methods were the absolute bias and the mean square error (MSE) of the estimated mean and standard deviation. Further, we constructed 90% confidence intervals (CI) for the bias and mean square error using the 5th and 95th percentiles.

Moderate proportion of missingness
Similarly, when 30% of the data were missing, the MI-regression method showed improvement in the bias and MSE of the estimates with increasing sample size. Of the other two multiple imputation methods, the MI-propensity score method showed lower bias and MSE than the MCMC method for both moderate proportions of missingness (20% and 30%). Thus, in this scenario the MCMC method showed the weakest performance among the MI methods despite the increase in sample size.
Turning to LOCF, a single imputation method, we observe that it has much higher bias and MSE for the mean than the multiple imputation methods, indicating that MI methods are the better choice for the situation discussed. LOCF is widely used in clinical trial data analysis, but the study results make it clear that it lags far behind the multiple imputation methods and that its performance is the poorest even when missingness is small (20%). Figures 1A and 1B show the performance of the imputation methods; the MSE of the MI-regression and MI-propensity score methods decreases as the percentage of missing data decreases.

Large proportion of missingness
When half of the data are missing and the sample size is as small as 50 patients, the results showed that the MI-propensity score method is better than the other two MI methods, although the difference from the MI-regression method was small. [...] also improved in performance with increasing sample size; however, the performance of the MI-MCMC method did not improve much despite the increase in sample size when a large proportion of the data was missing. Again, the MI-regression method performed well in the presence of a large proportion of missingness. The last observation carried forward method remains at the bottom in terms of the comparative measures when the proportion of missingness is large. Figure 2 shows the performance of the imputation methods for the large sample size; the MSE of the MI-regression and MI-propensity score methods decreases as the percentage of missing data decreases.

Discussion
Missing data invariably occur during the conduct of a clinical trial and are considered a major concern for longitudinal studies. Unfortunately, the occurrence of missingness is unavoidable despite care in study design and conduct and the use of preventive strategies. Because methods for analyzing missing data are complex, researchers often exclude subjects with partially missing information on response variables [16]. Excluding subjects from a planned sample directly reduces the power of the study [17], especially when the number of subjects is already small or when including more patients has significant financial implications. Multiple imputation methods have drawn researchers' attention in the last decade because of their computational elegance and applicability across a wide spectrum of incomplete data problems [18]. Advances in computing and the availability of statistical software have made it possible to apply complex and time-consuming analysis techniques efficiently and easily. However, the pace of methodological development is not reflected in real data applications by end users [18].
In the present work, we explored the efficiency and appropriateness of various multiple imputation methods under different possible circumstances, using the Statistical Analysis System (SAS), version 9.1.3. We evaluated the imputation methods for different sample sizes and for missingness varying from moderate to large proportions. Based on the results and recommendations, an appropriate MI method can be expected to be used in the analysis and reporting of longitudinal clinical studies. To make an informed decision in the presence of missing data, it is necessary to analyze the data within the available statistical framework for missingness in clinical trials; this in turn will add to the generalizability of the research findings. Because they are easy to apply, single imputation methods have been the default choice of researchers, but the efficiency of these methods has been questioned [14]. In a longitudinal clinical trial where the treatment effect improves with time, the LOCF method has been a conservative approach to measuring change over time. The simulation results suggest that the LOCF method has the highest bias and mean square error. Multiple imputation offers an opportunity to make the best of the data collected from an experiment on human beings. Though missing data in clinical trials have received a growing amount of attention, there are still some key misunderstandings regarding the selection of imputation methods. Based on the performance of the methods, researchers can choose the most appropriate MI method from Table 1.
The MI-regression method performed better than the other two MI methods and can therefore be applied in a variety of situations. The MI-propensity score method also performed well but ranks second; notably, it is comparable to MI-regression for a large proportion of missingness in small-sample studies. The Markov chain Monte Carlo (MCMC) method is far behind the other two MI methods, so it does not offer much applicability for the situation discussed. In general, however, all multiple imputation methods performed better than the single imputation method. It should also be noted that the performance of MI methods improves considerably as the sample size increases.

Conclusions
In the present investigation, we explored the efficiency and appropriateness of various imputation methods for small data sets with varying amounts of missingness. The results indicate that the MI-regression method is the most appropriate method for the analysis of data with missingness under the sample sizes and missingness mechanism considered. It was also demonstrated that a large proportion of missing data affects the performance of an imputation technique. The study findings may have important implications, particularly for longitudinal studies of limited sample size in which varying proportions of missingness are encountered. As is evident from the findings, single imputation may not be a suitable approach for missing data analysis, simply because it does not incorporate the uncertainty of the missing values into the imputed values. The present findings will help researchers choose an imputation method according to their requirements, so that the resulting estimates are more precise. We also encourage investigators and readers to keep abreast of new developments in multiple imputation methods so that the findings can be applied effectively.

Limitations of the study
We considered data under the MAR assumption for a single outcome of interest with a specified mean and covariance structure in a diabetic clinical trial; in practice, the mean and covariance structure and the missingness may differ. Caution is therefore needed when using the results of this simulation study in settings whose assumptions differ substantially from ours. Our choice of sample sizes was based on ease of the computational procedure and on the number of subjects commonly analyzed per treatment arm; to translate the findings to other therapeutic areas, a study with larger sample sizes and different proportions of missingness could be undertaken. Further exploration of more complex scenarios would also be beneficial.

List of abbreviations
MI: Multiple imputation
MCAR: Missing completely at random
MAR: Missing at random
MNAR: Missing not at random