Journal of Medical Statistics and Informatics

ISSN 2053-7662
Original Research

On comparative performance of multiple imputation methods for moderate to large proportions of missing data in clinical trials: a simulation study

Sukhdev Mishra1* and Diwakar Khare2

*Correspondence: Sukhdev Mishra

Author Affiliations

1. Division of Bio-Statistics, National Institute of Occupational Health, Meghani Nagar, Ahmedabad, India.

2. Department of Statistics, Institute of Social Sciences, Dr. B. R. Ambedkar University, Agra, India.


Background: A longitudinal clinical trial takes measurements at successive occasions, and a patient's absence from a scheduled visit leaves gaps in the expected full sequence of measurements. Missing data are a major concern during the conduct of a clinical trial. It has been noted that missing data are often not handled properly in the final analysis, which may considerably bias the results, reduce the power of the study, and lead to invalid conclusions. A promising approach to this problem is to impute the missing values.

Methods: Multiple imputation (MI) methods provide a useful strategy for dealing with data sets that contain missing values: the missing values are filled in with estimates, and the resulting data sets are analyzed by complete-data methods. Statistical methods to address missingness have been actively pursued in recent years. This paper describes missing data mechanisms and various imputation techniques for missing data analysis in longitudinal clinical trials. Further, it assesses the appropriateness of multiple imputation methods under moderate to large proportions of missingness in simulated clinical trial data by comparing performance measures derived through an intensive simulation procedure.
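To make the fill-in-then-analyze idea concrete, the following is a minimal sketch of regression-based multiple imputation with pooling by Rubin's rules. It is an illustration under stated assumptions, not the procedure used in this study: the function names (`fit_line`, `impute_once`, `pool_rubin`, `mi_mean`), the single covariate `x`, the bootstrap step used to approximate parameter uncertainty, and the choice of the mean as the target estimate are all hypothetical.

```python
import random
import statistics


def fit_line(x, y):
    """Ordinary least squares for y = a + b*x; returns (a, b, residual_sd)."""
    mx, my = statistics.fmean(x), statistics.fmean(y)
    sxx = sum((xi - mx) ** 2 for xi in x)
    b = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y)) / sxx
    a = my - b * mx
    resid = [yi - (a + b * xi) for xi, yi in zip(x, y)]
    sd = (sum(r * r for r in resid) / (len(x) - 2)) ** 0.5
    return a, b, sd


def impute_once(x, y, rng):
    """Return a completed copy of y, with each None replaced by a
    regression prediction plus random residual noise."""
    observed = [(xi, yi) for xi, yi in zip(x, y) if yi is not None]
    # Assumption: resample the observed pairs so that each imputation
    # also reflects uncertainty in the fitted regression parameters.
    boot = [rng.choice(observed) for _ in observed]
    a, b, sd = fit_line([p[0] for p in boot], [p[1] for p in boot])
    return [yi if yi is not None else a + b * xi + rng.gauss(0, sd)
            for xi, yi in zip(x, y)]


def pool_rubin(estimates, variances):
    """Rubin's rules: pooled estimate and total variance over m imputations."""
    m = len(estimates)
    pooled = statistics.fmean(estimates)
    within = statistics.fmean(variances)      # average within-imputation variance
    between = statistics.variance(estimates)  # between-imputation variance
    return pooled, within + (1 + 1 / m) * between


def mi_mean(x, y, m=5, seed=0):
    """Estimate the mean of y from m imputed data sets."""
    rng = random.Random(seed)
    estimates, variances = [], []
    for _ in range(m):
        completed = impute_once(x, y, rng)
        estimates.append(statistics.fmean(completed))
        variances.append(statistics.variance(completed) / len(completed))
    return pool_rubin(estimates, variances)
```

Calling `mi_mean(x, y, m=5)` on a covariate list `x` and an outcome list `y` in which missing values are coded as `None` returns the pooled mean and its total variance. The between-imputation term is what distinguishes multiple imputation from a single fill-in: it carries the extra uncertainty caused by the missing values into the final variance.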

Results: For moderate proportions (~20% and 30%) of missingness, the MI-regression method achieved the lowest bias and MSE as the sample size increased. The other methods, however, did not improve much despite the increased sample size. For a large proportion (50%) of missing data, the MI-regression and MI-propensity score methods were close in performance, but the MI-regression method performed significantly better with an increased number of subjects in the dataset.
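For readers less familiar with the two performance measures named above, the sketch below shows how bias and mean squared error (MSE) are commonly computed from the estimates obtained across simulation replicates. The function name `bias_and_mse` and its arguments are hypothetical and are not taken from the study.

```python
import statistics


def bias_and_mse(estimates, true_value):
    """Bias and mean squared error of replicate estimates
    relative to the known true parameter value."""
    errors = [e - true_value for e in estimates]
    bias = statistics.fmean(errors)
    mse = statistics.fmean([err ** 2 for err in errors])
    return bias, mse
```

Because MSE equals the squared bias plus the variance of the estimates across replicates, a method can score a low MSE only if it is both nearly unbiased and stable.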

Conclusions: The present investigation showed that the MI-regression method is the most appropriate for analyzing data with missing values under the sample sizes and missingness mechanism discussed. Overall, the study findings will help researchers with limited knowledge of statistical methodology to choose a suitable multiple imputation method, so that the resulting estimates are more precise.

Keywords: Missing data, missing mechanism, longitudinal data, multiple imputation

Volume 2