
Background: In many clinical studies, a large number of explanatory variables are analyzed, and every statistically significant variable is concluded to be important. However, if the multiplicity of statistical tests is not taken into account, some variables may appear significant purely by chance. To demonstrate the risk of multiple hypothesis testing, multiple logistic regression models built from random numbers are simulated.
Methods: Variables y and x1–x30, each with 600 elements, are created from values selected randomly from {0, 1} by resampling (random sampling with replacement). Variable y is defined as the dependent (objective) variable, and x1–x30 are defined as the explanatory variables. Multiple logistic regression analysis is performed with these variables, a Wald test is applied to each explanatory variable in the model, and the number of statistically significant explanatory variables (p < 0.05) is counted. This series of analyses is repeated 1000 times, and the numbers of significant variables are aggregated across the repetitions.
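The procedure in Methods can be sketched in pure Python as follows. This is a minimal sketch under assumptions the abstract does not state: y and x1–x30 are drawn as 0/1 values with equal probability, the model includes an intercept, it is fitted by Newton-Raphson maximum likelihood, and two-sided Wald p-values use the normal approximation. All function names (`invert`, `information_and_score`, `fit_logistic`, `count_significant`, `run_simulation`) are hypothetical; the software the authors actually used is not stated.

```python
import math
import random
import statistics


def invert(matrix):
    """Invert a square matrix by Gauss-Jordan elimination with partial pivoting."""
    n = len(matrix)
    aug = [list(row) + [1.0 if i == j else 0.0 for j in range(n)]
           for i, row in enumerate(matrix)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        pv = aug[col][col]
        aug[col] = [v / pv for v in aug[col]]
        for r in range(n):
            factor = aug[r][col]
            if r != col and factor != 0.0:
                aug[r] = [a - factor * b for a, b in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]


def information_and_score(X, y, beta):
    """Return the Fisher information X^T W X and the score X^T (y - p) at beta."""
    k = len(beta)
    info = [[0.0] * k for _ in range(k)]
    score = [0.0] * k
    for row, yi in zip(X, y):
        eta = sum(b * v for b, v in zip(beta, row))
        p = 1.0 / (1.0 + math.exp(-eta))
        w = p * (1.0 - p)
        for a in range(k):
            if row[a] == 0.0:
                continue  # zero entries contribute nothing; skipping speeds up 0/1 data
            score[a] += row[a] * (yi - p)
            for b in range(a, k):
                info[a][b] += w * row[a] * row[b]
    for a in range(k):  # mirror the upper triangle into the lower triangle
        for b in range(a):
            info[a][b] = info[b][a]
    return info, score


def fit_logistic(X, y, max_iter=50, tol=1e-8):
    """Fit logistic regression by Newton-Raphson; return (beta, two-sided Wald p-values)."""
    beta = [0.0] * len(X[0])
    for _ in range(max_iter):
        info, score = information_and_score(X, y, beta)
        step = [sum(c * s for c, s in zip(row, score)) for row in invert(info)]
        beta = [b + d for b, d in zip(beta, step)]
        if max(abs(d) for d in step) < tol:
            break
    cov = invert(information_and_score(X, y, beta)[0])  # covariance at the final estimate
    pvalues = []
    for j, b in enumerate(beta):
        z = b / math.sqrt(cov[j][j])  # Wald statistic
        pvalues.append(math.erfc(abs(z) / math.sqrt(2.0)))  # equals 2 * (1 - Phi(|z|))
    return beta, pvalues


def count_significant(rng, n=600, m=30, alpha=0.05):
    """One simulated analysis on pure noise: count x's with Wald p < alpha."""
    # Assumption: 0/1 values with equal probability; an intercept column is included.
    y = [rng.choice((0, 1)) for _ in range(n)]
    X = [[1.0] + [float(rng.choice((0, 1))) for _ in range(m)] for _ in range(n)]
    _, pvalues = fit_logistic(X, y)
    return sum(1 for p in pvalues[1:] if p < alpha)  # index 0 is the intercept


def run_simulation(n_sims=1000, seed=None):
    """Repeat the analysis n_sims times and summarize the significant-variable counts."""
    rng = random.Random(seed)
    counts = [count_significant(rng) for _ in range(n_sims)]
    return {
        "counts": counts,
        "mean": statistics.mean(counts),
        "median": statistics.median(counts),
        "share_at_least_one": sum(c >= 1 for c in counts) / n_sims,
    }
```

For example, `run_simulation(n_sims=1000, seed=1)` mirrors the 1000 repetitions; because the fitting loop is pure Python it may take several minutes, and the exact counts will differ from the reported ones because the random numbers differ. A library routine such as the `Logit` model in statsmodels would be faster; it is avoided here only to keep the sketch dependency-free.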
Results: Across the 1000 simulations, the number of significant explanatory variables per analysis ranges from 0 to 8. The mean is 1.69 and the median is 2. In 80.1% of the simulations, at least one explanatory variable is statistically significant, and in at least half of the simulations, two or more explanatory variables are statistically significant.
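As a rough benchmark for these results, suppose (as an idealization; the 30 Wald tests in one model are only approximately independent and only approximately of size α) that each of the m = 30 tests independently rejects with probability α = 0.05 when no variable has a real effect. The number K of falsely significant variables is then binomial:

```latex
K \sim \mathrm{Binomial}(m,\ \alpha), \qquad m = 30,\ \alpha = 0.05
\mathbb{E}[K] = m\alpha = 30 \times 0.05 = 1.5
P(K \ge 1) = 1 - (1 - \alpha)^{m} = 1 - 0.95^{30} \approx 0.785
```

The simulated results (mean 1.69; at least one significant variable in 80.1% of runs) are of the same order as this idealized benchmark.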
Conclusions: When performing exploratory research with multivariable analysis, we must be fully aware of the risk of false-positive significance arising from multiple testing.
Keywords: Linear models, logistic regression, multivariate analysis, research methodology