Tomoyoshi Tsuchiya

Correspondence: Tomoyoshi Tsuchiya ttom@shimizuhospital.com

**Author Affiliations**

Department of Respiratory Medicine, Shizuoka City Shimizu Hospital, 1231 Miyakami, Shimizu-ku, Shizuoka, Japan.

This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/3.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

**Background:** In many clinical studies, numerous explanatory variables are analyzed and it is concluded that all statistically significant variables are important. However, if the multiplicity of statistical tests is not considered, variables may be judged significant by chance alone. To demonstrate the risk of multiple hypothesis tests, multiple logistic regression models built from random numbers are simulated.

**Methods:** Variables y and x1~x30, each with 600 elements, are created by numbers selected randomly from (0,1) by the re-sampling method. Variable y is defined as the objective variable and variables x1~x30 are defined as explanatory variables. Multiple logistic regression analysis is performed using these objective and explanatory variables. Wald tests are performed in the statistical model, and the number of statistically significant explanatory variables (p value < 0.05) is counted. This series of analyses is repeated 1000 times, and the numbers of significant variables are summed.

**Results:** Across the 1000 simulations, the number of significant explanatory variables ranges from 0 to 8 per analysis. The mean is 1.69 and the median is 2. In 80.1 percent of the simulations, at least one explanatory variable is statistically significant; in 50 percent or more, two or more explanatory variables are statistically significant.

**Conclusions:** When performing exploratory research using multivariable analysis, we must be fully aware of the risk of false significance caused by multiplicity.

**Keywords:** Linear models, logistic regression, multivariate analysis, research methodology

In a great number of clinical studies, the multiple logistic regression model is used to investigate prognostic factors in an exploratory manner. In many of these studies, numerous explanatory variables are analyzed and it is concluded that all statistically significant variables (p value < 0.05) are important. However, if the multiplicity of statistical tests is not considered, variables may be judged significant by chance alone, creating a risk of incorrect conclusions.

When the mean values of three or more groups are compared, adjustment for multiple comparisons is commonly taken into consideration (e.g., the Bonferroni or Holm procedures). However, multiple hypothesis testing in the multiple logistic regression model is scarcely discussed. It is possible that explanatory variables described as important factors in published papers are in fact meaningless. It is common in clinical research for many factors associated with the development of a certain disease, for example, gender, age, smoking history, and hypertension, to be examined by performing multiple hypothesis tests. In such instances, because of multiplicity, factors that are in fact unrelated to the outcome may be judged significant.

If we test only one null hypothesis using 0.05 as the cut-off point for significance, it is correct to regard a p value less than 0.05 as statistically significant. However, if we concurrently test two independent null hypotheses, the probability that at least one will be significant is 1-(1-0.05)×(1-0.05)=0.098, not 0.05. If we test 10 such hypotheses, the probability that at least one of them will be significant is 1-(1-0.05)^{10}=0.40, which is much larger than 0.05. Generally, if we perform k independent significance tests with the cut-off point 0.05, the probability that at least one of the k hypotheses will be significant is 1-(1-0.05)^{k}. In this study, for the purpose of demonstrating the risk of multiple hypothesis tests, I simulated multiple logistic regression models built from random numbers.
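The familywise error rates quoted above are easy to verify. The following Python sketch (illustrative only; the paper's own analyses use R) computes 1-(1-α)^{k}:

```python
def fwer(k, alpha=0.05):
    """Probability that at least one of k independent tests conducted
    at level alpha is significant when every null hypothesis is true."""
    return 1.0 - (1.0 - alpha) ** k

print(fwer(1))    # 0.05
print(fwer(2))    # ≈ 0.0975, i.e. about 0.098
print(fwer(10))   # ≈ 0.40
print(fwer(30))   # ≈ 0.785, the case of 30 concurrent tests
```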

Variables y and x1~x30, each with 600 elements, are created by numbers selected randomly from (0,1) by the re-sampling method (Table 1). Variable y is defined as the objective variable and variables x1~x30 are defined as explanatory variables, and multiple logistic regression analysis (a generalized linear model with a logit link function and binomial error structure) is then performed using these variables. If the probability of the event of interest is p, the odds are defined as p/(1-p). The multiple logistic regression model is given by:

**Table 1:** Data set image created by numbers selected randomly from (0,1) by the re-sampling method. Variables y and x1~x30 have 600 elements per variable.

log(p/(1-p)) = β_{0} + β_{1}x_{1} + β_{2}x_{2} + ... + β_{30}x_{30}

where β_{i} are the partial regression coefficients (β_{0} is the intercept) and x_{i} are the explanatory variables.

This is the same procedure commonly described in medical research papers as, for example, 'We investigated 30 factors associated with death within 30 days in 600 cases of a certain syndrome using multiple logistic regression analysis.'

Wald tests are conducted on the partial regression coefficients of the multivariable analysis, and the number of statistically significant explanatory variables (p value < 0.05) is counted. This series of analyses is repeated 1000 times, and the numbers of significant variables are summed.

All of the analyses are conducted using R version 3.1.0 (R
Core Team (2014). R: A language and environment for statistical
computing. R Foundation for Statistical Computing, Vienna,
Austria, http://www.R-project.org/). The R script used in
this simulation is shown in the **Supplement Data** script list.
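For readers who prefer Python to R, the procedure can be re-implemented roughly as follows. This is a sketch, not the author's supplementary script: it draws binary y and x1~x30 at random, fits the logistic model by Newton-Raphson, and counts Wald-significant predictors (numpy is assumed to be available).

```python
import numpy as np
from math import erf, sqrt

rng = np.random.default_rng(0)

def normal_sf2(z):
    """Two-sided p value for a standard-normal Wald statistic."""
    return 2.0 * (1.0 - 0.5 * (1.0 + erf(abs(z) / sqrt(2.0))))

def fit_logistic(X, y, n_iter=25):
    """Logistic regression by Newton-Raphson (IRLS); returns coefficients and Wald p values."""
    Xd = np.column_stack([np.ones(len(y)), X])   # prepend intercept column
    beta = np.zeros(Xd.shape[1])
    for _ in range(n_iter):
        p = 1.0 / (1.0 + np.exp(-Xd @ beta))
        W = p * (1.0 - p)
        info = Xd.T @ (Xd * W[:, None])          # Fisher information matrix
        beta = beta + np.linalg.solve(info, Xd.T @ (y - p))
    p = 1.0 / (1.0 + np.exp(-Xd @ beta))
    cov = np.linalg.inv(Xd.T @ (Xd * (p * (1.0 - p))[:, None]))
    se = np.sqrt(np.diag(cov))
    return beta, np.array([normal_sf2(z) for z in beta / se])

def one_replication(n=600, k=30, alpha=0.05):
    """Pure-noise data set: binary y and k binary predictors, all drawn at random."""
    y = rng.integers(0, 2, size=n).astype(float)
    X = rng.integers(0, 2, size=(n, k)).astype(float)
    _, pvals = fit_logistic(X, y)
    return int(np.sum(pvals[1:] < alpha))        # count significant predictors (skip intercept)
```

Repeating `one_replication()` 1000 times and tallying the counts reproduces the flavor of the results reported below.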

One example output of the multiple logistic regression analysis from the 1000 simulations is shown in Table 2. In this example, four of the 30 explanatory variables are determined to be significant (p value < 0.05). The analysis is repeated 1000 times in the same way, and the 1000 results are then summed.

**Table 2:** One sample output of the 1000 simulations. There are four significant variables in this analysis.

In those 1000 simulations, the number of significant explanatory variables (p value < 0.05) ranges from 0 to 8 per simulation (Figure 1). The mean is 1.69 and the median is 2. In 80.1 percent of the simulations, at least one explanatory variable is statistically significant. In 50% or more of the simulations, two or more of the explanatory variables are statistically significant.

**Figure 1:** Distribution of the number of statistically significant variables.
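These frequencies are close to what a simple binomial calculation predicts. Treating the 30 Wald tests as roughly independent Bernoulli trials with success probability 0.05 (an approximation, since all 30 tests share one model fit), a short Python check gives benchmarks near the simulated values:

```python
from math import comb

def binom_pmf(n, k, p):
    """Probability of exactly k successes in n independent trials."""
    return comb(n, k) * p ** k * (1 - p) ** (n - k)

n_tests, alpha = 30, 0.05
expected_mean = n_tests * alpha                                # 1.5 (simulated: 1.69)
p_at_least_one = 1 - binom_pmf(n_tests, 0, alpha)              # ≈ 0.785 (simulated: 0.801)
p_two_or_more = p_at_least_one - binom_pmf(n_tests, 1, alpha)  # ≈ 0.45
```

The simulated counts sit slightly above these independence benchmarks, consistent with the mild dependence among coefficients fitted in the same model.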

To confirm that the simulation models are built from random numbers, histograms of the adjusted odds ratios for all explanatory variables (30×1000=30000) and of the p values for all explanatory variables (30×1000=30000) are constructed (Figures 2 and 3). The histogram of the adjusted odds ratios peaks near 1.0, and values of 2.0 or more or 0.5 or less rarely arise by chance alone. The histogram of the p values shows a uniform distribution.

**Figure 2:** Histogram of all adjusted odds ratios (30000) computed in this simulation.

**Figure 3:** Histogram of all p values (30000) computed in this simulation.

The current study demonstrates the risk of multiple hypothesis tests in exploratory clinical research. There are a few medical papers on this point [1-3], but they are not necessarily easy for clinicians to understand. Freedman showed that in multiple linear regression analysis using data created from random numbers, significant variables emerge from pure noise [4]. Building on that concept, I present for clinicians the risk of multiple hypothesis tests in a visible manner with multiple logistic regression analysis, which is commonly used in medical research. Researchers preoccupied with demonstrating statistical significance for publication may often lose sight of the essence of their research.

As this study indicates, statistically significant variables can be produced from pure noise, i.e., completely random data. With 30 variables, one or more are significant with a probability of about 80% from chance alone; indeed, as many as eight variables may be statistically significant with completely random data. If dummy-coded categorical data are used, 30 explanatory variables is not such a large number. When evaluating only one null hypothesis in a confirmatory study, there is no problem. However, when evaluating many null hypotheses in exploratory research, one must question whether statistically significant factors are really meaningful. An event with a chance of 1 in 20 happens quite often, and researchers are misled in many cases. Confounding can be corrected by multivariable analysis, but chance findings cannot. Truly important results and falsely important results arising by chance are mixed in the same analysis, and it is not always easy to distinguish between them.

There are several ways to reduce the risk of multiple hypothesis tests, described in 'Evaluating Clinical and Public Health Interventions' [5] as follows: (1) adjust the significance level for the number of null hypotheses using the Bonferroni or Holm procedures; (2) determine before starting the study how many factors will be analyzed and which is the main outcome; there is no need to adjust the test of the primary outcome, but adjustment is necessary for the secondary outcome(s); (3) describe clearly all the results of the tests in the paper, so that readers can interpret them properly even without adjustment; (4) do not be swayed by the p value itself; it is not important whether the p value is a little under or a little over the significance level, and we must sufficiently weigh biological plausibility, consistency with the findings of other studies, effect size, and so on. In addition, the simulation in this study indicates that an adjusted odds ratio is almost never greater than 2.0 or less than 0.5 by chance alone; therefore, if the adjusted odds ratio falls outside this range, the result of the test is highly likely to be important.
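As an illustration of option (1), a Holm step-down adjustment can be written in a few lines of Python (a generic sketch; statistical packages provide equivalent built-in functions):

```python
def holm_adjust(pvals):
    """Holm step-down adjusted p values; comparing them to the significance
    level controls the familywise error rate across all tests."""
    m = len(pvals)
    order = sorted(range(m), key=lambda i: pvals[i])
    adjusted = [0.0] * m
    running_max = 0.0
    for rank, i in enumerate(order):
        adj = min(1.0, (m - rank) * pvals[i])
        running_max = max(running_max, adj)   # enforce monotonicity
        adjusted[i] = running_max
    return adjusted
```

For example, `holm_adjust([0.004, 0.03, 0.04])` returns approximately [0.012, 0.06, 0.06], so at the 0.05 level only the first test remains significant after adjustment.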

Performing multiple logistic regression analysis with 30 explanatory variables, one or more variables are significant with a probability of about 80% from chance alone, and as many as eight of the 30 variables may be significant by chance. When performing exploratory research using multivariable analysis, we must be fully aware of the risk shown in this study and, in interpreting the results, try to reduce the risk of multiple hypothesis tests.

**Additional files**

**Supplement Data**

The author declares that he has no competing interests.

I thank the editors and reviewers for their helpful comments. This research received no specific grant from any funding agency in the public, commercial or not-for-profit sectors.

Editor: Qiang Shawn Cheng, Southern Illinois University, USA.

EIC: Jimmy Efird, East Carolina University, USA.

Received: 10-Jul-2014 Final Revised: 31-Aug-2014

Accepted: 08-Sep-2014 Published: 17-Sep-2014

1. Smith DG, Clemens J, Crede W, Harvey M and Gracely EJ. **Impact of multiple comparisons in randomized clinical trials.** *Am J Med.* 1987; **83**:545-50.
2. Mills JL. **Data torturing.** *N Engl J Med.* 1993; **329**:1196-9.
3. Berry D. **Multiplicities in cancer research: ubiquitous and necessary evils.** *J Natl Cancer Inst.* 2012; **104**:1124-32.
4. Freedman DA. **A note on screening regression equations.** *The American Statistician.* 1983; **37**:152-5.
5. Katz MH. **How do I adjust for multiple comparisons?** In: *Evaluating Clinical and Public Health Interventions: A Practical Guide to Study Design and Statistics.* 2010; 140-2.


Tsuchiya T. **Risk of performing multiple logistic regression analysis without considering multiplicity: an overview for clinicians and practitioners.** *J Med Stat Inform.* 2014; **2**:7. http://dx.doi.org/10.7243/2053-7662-2-7


Copyright © 2015 Herbert Publications Limited. All rights reserved.
