
Meysamie A, Taee F, Mohammadi-Vajari M-A, Yoosefi-Khanghah S, Emamzadeh-Fard S and Abbassi M. Sample size calculation on web, can we rely on the results? J Med Stat Inform. 2014; 2:3. http://dx.doi.org/10.7243/2053-7662-2-3
Alipasha Meysamie1*, Farough Taee1, Mohammad-Ali Mohammadi-Vajari1, Siamak Yoosefi-Khanghah1, Sahra Emamzadeh-Fard1 and Mehrshad Abbassi2
*Correspondence: Alipasha Meysamie Meysamie@tums.ac.ir
1. Department of Community and Preventive Medicine, Medical Faculty, Tehran University of Medical Sciences, Iran.
2. Department of Nuclear Medicine, Valiasr Hospital, Tehran University of Medical Sciences, Tehran, Iran.
This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/3.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
Sample size calculation is one of the most frequent concerns of researchers. There are many equations for calculating sample size, and the correct one must be selected based on the type of study and the type of data. Any mistake can mislead the researcher and thereby distort the study results; even an overestimation wastes cost, time and material. Both statisticians and other researchers frequently use online advice on sample size calculation. We searched for and reviewed online sample size calculators and checked their results against known formulas for common research objectives. Considering the most common types of health study objectives regarding sample size calculation (estimating a proportion or a mean, comparing two proportions or two means), of 60 websites or software packages, only 5 (8.3%) provided all four types of calculation. The overall exact calculation rate was 8.3% per calculation and 16.1% per site. Many of the sites calculate sample size only for estimating proportions, and most of the results are inaccurate.
Keywords: Online calculation, sample size, health studies, accuracy
Statisticians, methodologists and non-statistician professional researchers frequently ask the question "How many subjects do I need for my study?" [1]. The sample size affects the feasibility of the research, and proper calculation has a significant effect on the power and validity of the results [2]. Currently, researchers solve most of their problems via online solutions, and face-to-face consultations are limited to occasions when they are unavoidable. For internet-assisted sample size calculation, the first step is to find a suitable site with online calculation capability or downloadable files, but the accuracy of the calculations is one of the most important matters. There are many online sample size calculation sites with different specifications. Non-statistician users do not like sophisticated calculation processes, whereas more professional statisticians may need advanced calculations for rarer study types. In this review, we list sample size calculation websites and check how closely their calculations agree with the results of the basic formulas.
The Google search engine was used with the keywords "sample size calculator", "sample size calculation", "sample size in health studies", "sample size web site" and "sample size software" to retrieve related sites. The Software Informer website (available at http://software.informer.com) was also used to find downloadable files for sample size calculation. Each site was reviewed, and data were collected on its URL, its owner, the availability of the calculation equation(s), and whether it provided calculations other than sample size, e.g. confidence levels or statistical tests. Standard formulas were extracted for the four most common types of sample size calculation in health studies (estimating a proportion of a qualitative variable, estimating a mean of a quantitative variable, comparing two proportions, and comparing two means of two different populations) from a detailed review of several statistical resources, including biostatistics textbooks [3-8] and articles [9]. Four hypothetical examples were considered (one for each type), and the results were calculated with the selected formulas (Table 1). The results of each site or software package were then compared with these standard results. A difference from the standard result larger than could be explained by rounding the final result (more than 1) was considered an inaccurate calculation (not the same as the formula).
The ease of use of each site was also evaluated to determine whether it was simple and user-friendly or professional. To evaluate complexity, two medical researchers with some experience in medical research and sample size calculation were asked to use the sample size calculators and score each as easy, moderate or advanced to use. Sites were scored as easy if they required only basic input data and clearly explained each step of the calculation to users; sites that needed detailed information about the study, e.g. one-tailed or two-tailed calculation, were scored as moderate; and sites that required very detailed additional information unfamiliar in routine medical research practice, given the statistical knowledge of a medical researcher, e.g. complex survey analysis or cost analysis, were scored as advanced. When both observers agreed, their rating was taken as the site's ease-of-use level; when they disagreed, a third evaluator reviewed the site to assign the rating.
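The accuracy criterion described above (a difference of more than 1 from the formula result counts as inaccurate) and the two summary rates (per calculation and per site) can be sketched as follows. This is a minimal illustration, not the authors' procedure: the function names, the site labels and all numbers in the example are hypothetical.

```python
def is_accurate(site_n, formula_n, tolerance=1):
    """A site result counts as accurate if it differs from the
    formula result by no more than the rounding tolerance (1)."""
    return abs(site_n - formula_n) <= tolerance


def accuracy_rates(results):
    """results maps a site label to a list of (site_n, formula_n) pairs.

    Returns (rate per calculation, rate of sites with at least one
    accurate calculation)."""
    checks = [is_accurate(s, f) for pairs in results.values() for s, f in pairs]
    per_calculation = sum(checks) / len(checks)
    sites_with_hit = sum(
        any(is_accurate(s, f) for s, f in pairs) for pairs in results.values()
    )
    per_site = sites_with_hit / len(results)
    return per_calculation, per_site


# Hypothetical data: three made-up sites, five calculations in total.
example = {
    "site_a": [(385, 385), (97, 96)],
    "site_b": [(384, 139)],
    "site_c": [(140, 139), (70, 63)],
}
per_calc, per_site = accuracy_rates(example)
print(f"accurate per calculation: {per_calc:.1%}")  # 3 of 5 calculations
print(f"sites with an accurate result: {per_site:.1%}")  # 2 of 3 sites
```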
Table 1: Formulas for sample size calculation, with the hypothetical data and results used for assessing calculation accuracy (z = value on the standardized normal distribution for a given probability).
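For orientation, the four calculation types can be sketched in Python using commonly taught textbook forms. These are assumptions for illustration: the exact formulas and hypothetical data of Table 1 are not reproduced here, so the function names, default error levels and example inputs below are not taken from the study.

```python
import math
from statistics import NormalDist


def z_value(prob):
    """Value on the standard normal distribution for a cumulative probability."""
    return NormalDist().inv_cdf(prob)


def n_estimate_proportion(p, d, alpha=0.05):
    """Estimate a proportion p to within +/- d (assumed textbook form)."""
    z = z_value(1 - alpha / 2)
    return math.ceil(z**2 * p * (1 - p) / d**2)


def n_estimate_mean(sd, d, alpha=0.05):
    """Estimate a mean to within +/- d, given a standard deviation sd."""
    z = z_value(1 - alpha / 2)
    return math.ceil((z * sd / d) ** 2)


def n_compare_proportions(p1, p2, alpha=0.05, power=0.80):
    """Sample size per group for comparing two proportions (two-sided)."""
    k = (z_value(1 - alpha / 2) + z_value(power)) ** 2
    return math.ceil(k * (p1 * (1 - p1) + p2 * (1 - p2)) / (p1 - p2) ** 2)


def n_compare_means(sd1, sd2, delta, alpha=0.05, power=0.80):
    """Sample size per group for detecting a difference delta between two means."""
    k = (z_value(1 - alpha / 2) + z_value(power)) ** 2
    return math.ceil(k * (sd1**2 + sd2**2) / delta**2)


# Illustrative inputs only (not the study's hypothetical examples).
print(n_estimate_proportion(p=0.5, d=0.05))      # 385
print(n_estimate_mean(sd=10, d=2))               # 97
print(n_compare_proportions(p1=0.5, p2=0.3))     # 91
print(n_compare_means(sd1=10, sd2=10, delta=5))  # 63
```

Results are rounded up to the next whole subject; a calculator that rounds to the nearest integer instead would differ by at most 1, which is within the tolerance used in this study.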
Two sites could not be rated because registration or access was unavailable to the authors [11,52], and some provided no calculation for any of the four common types considered in the evaluation [12,17,18,27,31,34,52,55,57] (Table 2). STATA software was also used for the calculations, and the related results are reported [59].
Table 2: Brief description of sample size calculator websites and software. NR = Not Rated, NAP = Not Applicable, NC = No calculation provided.
The summary of available sample size calculator sites is presented in Table 2. Among the web-based sample size calculators, some perform the calculations online and others provide a downloadable file [10,11,21,22,24,26,34,53,57,58]. These files range from a static Excel table [21] to a dynamic file for multiple calculations [10]. Half of the sites (25 out of 50) calculate sample size only for estimation of a proportion, and most of these use a fixed proportion of 50% in the formula, so their results were not accurate for the example considered in the study. Only five sites provided all four common types of calculation [16,23,33,45,46]. Comparing the site results with the results of the formulas, 18 (31.6%) of 57 different calculations were accurate, and only 13 (26.0%) sites provided at least one accurate result. Of the sites providing all four types of calculation, only one gave accurate results for all four [23], and one other [16] gave accurate results for three. Some sites provided calculations for specific study types, e.g. complex sample surveys [49,57], survival analysis [26,55] and microarray experiments in genetic studies [12], and some provided multi-language support [23,37]. Of 48 rated sites, 28 (58.3%) were scored as "Easy", 6 (14.6%) as "Moderate" and 10 (27.1%) as "Advanced" according to the complexity of site usage.
We provide a list of available online sample size calculators, with a brief description of each, an evaluation of the accuracy of their calculations, and their level of ease of use and complexity.
Online sample size calculators range from easy and user-friendly to advanced and professional, but the most important question is "How valid are their calculations?". According to our findings, few sites provide all the common types of sample size calculation needed for common types of research, and only a few provide accurate calculations. The most important reason for inaccurate results may be fixed assumptions built into many sites, e.g. a fixed proportion of 50% for calculating sample size when the aim is to estimate a prevalence. Another reason may be that sites use formulas different from those we considered in our assessment. Providing the basic formula for each calculation may help researchers choose a more accurate path for sample size calculation. In some situations, e.g. comparison of two proportions, different formulas give almost the same sample size. In others, such as estimation, the available formulas differ more and so do the results. Our study showed that some sites used a formula that takes type II error into account when calculating sample size for estimating a proportion or a mean [33]. Some sites may use more accurate formulas, but others use inaccurate ones. For example, some sites use formulas with just one standard deviation for the comparison of two means [24,26,46], and some take variances instead of standard deviations as input [16]; this can cause errors if researchers enter standard deviations instead.
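Two of the pitfalls discussed above can be illustrated numerically. The sketch below assumes common textbook formulas (not necessarily those used by the reviewed sites or in our assessment), and every input value is hypothetical: it compares a fixed proportion of 50% with a lower expected prevalence, and shows what happens when a standard deviation is entered into a calculator that expects a variance.

```python
import math
from statistics import NormalDist

z = NormalDist().inv_cdf  # standard normal quantile function


def n_proportion(p, d, alpha=0.05):
    """Estimate a proportion p to within +/- d (assumed textbook form)."""
    return math.ceil(z(1 - alpha / 2) ** 2 * p * (1 - p) / d**2)


def n_two_means(var1, var2, delta, alpha=0.05, power=0.80):
    """Per-group size for comparing two means; inputs are VARIANCES."""
    k = (z(1 - alpha / 2) + z(power)) ** 2
    return math.ceil(k * (var1 + var2) / delta**2)


# Pitfall 1: a calculator that fixes p at 50% regardless of the expected prevalence.
fixed_p = n_proportion(p=0.5, d=0.05)      # 385
expected_p = n_proportion(p=0.10, d=0.05)  # 139
print(fixed_p, expected_p)

# Pitfall 2: entering a standard deviation where a variance is expected.
sd = 10
correct = n_two_means(sd**2, sd**2, delta=5)  # 63
mistaken = n_two_means(sd, sd, delta=5)       # 7
print(correct, mistaken)
```

With these hypothetical inputs, the fixed 50% assumption more than doubles the required sample, while the variance mix-up yields a sample far too small to reach the intended power.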
Some other sites provided advanced calculations for complex studies [12,26,49,55,57]. We did not rate or assess these calculations in our study, but when there are so many faults in simple calculations, how can we expect accurate results from those complex sites? Multi-language support may make a site easier to use for users whose language is not English. In our review we found two sites that provided multi-language support [23,37].
Some good references on the principles of sample size calculation were used in this study and can be helpful for researchers [7,8]. Specialized statistical software such as STATA, SAS and R also provides commands for sample size calculation and power analysis, which can be useful for more sophisticated approaches.
It seems that sites which provide their calculation formulas can be trusted more for online sample size calculation, since researchers at least have a way to check the accuracy of the calculation against the provided formulas. However, researchers who perform sophisticated studies need the special assistance of statisticians for sample size calculation.
The authors declare that they have no competing interests.
Authors' contributions | AM | FT | MMV | SYK | SEF | MA |
Research concept and design | √ | √ | -- | -- | -- | -- |
Collection and/or assembly of data | √ | √ | √ | √ | √ | -- |
Data analysis and interpretation | √ | √ | √ | √ | √ | √ |
Writing the article | √ | √ | √ | √ | √ | √ |
Critical revision of the article | √ | -- | -- | -- | -- | √ |
Final approval of article | √ | -- | -- | -- | -- | -- |
Statistical analysis | √ | √ | √ | √ | -- | -- |
The authors thank Tehran University of Medical Sciences for providing the facilities to run an online sample size calculator and for supporting the study.
Editor: Jimmy Efird, East Carolina University, USA.
Received: 08-Jan-2013 Final Revised: 24-Feb-2014
Accepted: 26-Feb-2014 Published: 14-Apr-2014
Copyright © 2015 Herbert Publications Limited. All rights reserved.