J Hum Reprod Sci. 2012 Jan-Apr; 5(1): 7–13.
Sample size estimation and power analysis for clinical research studies
KP Suresh
Department of Biostatistics, National Institute of Animal Nutrition and Physiology, Bangalore, India
S Chandrashekara
Department of Immunology and Rheumatology, ChanRe Rheumatology and Immunology Center and Research, Bangalore, India
Received 2012 Feb 16; Revised 2012 Feb 16; Accepted 2012 Mar 7.
Abstract
Determining the optimal sample size for a study assures an adequate power to detect statistical significance. Hence, it is a critical step in the design of a planned research protocol. Using too many participants in a study is expensive and exposes more subjects to the procedure. Similarly, if a study is underpowered, it will be statistically inconclusive and may make the whole protocol a failure. This paper covers the essentials in calculating power and sample size for a variety of applied study designs. Sample size computation for a single group mean, survey type of studies, two-group studies based on means and proportions or rates, correlation studies, and case-control studies assessing a categorical outcome are presented in detail.
KEY WORDS: Correlation, odds ratio, power, prevalence, survey, proportions, sample size
INTRODUCTION
Clinical research studies can be classified into surveys, experiments, observational studies, etc. They need to be carefully planned to achieve the objective of the study. The planning of good research has many aspects. The first step is to define the problem, and it should be operational. The second step is to define the experimental or observational units and the appropriate subjects and controls. Meticulously, one has to define the inclusion and exclusion criteria, which should take care of all possible variables which could influence the observations and the units which are measured. The study design must be clear, and the procedures should be defined to the best possible and available methodology. Based on these factors, the study must have an adequate sample size, relative to the goals and the possible variabilities of the study. The sample must be 'big enough' so that an effect of the expected magnitude of scientific significance is also statistically significant. At the same time, it is important that the study sample should not be 'too large', where an effect of little scientific importance is nevertheless statistically detectable. In addition, sample size is important for economic reasons: an under-sized study can be a waste of resources since it may not produce useful results, while an over-sized study uses more resources than necessary. In an experiment involving human or animal subjects, sample size is a critical ethical issue, since an ill-designed experiment exposes the subjects to potentially harmful treatments without advancing knowledge.[1,2] Thus, a fundamental step in the design of clinical research is the calculation of power and sample size. Power is the probability of correctly rejecting the null hypothesis that sample estimates (e.g., mean, proportion, odds, correlation coefficient, etc.) do not statistically differ between study groups in the underlying population. Large values of power, at least 80%, are desirable given the available resources and ethical considerations. Power proportionately increases as the sample size for the study increases. Accordingly, an investigator can control the study power by adjusting the sample size and vice versa.[3,4]
A clinical study will be expressed in terms of an estimate of effect, an appropriate confidence interval, and a P value. The confidence interval indicates the likely range of values for the true effect in the population, while the P value determines how likely it is that the observed effect in the sample is due to chance. A related quantity is the statistical power; this is the probability of identifying an exact difference between two groups in the study samples when one genuinely exists in the populations from which the samples were drawn.
Factors that affect the sample size
The calculation of an appropriate sample size relies on the choice of certain factors and in some instances on crude estimates. There are three factors that should be considered in the calculation of an appropriate sample size, summarized in Table 1. Each of these factors influences the sample size independently, but it is important to combine all these factors in order to arrive at an appropriate sample size.
Table 1
The normal deviates for different significance levels (Type I error or alpha) for one-tailed and two-tailed alternative hypotheses are shown in Table 2.
Table 2
The normal deviates for different power, the probability of rejecting the null hypothesis when it is not true, or one minus the probability of type II error, are shown in Table 3.
Table 3
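As an illustration of how the entries in Tables 2 and 3 can be obtained, the short Python sketch below computes the normal deviate for a chosen significance level (one- or two-tailed) and for a chosen power using the inverse standard normal CDF; the function names are illustrative and not part of the original article.

from statistics import NormalDist

def z_alpha(alpha, two_tailed=True):
    """Normal deviate for significance level alpha (Type I error)."""
    p = 1 - alpha / 2 if two_tailed else 1 - alpha
    return NormalDist().inv_cdf(p)

def z_power(power):
    """Normal deviate for the desired power (1 - beta)."""
    return NormalDist().inv_cdf(power)

# Familiar values: 1.96 (5%, two-tailed), 2.58 (1%, two-tailed),
# 0.84 for 80% power and 1.28 for 90% power.
print(round(z_alpha(0.05), 2), round(z_alpha(0.01), 2))
print(round(z_power(0.80), 2), round(z_power(0.90), 2))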
Study design, outcome variable and sample size
Study design has a major impact on the sample size. Descriptive studies need hundreds of subjects to give acceptable confidence intervals for small effects. Experimental studies generally need a smaller sample, while cross-over designs need one-quarter of the number required compared to a control group design, because every subject receives the experimental treatment in a cross-over study. An evaluation study in a single group with a pre-post type of design needs half the number required for a similar study with a control group. A study design with a one-tailed hypothesis requires 20% fewer subjects compared to two-tailed studies. Non-randomized studies need 20% more subjects compared to randomized studies in order to accommodate confounding factors. An additional 10-20% of subjects are required to allow adjustment for other factors such as withdrawals, missing data, loss to follow-up, etc.
The "effect" expected under study should be considered. There are three possible categories of outcome. The first is a simple case where two alternatives exist: yes/no, dead/alive, vaccinated/not vaccinated, etc. The second category covers multiple, mutually exclusive alternatives such as religious beliefs or blood groups. For these two categories of outcome, the data are generally expressed as percentages or rates.[5–7] The third category covers continuous response variables such as weight, height, blood pressure, VAS score, IL-6, TNF-α, homocysteine, etc., which are continuous measures and are summarized as means and standard deviations. The statistical method used to estimate the sample size depends on which of these outcome measures is critical for the study; for example, a larger sample size is required to assess a categorical variable compared to a continuous outcome variable.
Alpha level
Alpha is defined as the probability of detecting a significant difference when the treatments are equally effective, or the chance of false-positive findings. The alpha level used in determining the sample size in most academic research studies is either 0.05 or 0.01.[7] The lower the alpha level, the larger is the sample size. For example, a study with an alpha level of 0.01 requires more subjects than a study with an alpha level of 0.05 for a similar outcome variable. A lower alpha, viz. 0.01 or less, is used when the decisions based on the research are critical and the errors may cause substantial financial or personal harm.
Variance or standard deviation
The variance or standard deviation for sample size calculation is obtained either from previous studies or from a pilot study. The larger the standard deviation, the larger is the sample size required in a study. For example, a study in which the main outcome variable is TNF-α needs more subjects than one based on a variable such as birth weight or a 10-point VAS score, as the natural variability of TNF-α is wide compared to the others.
Minimum detectable difference
This is the expected difference or relationship between two independent samples, also known as the effect size. The obvious question is how to know the difference in a study which has not yet been conducted. If available, it may be useful to use the effect size found in prior studies. Where no previous study exists, the effect size is determined from a literature review, logical assertion, and conjecture.
Power
The difference between two groups in a study will be explored in terms of the estimate of effect, an appropriate confidence interval, and a P value. The confidence interval indicates the likely range of values for the true effect in a population, while the P value determines how likely it is that the observed effect in the sample is due to chance. A related quantity is the statistical power of the study, which is the probability of detecting a predefined clinical significance. The ideal study is one which has high power. This means that the study has a high chance of detecting a difference between groups if it exists; consequently, if the study demonstrates no difference between the groups, the researcher can be reasonably confident in concluding that none exists. The ideal power for any study is considered to be 80%.[8]
In research, statistical power is generally calculated with two objectives. 1) It can be calculated before data collection, based on information from previous studies, to decide the sample size needed for the current study. 2) It can also be calculated after data analysis. The second situation occurs when the result turns out to be non-significant. In this case, statistical power is calculated to verify whether the non-significant result is due to a lack of relationship between the groups or due to a lack of statistical power.
Statistical power is positively correlated with the sample size, which means that, given the level of the other factors, viz. alpha and minimum detectable difference, a larger sample size gives greater power. However, researchers should be clear about the distinction between a statistical difference and a scientific difference. Although a larger sample size enables researchers to find smaller differences statistically significant, the difference found may not be scientifically meaningful. Therefore, it is recommended that researchers have a prior idea of what they would expect to be a scientifically meaningful difference before doing a power analysis and determining the actual sample size needed. Power analysis is now integral to the health and behavioral sciences, and its use is steadily increasing whenever empirical studies are performed.
Withdrawals, missing data and losses to follow-up
The sample size calculated is the total number of subjects who are required for the final study analysis. There are a few practical issues which need to be considered while calculating the number of subjects required. It is a fact that all eligible subjects may not be willing to take part, and it may be necessary to screen more subjects than the final number of subjects entering the study. In addition, even in well-designed and conducted studies, it is unusual to finish with a dataset which is complete, and in a usable format, for all the subjects recruited. The reason could be subject factors: subjects may fail or refuse to give valid responses to particular questions, physical measurements may suffer from technical problems, and in studies involving follow-up (e.g., trials or cohort studies) there will be some degree of attrition. The reason could also be technical or procedural problems, such as contamination or failure to get the assessment or examination performed in time. It may, therefore, be necessary to consider these problems before calculating the number of subjects to be recruited in a study in order to achieve the final desired sample size.
For example, say that in a study a total of N subjects are required at the end of the study with all the data complete for analysis, but a proportion (q) are expected to refuse to participate or drop out before the study ends. In this case, the following total number of subjects (N1) would have to be recruited to ensure that the final sample size (N) is achieved:
N1 = N / (1 - q), where q is the proportion of attrition and is generally 10%.
The proportion of eligible subjects who will refuse to participate or provide inadequate information will be unknown at the beginning of the study. Approximate estimates are often possible using information from similar studies in comparable populations or from an appropriate pilot study.[9]
Sample size estimation for proportion in survey type of studies
A common goal of survey research is to collect data representative of a population. The researcher uses information gathered from the survey to generalize findings from a drawn sample back to a population, within the limits of random error. The general rule relative to acceptable margins of error in survey research is 5-10%. The sample size can be estimated using the following formula:
N = (Zα/2)² P(1-P) D / E², where P is the prevalence or proportion of the event of interest for the study, and E is the precision (or margin of error) with which a researcher wants to measure something. Generally, E will be 10% of P, and Zα/2 is the normal deviate for a two-tailed alternative hypothesis at a level of significance; for example, for a 5% level of significance Zα/2 is 1.96, and for a 1% level of significance it is 2.58, as shown in Table 2. D is the design effect, which reflects the sampling design used in the survey type of study. This is 1 for simple random sampling and takes higher values (usually 1 to 2) for other designs such as stratified, systematic, or cluster random sampling, estimated to compensate for deviation from a simple random sampling procedure. The design effect for cluster random sampling is taken as 1.5 to 2. For purposive, convenience, or judgment sampling, D can cross 10. The higher the D, the larger the sample size required for a study. Simple random sampling is unlikely to be the sampling method in an actual field survey. If another sampling method such as systematic, stratified, or cluster sampling is used, a larger sample size is likely to be needed because of the "design effect".[10–12] In the case of an impact study, P may be estimated at 50% to reflect the assumption that an impact is expected in 50% of the population. A P of 50% is also a conservative estimate.
Example: A researcher is interested in knowing the sample size for conducting a survey to measure the prevalence of obesity in a certain community. Previous literature gives an estimate of obesity at 20% in the population to be surveyed. Assuming a 95% confidence interval (5% level of significance) and a 10% margin of error, the sample size can be calculated as follows:
N = (Zα/2)² P(1-P) D / E² = (1.96)² × 0.20 × (1-0.20) / (0.1 × 0.20)² = 3.8416 × 0.16 / (0.02)² = 1537 for a simple random sampling design (D = 1). Hence, a sample size of 1537 is required to conduct a community-based survey to estimate the prevalence of obesity. Note: E is the margin of error; in the present example it is 10% × 0.20 = 0.02.
To find the final adjusted sample size, allowing a non-response rate of 10% in the above example, the adjusted sample size will be 1537/(1-0.10) = 1537/0.90 = 1708.
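A minimal Python sketch of the survey formula and the non-response adjustment used above, reproducing the obesity example (P = 0.20, E = 0.02, 95% confidence, D = 1); the function names are illustrative.

import math
from statistics import NormalDist

def survey_sample_size(p, e, alpha=0.05, deff=1.0):
    """N = Z^2 * P(1-P) * D / E^2 for estimating a proportion in a survey."""
    z = NormalDist().inv_cdf(1 - alpha / 2)
    return z ** 2 * p * (1 - p) * deff / e ** 2

# Obesity example: P = 0.20, E = 10% of P = 0.02, 5% significance, simple random sampling.
n = round(survey_sample_size(p=0.20, e=0.02))      # about 1537
n_adjusted = math.ceil(n / (1 - 0.10))             # allow 10% non-response: about 1708
print(n, n_adjusted)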
Sample size estimation with single group mean
If a researcher is conducting a study in a single group, such as an outcome assessment in a group of patients subjected to a certain treatment or patients with a particular type of illness, and the primary outcome is a continuous variable for which the mean and standard deviation are the expression of results or estimates of the population, the sample size can be estimated using the following formula:
N = (Zα/2)² s² / d²,
where s is the standard deviation obtained from a previous study or a pilot study, and d is the accuracy of the estimate, or how close it is to the true mean. Zα/2 is the normal deviate for a two-tailed alternative hypothesis at a level of significance.
For research studies with a one-tailed hypothesis, the above formula can be rewritten as
N = (Zα)² s² / d², where the Zα values are 1.64 and 2.33 for the 5% and 1% levels of significance.
Example: In a study for estimating the weight of a population, the researcher wants the error of estimation to be less than 2 kg of the true mean (that is, the expected difference in weight is 2 kg), the sample standard deviation is 5, and the confidence probability is 95% (that is, an error rate of 5%). The sample size is estimated as N = (1.96)² (5)² / 2², which gives a sample of 24 subjects. If an allowance of 10% for missing data, losses to follow-up, and withdrawals is assumed, the corrected sample will be 27 subjects. The corrected sample size thus obtained is 24/(1.0-0.10) ≈ 24/0.9 = 27, and for a 20% allowance the corrected sample size will be 30.
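The same calculation for a single group mean can be sketched in Python as follows, reproducing the weight example (s = 5, d = 2, 5% significance) with the 10% and 20% allowances; the function name is illustrative.

import math
from statistics import NormalDist

def single_mean_sample_size(sd, d, alpha=0.05, two_tailed=True):
    """N = Z^2 * s^2 / d^2 for estimating a single group mean to within d."""
    z = NormalDist().inv_cdf(1 - alpha / 2 if two_tailed else 1 - alpha)
    return z ** 2 * sd ** 2 / d ** 2

# Weight example: s = 5 kg, d = 2 kg, 5% significance -> about 24 subjects;
# allowing 10% for withdrawals/losses gives 27, and a 20% allowance gives 30.
n = round(single_mean_sample_size(sd=5, d=2))
print(n, math.ceil(n / (1 - 0.10)), math.ceil(n / (1 - 0.20)))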
Sample size estimation with two means
Consider a study with the research hypothesis: null hypothesis H0: m1 = m2 vs. alternative hypothesis Ha: m1 = m2 + d, where d is the difference between the two means and n1 and n2 are the sample sizes for Group I and Group II such that N = n1 + n2. The ratio r = n1/n2 is considered whenever the researcher needs unequal sample sizes due to various reasons, such as ethics, cost, or availability.
Then, the total sample size for the study is as follows:
N = [(r + 1)/r] × [σ² (Z1-β + Zα)²] / d²,
where Zα is the normal deviate at a level of significance (Zα is 1.96 for a 5% level of significance and 2.58 for a 1% level of significance) and Z1-β is the normal deviate at 1-β% power with β% type II error (0.84 at 80% power and 1.28 at 90% statistical power). r = n1/n2 is the ratio of the sample sizes required for the two groups; generally it is 1, keeping equal sample sizes in the two groups, while r = 0.5 gives a sample size distribution of 1:2 for the two groups. σ and d are the pooled standard deviation and the difference of means of the two groups, respectively. These values are obtained either from previous studies with a similar hypothesis or by conducting a pilot study. Let us say a clinical researcher wants to compare the effect of two drugs, A and B, on systolic blood pressure (SBP). On literature search, the researcher found that the mean SBP in the two groups was 120 and 132, with a common standard deviation of 15. The total sample size for the study with r = 1 (equal sample size), α = 5%, and power at 80% and 90% was computed from the above formula; for 90% statistical power, the sample size will be 32. With an unequal sample size of 1:2 (r = 0.5) and 90% statistical power at a 5% level of significance, the total sample size required for the study is 48.
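A minimal Python sketch of the two-means formula given above, applied to the SBP example (means 120 vs. 132, common SD 15); small rounding differences from the quoted totals are expected, and the function name is illustrative.

from statistics import NormalDist

def two_means_total_n(sigma, d, alpha=0.05, power=0.80, r=1.0):
    """Total N = ((r + 1)/r) * sigma^2 * (Z_power + Z_alpha)^2 / d^2 (two-tailed alpha)."""
    z_a = NormalDist().inv_cdf(1 - alpha / 2)
    z_b = NormalDist().inv_cdf(power)
    return (r + 1) / r * sigma ** 2 * (z_a + z_b) ** 2 / d ** 2

# SBP example: d = 132 - 120 = 12, sigma = 15, 5% significance, 90% power.
print(round(two_means_total_n(sigma=15, d=12, power=0.90, r=1.0)))   # equal groups
print(round(two_means_total_n(sigma=15, d=12, power=0.90, r=0.5)))   # 1:2 allocation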
Sample size estimation with two proportions
In studies where the outcome is measured as proportions of an event in two populations (groups), such as the percentage of complications, mortality improvement, awareness, surgical or medical outcomes, etc., the sample size estimation is based on the proportions of the outcome, which are obtained from a previous literature review or by conducting a pilot study on a smaller sample. For a study with a null hypothesis of H0: π1 = π2 vs. Ha: π1 = π2 + d, where π1 and π2 are the population proportions and p1 and p2 are the respective sample estimates, the sample size is estimated from p1 and p2, the proportions of the event of interest (outcome) for group I and group II, and p, the pooled proportion of the two groups, where Zα/2 is the normal deviate at a level of significance and Z1-β is the normal deviate at 1-β% power with β% type II error; usually a type II error of 20% or less is considered.
If the researcher is planning to conduct a study with unequal groups, he or she must calculate N as if equal groups were being used, and then calculate the modified sample size. If r = n1/n2 is the ratio of the sample sizes in the two groups, then the required total sample size is N1 = N(1 + r)²/4r. If n1 = 2n2, that is, the sample size ratio is 2:1 for group 1 and group 2, then N1 = 9N/8, a fairly small increase in total sample size.
Example: It is believed that the proportion of patients who develop complications after undergoing one type of surgery is 5%, while the proportion of patients who develop complications after a second type of surgery is 15%. How large should the sample be in each of the two groups of patients if an investigator wishes to detect, with a power of 90%, whether the second procedure has a complication rate significantly higher than the first at the 5% level of significance?
In the example,
a) Test value of the difference in complication rate: 0%
b) Anticipated complication rates: 5% and 15% in the two groups
c) Level of significance: 5%
d) Power of the test: 90%
e) Alternative hypothesis (one-tailed): (p1 - p2) < 0%
The total sample size required is 74 for an equal size distribution; for an unequal distribution of sample size of 1.5:1, that is r = 1.5, the total sample size will be 77, with 46 for group I and 31 for group II.
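A minimal Python sketch of the unequal-allocation adjustment N1 = N(1 + r)²/4r described above, applied to this example (N = 74, r = 1.5); the function name and the split of N1 into the two groups by the ratio r are illustrative.

def adjust_for_unequal_groups(n_total, r):
    """Adjust a total sample size N (computed for equal groups) to an allocation ratio r = n1/n2."""
    n_adj = n_total * (1 + r) ** 2 / (4 * r)   # N1 = N(1 + r)^2 / 4r
    n1 = n_adj * r / (1 + r)                   # share for group I
    n2 = n_adj / (1 + r)                       # share for group II
    return round(n_adj), round(n1), round(n2)

# Complication-rate example: N = 74 for equal groups, allocation ratio 1.5:1.
print(adjust_for_unequal_groups(74, 1.5))      # (77, 46, 31)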
Sample size estimation with correlation coefficient
In observational studies which involve estimating a correlation (r) between two variables of interest, say X and Y, with a typical hypothesis of the form H0: r = 0 against Ha: r ≠ 0, the sample size for the correlation study can be obtained from a formula in which Zα/2 and Z1-β are the normal deviates for type I error (significance level) and power of the study [Tables 2 and 3].
Example: According to the literature, the correlation between salt intake and systolic blood pressure is around 0.30. A study is conducted to test this correlation in a population, with a significance level of 1% and power of 90%. The sample size for such a study can be estimated as follows:
the sample size for 90% power at a 1% level of significance was 99 for a two-tailed alternative test and 87 for a one-tailed test.
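As a hedged illustration of sizing a correlation study, the sketch below uses the widely used Fisher z-transformation formula, n = [(Zα/2 + Z1-β)/C]² + 3 with C = 0.5·ln[(1 + r)/(1 - r)]; the formula choice and function name are assumptions for illustration rather than the article's own expression.

import math
from statistics import NormalDist

def corr_sample_size(r, alpha=0.05, power=0.80, two_tailed=True):
    """Sample size for testing H0: r = 0 via Fisher's z-transformation (assumed formula)."""
    z_a = NormalDist().inv_cdf(1 - alpha / 2 if two_tailed else 1 - alpha)
    z_b = NormalDist().inv_cdf(power)
    c = 0.5 * math.log((1 + r) / (1 - r))      # Fisher z of the expected correlation
    return math.ceil(((z_a + z_b) / c) ** 2 + 3)

# e.g., detecting a correlation of 0.30 with 5% significance (two-tailed) and 80% power.
print(corr_sample_size(0.30))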
Sample size estimation with odds ratio
In a case-control study, data are commonly summarized as an odds ratio rather than as a difference between two proportions when the outcome variables of interest are categorical in nature. If P1 and P2 are the proportions of cases and controls, respectively, exposed to a risk factor, then:
OR = [P1(1 - P2)] / [P2(1 - P1)].
If we know the prevalence of exposure in the general population (P), the total sample size N for estimating an OR can be computed from a formula in which Zα/2 and Z1-β are the normal deviates for type I error (significance level) and power of the study [Tables 2 and 3].
Example: The prevalence of vertebral fracture in a population is 25%. If a study is interested in estimating the effect of smoking on the fracture, with an odds ratio of 2, at a significance level of 5% (one-sided test) and power of 80%, the total sample size for a study with equal sample sizes can be estimated using the approach above.
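As a hedged illustration of this case-control calculation, the sketch below assumes the log-odds-ratio approximation N = (Zα + Z1-β)² / [P(1 - P)(ln OR)²], a commonly used expression when the prevalence P is known; the formula choice and function name are assumptions for illustration rather than the article's own expression.

import math
from statistics import NormalDist

def odds_ratio_sample_size(odds_ratio, prevalence, alpha=0.05, power=0.80, two_tailed=True):
    """Approximate total sample size for detecting a given odds ratio (assumed formula)."""
    z_a = NormalDist().inv_cdf(1 - alpha / 2 if two_tailed else 1 - alpha)
    z_b = NormalDist().inv_cdf(power)
    return math.ceil((z_a + z_b) ** 2 /
                     (prevalence * (1 - prevalence) * math.log(odds_ratio) ** 2))

# Vertebral fracture example: prevalence 25%, OR = 2, 5% significance (one-sided), 80% power.
print(odds_ratio_sample_size(2.0, 0.25, two_tailed=False))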
DISCUSSION
The equations in this paper assume that the selection of individuals is random and unbiased; that is, the decision to include a subject in the study must not depend on whether or not that subject has the characteristic or the outcome studied. Second, in studies in which a mean is calculated, the measurements are assumed to have normal distributions.[13,14]
The concept of statistical power is closely associated with sample size; the power of the study increases with an increase in sample size. Ideally, the minimum power required of a study is 80%. Hence, sample size calculation is critical and fundamental for designing a study protocol. Even after completion of the study, a retrospective power analysis will be useful, especially when statistically non-significant results are obtained.[15] Here, the actual sample size and alpha level are known, and the variance observed in the sample provides an estimate of the variance of the population. The retrospective analysis of power re-emphasizes whether a negative finding is a true negative finding.
The ideal study for the researcher is one in which the power is high. This means that the study has a high chance of detecting a difference between groups if one exists; consequently, if the study demonstrates no difference between groups, the researcher can be reasonably confident in concluding that none exists. The power of the study depends on several factors, but as a general rule, higher power is achieved by increasing the sample size.[16] Many apparently null studies may be under-powered rather than genuinely demonstrating no difference between groups; absence of evidence is not evidence of absence.[9]
Sample size calculation is an essential step in research protocols and is a must to justify the size of clinical studies in papers, reports, etc. However, one of the most common errors in papers reporting clinical trials is a lack of justification of the sample size, and it is a major concern that important therapeutic effects are being missed because of inadequately sized studies.[17,18] The purpose of this review is to make available a collection of formulas for sample size calculations and examples for a variety of situations likely to be encountered.
Often, the researcher is faced with various constraints that may force the use of an inadequate sample size because of both practical and statistical reasons. These constraints may include budget, time, personnel, and other resource limitations. In these cases, the researchers should report both the appropriate sample size and the sample size actually used in the study, the reasons for using an inadequate sample size, and a discussion of the effect the inadequate sample size may have on the results of the study. The researcher should exercise caution when making practical recommendations based on research with an inadequate sample size.
CONCLUSION
Sample size determination is an important major step in the design of a research study. Appropriately sized samples are essential to infer with confidence that sample estimates are reflective of the underlying population parameters. The sample size required to reject or accept a study hypothesis is determined by the power of the test. A study that is sufficiently powered has a statistically reasonable chance of answering the questions put forth at the beginning of the research study. Inadequately sized studies often result from investigators' unrealistic assumptions about the effectiveness of the study treatment. Misjudgment of the underlying variability of parameter estimates, a wrong estimate of the follow-up period needed to observe the intended effects of the treatment, inability to predict the lack of compliance with the study regimen, high drop-out rates, and/or failure to account for the multiplicity of study endpoints are common errors in clinical research. Conducting a study that has little chance of answering the hypothesis at hand is a misuse of time and valuable resources and may unnecessarily expose participants to potential harm or unwarranted expectations of therapeutic benefit. As scientific and ethical issues go hand-in-hand, awareness of the determination of the minimum required sample size and the application of appropriate sampling methods are extremely important in achieving scientifically and statistically sound results. Using an adequate sample size along with high-quality data collection efforts will result in more reliable, valid, and generalizable results; it could also result in saving resources. This paper was designed as a tool that a researcher could use in planning and conducting quality research.
Footnotes
Source of Support: Nil
Conflict of Interest: None declared.
REFERENCES
1. Shuster JJ. Boca Raton, FL: CRC Press; 1990. Handbook of sample size guidelines for clinical trials. [Google Scholar]
2. Altman DG. London, UK: Chapman and Hall; 1991. Practical statistics for medical research. [Google Scholar]
3. Wittes J. Sample size calculations for randomized controlled trials. Epidemiol Rev. 2002;24:39–53. [PubMed] [Google Scholar]
4. Desu M, Raghavarao D. Boston, MA: Academic Press, Inc; 1990. Sample size methodology. [Google Scholar]
5. Agresti A. New York: John Wiley and Sons; 1990. Categorical data analysis. [Google Scholar]
6. Lwanga SK, Lemeshow S. Geneva: World Health Organization; 1991. Sample size determination in health studies: A practical manual; pp. 1–3. [Google Scholar]
7. Fleiss JL. 2nd ed. New York, NY: Wiley; 1981. Statistical methods for rates and proportions; p. 45. [Google Scholar]
8. Hintze JL. Kaysville, Utah, USA: 2008. Power analysis and sample size system (PASS) for Windows User's Guide I. NCSS. [Google Scholar]
10. James EB, Joe WK, Chadwick CH. Organizational research: Determining appropriate sample size in survey research. Inf Technol Learn Perform J. 2001;19:43–50. [Google Scholar]
11. Johnson PO. Development of the sample survey as a scientific methodology. J Exp Educ. 1959;27:167–76. [Google Scholar]
12. Wunsch D. Survey research: Determining sample size and representative response. Bus Educ Forum. 1986;40:31–4. [Google Scholar]
13. Lachin JM. Introduction to sample size determination and power analysis for clinical trials. Control Clin Trials. 1981;2:93–113. [PubMed] [Google Scholar]
14. Donner A. Approaches to sample size estimation in the design of clinical trials - A review. Stat Med. 1984;3:199–214. [PubMed] [Google Scholar]
15. Thomas L, Juanes F. The importance of statistical power analysis: An example from animal behavior. Anim Behav. 1996;52:856–9. [Google Scholar]
16. Moher D, Dulberg CS. Statistical power, sample size, and their reporting in randomized controlled trials. JAMA. 1994;272:122–4. [PubMed] [Google Scholar]
17. Campbell MJ, Julious SA, Altman DG. Estimating sample sizes for binary, ordered categorical and continuous outcomes in two group comparisons. Br Med J. 1995;311:1143–8. [PMC free article] [PubMed] [Google Scholar]
18. Thomas L. Retrospective power analysis. Conserv Biol. 1997;11:276–80. [Google Scholar]
Articles from the Journal of Human Reproductive Sciences are provided here courtesy of Wolters Kluwer -- Medknow Publications
Source: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3409926/