Strategies for Minimizing Sample Size and Maximizing Power
When the estimated sample size is greater than the number of subjects that can be studied realistically, what can be done?
First, check your calculations!
Second, review the ingredients. Is the detectable difference or effect size unreasonably small or the variability unreasonably large? Could alpha or beta or both be increased without harm? Is the confidence level too high or the interval unnecessarily narrow?
If these fail, here are other strategies:
Use continuous variables instead of dichotomous variables, if this is an option. There is more information in a continuous variable and so you get greater power for a given sample size or a smaller sample size for a given power.
Use paired measurements; this reduces the between-subject part of the variability of the outcome variable.
Use more precise variables, perhaps by taking duplicate measurements or refining the measurement tool.
Use unequal group sizes, if it is easier to recruit in one group than another (e.g. case-control).
Use a more common outcome, that is, one with a frequency closer to 50% than to 0% or 100%.
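The unequal-group-size strategy can be sketched numerically. The snippet below is a minimal illustration, not a definitive method: it assumes a comparison of two means with a common standard deviation, so that precision depends on 1/n1 + 1/n2, and it keeps that quantity equal to the 2/n of the equal-groups design. The function name and the example numbers are hypothetical.

```python
import math

def unequal_group_sizes(n_equal, ratio):
    """Sizes (n1, n2) with n2 = ratio * n1 that roughly match the power of
    two equal groups of n_equal each.

    Assumption: precision depends on 1/n1 + 1/n2, which is held at 2/n_equal.
    """
    if n_equal <= 0 or ratio <= 0:
        raise ValueError("n_equal and ratio must be positive")
    n1_raw = n_equal * (1 + ratio) / (2 * ratio)
    # round() guards against floating-point noise before rounding up
    n1 = math.ceil(round(n1_raw, 6))
    n2 = math.ceil(round(ratio * n1, 6))
    return n1, n2

# Hypothetical example: 100 per group with equal allocation,
# but three subjects in group 2 for every one in group 1.
n1, n2 = unequal_group_sizes(n_equal=100, ratio=3)
print(n1, n2, n1 + n2)
```

The total rises (here from 200 to 268), so unequal allocation helps only when subjects in the larger group are much easier to recruit.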
Summary
If you have jumped ahead to this point in the module, I would encourage you to return to where you left off and go through the material in detail. I have worked hard to present all the various considerations and situations and provide sound advice in how to apply the rules.
But, if you have limited time, a limited attention span, or a limited capacity for quantitative thinking (heaven forbid!), here is an “executive summary”. But, promise me that you WILL estimate the sample size early in the design phase.
First, don’t be awed by the formulas and the apparent precision of the numbers arising from the sample size calculations. All the ingredients are really uncertain and crudely estimated. And the choices of a 5% significance level and 80% power are somewhat arbitrary definitions for the vague concepts of “small” or “large”.
For the difference between two means, use Lehr’s (1992) “Sixteen s-squared over d-squared” rule, a rule which should never be forgotten. In this phrase, “s” is the common standard deviation of the two groups and “d” is the difference between the two means. The rule sounds friendlier since it replaces the Greek letters σ (sigma) and δ (delta) by s and d.
Note that if you double the difference (d) you want to detect, the sample size is cut by a factor of four. If you double the standard deviation (s), the sample size goes up by a factor of four. Streiner and Norman observe that plausibly small adjustments in the initial estimate can have big effects on the calculation! Which is why statisticians are so successful at making the calculated sample size exactly equal the number of available patients.
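A short sketch of Lehr’s rule makes the factor-of-four behaviour easy to check. The function name and the values of s and d are made up for illustration; the rule itself assumes 80% power and a two-tailed 5% significance level.

```python
import math

def lehr_n_per_group(s, d):
    """Approximate subjects per group by Lehr's rule: 16 * s^2 / d^2."""
    if s <= 0 or d == 0:
        raise ValueError("s must be positive and d must be non-zero")
    # round() guards against floating-point noise before rounding up
    return math.ceil(round(16 * s**2 / d**2, 6))

base = lehr_n_per_group(s=10, d=5)       # starting point
double_d = lehr_n_per_group(s=10, d=10)  # difference doubled
double_s = lehr_n_per_group(s=20, d=5)   # standard deviation doubled
print(base, double_d, double_s)
```

With these inputs the three results are 64, 16 and 256 per group: one quarter and four times the starting value.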
For differences among many means, pick the two means you really care about and then apply Lehr’s rule to get the sample size for each group.
For the difference between proportions use N = 16 p(1 - p) / (p_{0} - p_{1})^{2} where p = (p_{0} + p_{1})/2.
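The proportions rule can be sketched in the same way. This is a minimal illustration under the same 80% power and two-tailed 5% assumptions; the function name and the 10% versus 20% example are hypothetical.

```python
import math

def n_per_group_proportions(p0, p1):
    """Approximate subjects per group: 16 * p * (1 - p) / (p0 - p1)^2,
    where p is the average of the two proportions."""
    if not (0 < p0 < 1 and 0 < p1 < 1) or p0 == p1:
        raise ValueError("p0 and p1 must differ and lie strictly between 0 and 1")
    p_bar = (p0 + p1) / 2
    raw = 16 * p_bar * (1 - p_bar) / (p0 - p1) ** 2
    # round() guards against floating-point noise before rounding up
    return math.ceil(round(raw, 6))

# Hypothetical example: outcome frequency of 10% in one group, 20% in the other.
n = n_per_group_proportions(0.10, 0.20)
print(n)
```

The rounding step matters here: without it, a result that is exactly a whole number on paper can be pushed up by one through floating-point error.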
Hulley, Chapter 6, has some useful sample size estimation tables.
Note: Sample size calculations should be based on the way the data will be analyzed. But, even if more complex methods of analysis will be used ultimately, it is easier and usually sufficient to estimate the sample size assuming a simpler method of analysis, such as the t-test of two means or chi-square test of two proportions.
Warnings:

Remember to plan for dropouts and for subjects with missing data.

Make sure you know whether the formulas you used are for equal sample sizes.

For paired measurement scale (i.e. continuous) data, use the standard deviation of the change scores, not of the variable itself.

Be aware of clustered data and the unit of analysis.
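Two of these warnings lend themselves to simple adjustments, sketched below. Neither formula appears in this module; both are common conventions adopted here as assumptions: dividing by the expected retention fraction to allow for dropouts, and multiplying by a design effect of 1 + (m - 1) × ICC for clusters of size m with intraclass correlation ICC. The function names and numbers are hypothetical.

```python
import math

def _ceil(x):
    # round() guards against floating-point noise before rounding up
    return math.ceil(round(x, 6))

def inflate_for_dropout(n, dropout_rate):
    """Number to enroll so that about n subjects remain after dropout."""
    if not 0 <= dropout_rate < 1:
        raise ValueError("dropout_rate must be in [0, 1)")
    return _ceil(n / (1 - dropout_rate))

def inflate_for_clustering(n, cluster_size, icc):
    """Multiply n by the design effect 1 + (cluster_size - 1) * icc
    (an assumed, commonly used adjustment for clustered data)."""
    if cluster_size < 1 or not 0 <= icc <= 1:
        raise ValueError("cluster_size must be >= 1 and icc in [0, 1]")
    return _ceil(n * (1 + (cluster_size - 1) * icc))

# Hypothetical example: 100 per group needed, 20% dropout expected.
enroll = inflate_for_dropout(100, 0.20)
# Hypothetical example: classrooms of 20 children, ICC of 0.05.
clustered = inflate_for_clustering(100, cluster_size=20, icc=0.05)
print(enroll, clustered)
```

Even a small intraclass correlation nearly doubles the requirement in this example, which is why the unit of analysis deserves attention early.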
Finally, remember that approximations for various ingredients in the sample size formulas that are based on educated guesses by the investigator will probably work fine. The process of thinking through the problem and imagining the findings that will result is what sample size planning is all about. Carry on!
Sample Size Calculators on the Web
In increasing order of complexity:
http://newton.stat.ubc.ca/~rollin/stats/ssize/
Rollin Brant’s simple but effective sample size calculators
http://hedwig.mgh.harvard.edu/sample_size/size.html
Deals with epidemiological applications.
http://www.stat.uiowa.edu/~rlenth/Power/index.html
Nicely designed Java applets
http://www.utexas.edu/its/rc/world/stat/online.html
An excellent index page for many other online statistical tools.
Other sample size contexts:
http://www.georgetown.edu/faculty/ballc/webtools/web_chi.html
A sample size calculator for chi-square tests
http://www.surveysystem.com/sscalc.htm
http://www.researchinfo.com/docs/calculators/index.cfm
These two are survey-oriented websites.
http://www.martindalecenter.com/Calculators.html
A monstrously large site with every conceivable calculator listed. A “fun” site to browse.
Examples
At Kyle and Effie’s school (Belltown Elementary), the school nurse has noticed that the students appear to be on the lower side of population growth charts, especially for height. Anecdotal evidence suggests that the children are not consuming enough milk at home; there doesn’t seem to be enough room in the refrigerators after all the beer cans have been stored. She is considering instituting a milk supplement program to improve growth rates. Her study will be a two-group randomized clinical trial, where half the children (chosen at random) will be given extra milk every day for a year. At this age, children’s height gain in 12 months has a mean of about 6 cm with a standard deviation of 2 cm. An extra increase in height of 0.5 cm in the milk group would be considered an important difference. How large a study should she do?
To have 80% power of detecting this difference (at 5% significance level, two-tailed) she would need, using Lehr’s rule, N = 16 x (2 / 0.5)^{2} = 256 per group. Thus approximately 500 children would be needed for the study.
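As a rough check on where the 16 comes from, the sketch below uses the usual normal-approximation formula for two means, n = 2 (z_{alpha/2} + z_{beta})^{2} s^{2} / d^{2}. That formula is not spelled out in this module and is assumed here; the function name is hypothetical.

```python
from statistics import NormalDist

def n_per_group_normal(s, d, alpha=0.05, power=0.80):
    """Per-group size from the normal approximation (assumed formula):
    2 * (z_{alpha/2} + z_{beta})^2 * s^2 / d^2, two-tailed test."""
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # about 1.96 for alpha = 0.05
    z_beta = NormalDist().inv_cdf(power)           # about 0.84 for 80% power
    return 2 * (z_alpha + z_beta) ** 2 * s**2 / d**2

# Milk study: s = 2 cm, d = 0.5 cm
approx = n_per_group_normal(s=2, d=0.5)
lehr = 16 * (2 / 0.5) ** 2
print(round(approx, 1), lehr)
```

The multiplier 2 (1.96 + 0.84)^{2} is about 15.7, which Lehr’s rule rounds up to 16; for the milk study this gives roughly 251 per group against 256, a difference well within the uncertainty of the ingredients.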
Calculator example to do this same thing:
In addition to the in-school milk supplement program, the nurse would like to increase the use of daily vitamin supplements for the children by visiting homes and educating families about the merits of vitamins. She believes that currently about 50% of families with school-age children give the children a daily megavitamin. She would like to increase this to 70%. She plans a two-group study, where one group serves as a control and the other group receives her visits. How many families should she plan to visit?
To have 80% power of detecting this difference (at 5% significance level, two-tailed) she would need N = 16 (0.60)(1 – 0.60) / (0.50 – 0.70)^{2} = 96 per group. That is, she should plan to visit 100 families and have another 100 families in the control group.
Calculator example to do this same thing:
These screen captures are from:
http://newton.stat.ubc.ca/~rollin/stats/ssize/
Rollin Brant’s simple but effective sample size calculators