One last concept: the role of variability in power and sample size… Streiner and Norman state: “Nearly all statistical tests are based on a signal-to-noise ratio, where the signal is the important relationship and the noise is a measure of individual variation.”
If a measurement-scale outcome variable has little variability, it will be easier to detect change than if it has a lot of variability. So assessments or estimates of variability (typically the standard deviation) are an important ingredient in sample size estimation.
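As a rough illustration of the signal-to-noise idea, the sketch below divides a difference in means (the signal) by its standard error (the noise) for two equal-sized groups. The function name and the numbers are hypothetical, chosen only to show that doubling the standard deviation halves the ratio when everything else is held fixed.

```python
import math

def signal_to_noise(delta, sigma, n_per_group):
    """Difference in means divided by its standard error.

    Assumes two independent groups of equal size that share a
    common standard deviation sigma (an illustrative assumption).
    """
    standard_error = sigma * math.sqrt(2.0 / n_per_group)
    return delta / standard_error

# Same 5-point difference and same group size, with increasing variability.
for sigma in (5.0, 10.0, 20.0):
    ratio = signal_to_noise(delta=5.0, sigma=sigma, n_per_group=25)
    print(f"SD = {sigma:5.1f}  signal-to-noise = {ratio:.2f}")
```

The larger the standard deviation, the smaller the ratio, and the harder the same difference is to detect with the same number of subjects.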
We are now ready to tackle sample size estimation for comparative studies.
For Comparative Studies…A Second Set of Questions Before “THE QUESTION”
This time there are FOUR questions that need to be answered before sample size can be calculated. The first two of these additional questions are easy; the last two are hard.
Question 1: What is an acceptable significance level (alpha)? Convention chooses .05 (or, if you like percentages more than proportions, 5%), but some situations dictate a different choice of alpha. Alpha is also known by another name: it is the probability of making a Type I error (discussed earlier in Review of Hypothesis Testing).
Why 5%? Sir Ronald Fisher suggested this as an appropriate threshold level. However, he meant that if the p-value from an initial experiment were less than .05 then the REAL research should begin. This has been corrupted to such an extent that at the first sign of a p-value under .05 the researchers race to publish the result!
Question 2: How large a power (i.e. probability of detection) is required? Convention chooses a power of .80, or 80%. Note that this accepts a risk of a Type II error (β = 1 − .80 = .20) that is four times as great as the conventional risk of a Type I error (α = .05).
Why 80%? According to Streiner and Norman, this was because “Jacob Cohen [who wrote the landmark textbook on Statistical Power Analysis] surveyed the literature and found that the average power was barely 50%. His hope was that, eventually, both α and β would be .05 for all studies, so he took β = .20 as a compromise and thought that over the years, people would adopt more stringent levels. It never happened.”
Question 3: How large will be the variability in estimating the effect or difference of interest? For measurement outcome variables this means estimating the population standard deviation.
Question 4: What is the smallest effect or non-null difference that the researcher wants to detect? That is, what is the magnitude of the clinical difference of interest? Low magnification on a microscope may fail to detect something. Too high a magnification may make unimportant details look large. Finding a needle in a haystack is difficult, but finding an elephant in a haystack is comparatively easy!
The answers to Questions 3 and 4 are the key ingredients of the formulas for sample size estimation.
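To see how the four answers combine, here is a minimal sketch that uses a common textbook normal approximation for comparing two means with equal group sizes and a two-sided test: n per group = 2 × ((z for alpha + z for power) × SD / difference)². This is an assumed, illustrative formula and not necessarily the one these notes use; the function name and the example values (SD of 10, differences of 5 and 10) are hypothetical.

```python
import math
from statistics import NormalDist

def n_per_group(alpha, power, sigma, delta):
    """Approximate subjects needed per group to compare two means.

    Normal approximation, two-sided test, equal group sizes and a
    common standard deviation sigma (all illustrative assumptions).
    """
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # Question 1: significance level
    z_beta = NormalDist().inv_cdf(power)           # Question 2: power
    # Questions 3 and 4 enter only through the ratio sigma / delta.
    n = 2 * ((z_alpha + z_beta) * sigma / delta) ** 2
    return math.ceil(n)  # round up to a whole subject

print(n_per_group(alpha=0.05, power=0.80, sigma=10, delta=5))   # 63
print(n_per_group(alpha=0.05, power=0.80, sigma=10, delta=10))  # 16
```

Halving the difference to be detected roughly quadruples the required sample size, which is why the answers to Questions 3 and 4 matter so much. Methods based on the t distribution give slightly larger numbers than this approximation.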
Sample Size Estimation: Answering THE QUESTION!
The basic formulas require the four components discussed above: