The essential reasoning of a significance test is as follows. Suppose for the sake of argument that the null hypothesis is true. If we repeated our data production many times, would we often get data as inconsistent with H0 as the data we actually have? Data that would rarely occur if H0 were true provide evidence against H0.
The P-value of a test is the probability, computed supposing H0 to be true, that the test statistic will take a value at least as extreme as that actually observed. Small P-values indicate strong evidence against H0. To calculate a P-value we must know the sampling distribution of the test statistic when H0 is true.
A specific confidence interval or test is correct only under specific conditions. The most important conditions concern the method used to produce the data. Other factors such as the shape of the population distribution may also be important.
Whenever you use statistical inference, you are acting as if your data are a random sample or come from a randomized comparative experiment.
Always do data analysis before inference to detect outliers or other problems that would make inference untrustworthy.
Other things being equal, the margin of error of a confidence interval gets smaller as
− the confidence level C decreases,
− the population standard deviation σ decreases,
− the sample size n increases.
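The three effects in the list above can be checked numerically from the z margin of error z*σ/√n. This is a minimal sketch: the baseline values σ = 10 and n = 25 are made up for illustration, and `z_margin_of_error` is a hypothetical helper name.

```python
from math import sqrt
from statistics import NormalDist


def z_margin_of_error(conf_level: float, sigma: float, n: int) -> float:
    """Margin of error z* * sigma / sqrt(n) for a level-C z interval."""
    # z* leaves area C in the middle of the standard Normal curve,
    # so it is the (1 + C)/2 quantile.
    z_star = NormalDist().inv_cdf((1 + conf_level) / 2)
    return z_star * sigma / sqrt(n)


# Hypothetical baseline: 95% confidence, sigma = 10, n = 25.
base = z_margin_of_error(0.95, sigma=10, n=25)
print(round(base, 3))
print(z_margin_of_error(0.90, sigma=10, n=25) < base)   # lower confidence level C
print(z_margin_of_error(0.95, sigma=5, n=25) < base)    # smaller sigma
print(z_margin_of_error(0.95, sigma=10, n=100) < base)  # larger sample size n
```

Each comparison changes one factor and holds the others fixed, matching the "other things being equal" condition.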
The margin of error in a confidence interval accounts for only the chance variation due to random sampling. In practice, errors due to nonresponse or undercoverage are often more serious.
There is no universal rule for how small a P-value in a test of significance is convincing evidence against the null hypothesis. Beware of placing too much weight on traditional significance levels such as α = 0.05.
Very small effects can be highly significant (small P) when a test is based on a large sample. A statistically significant effect need not be practically important. Plot the data to display the effect you are seeking, and use confidence intervals to estimate the actual values of parameters.
On the other hand, lack of significance does not imply that H0 is true. Even a large effect can fail to be significant when a test is based on a small sample.
If many tests are run, you will probably produce some significant results by chance alone, even if all the null hypotheses are true.
The z confidence interval for a Normal mean has a specified margin of error m when the sample size is n = (z*σ/m)². Here z* is the critical value for the desired level of confidence. Always round n up to the next whole number when you use this formula.
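The sample-size formula can be sketched in a few lines, assuming σ is known. The function name `z_sample_size` is hypothetical. The usage line plugs in the Douglas fir numbers from the "Pulling wood apart" exercises (σ = 3000 pounds, margin ±1000 pounds, 95% confidence).

```python
from math import ceil
from statistics import NormalDist


def z_sample_size(conf_level: float, sigma: float, margin: float) -> int:
    """Smallest n with z* * sigma / sqrt(n) <= margin: n = (z* sigma / m)^2, rounded up."""
    z_star = NormalDist().inv_cdf((1 + conf_level) / 2)
    return ceil((z_star * sigma / margin) ** 2)


# Douglas fir: sigma = 3000 pounds, margin +/-1000 pounds, 95% confidence.
print(z_sample_size(0.95, sigma=3000, margin=1000))
```

The code uses the exact z* ≈ 1.95996. With the table value z* = 1.96, (1.96 × 3000/1000)² ≈ 34.57, which also rounds up to 35.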
Student study time. A class survey in a large class for first-year college students asked, “About how many minutes do you study on a typical weeknight?” The mean response of the 269 students was x̄ = 137 minutes. Suppose that we know that the study time follows a Normal distribution with standard deviation σ = 65 minutes in the population of all first-year students at this university.
(a) Use the survey result to give a 99% confidence interval for the mean study time of all first-year students.
(b) What condition not yet mentioned must be met for your confidence interval to be valid?
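Part (a) is the z interval x̄ ± z*σ/√n. The sketch below computes it, using the standard-library `statistics.NormalDist` to find z*. It assumes, as the exercise does, that σ = 65 minutes is known. The helper name `z_interval` is hypothetical.

```python
from math import sqrt
from statistics import NormalDist


def z_interval(xbar: float, sigma: float, n: int, conf_level: float) -> tuple[float, float]:
    """Level-C z confidence interval xbar +/- z* sigma / sqrt(n)."""
    z_star = NormalDist().inv_cdf((1 + conf_level) / 2)
    margin = z_star * sigma / sqrt(n)
    return xbar - margin, xbar + margin


low, high = z_interval(xbar=137, sigma=65, n=269, conf_level=0.99)
print(f"99% CI: {low:.2f} to {high:.2f} minutes")
```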
Student study times. Exercise 14.34 describes a class survey in which students claimed to study an average of x̄ = 137 minutes on a typical weeknight. Regard these students as an SRS from the population of all first-year students at this university. Does the study give good evidence that students claim to study more than 2 hours per night on the average?
(a) State null and alternative hypotheses in terms of the mean study time in minutes for the population.
(b) What is the value of the test statistic z?
(c) What is the P-value of the test? Can you conclude that students do claim to study more than two hours per weeknight on the average?
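The following sketch carries out the arithmetic for parts (b) and (c). It assumes the one-sided hypotheses H0: µ = 120 and Ha: µ > 120, since 2 hours is 120 minutes. The upper-tail area is computed with `math.erfc`, which avoids rounding very small tail areas to 0. The name `z_test_upper` is hypothetical.

```python
from math import erfc, sqrt


def z_test_upper(xbar, mu0, sigma, n):
    """One-sided z test of H0: mu = mu0 against Ha: mu > mu0."""
    z = (xbar - mu0) / (sigma / sqrt(n))
    # Upper-tail standard Normal area P(Z >= z) = erfc(z / sqrt(2)) / 2.
    p_value = 0.5 * erfc(z / sqrt(2))
    return z, p_value


z, p = z_test_upper(xbar=137, mu0=120, sigma=65, n=269)
print(f"z = {z:.2f}, P = {p:.2g}")
```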
I want more muscle. Young men in North America and Europe (but not in Asia) tend to think they need more muscle to be attractive. One study presented 200 young American men with 100 images of men with various levels of muscle.⁷ Researchers measured the level of muscle in kilograms per square meter (kg/m2) of fat-free body mass. Typical young men have about 20 kg/m2. Each subject chose two images, one that represented his own level of body muscle and one that he thought represented “what women prefer.” The mean gap between self-image and “what women prefer” was 2.35 kg/m2.

I want more muscle. If young men thought that their own level of muscle was about what women prefer, the mean “muscle gap” in the study described in Exercise 14.35 would be 0. We suspect (before seeing the data) that young men think women prefer more muscle than they themselves have.
Suppose that the “muscle gap” in the population of all young men has a Normal distribution with standard deviation 2.5 kg/m2.
(a) State null and alternative hypotheses for testing this suspicion.
(b) What is the value of the test statistic z?
(c) You can tell just from the value of z that the evidence in favor of the alternative is very strong (that is, the P-value is very small). Explain why this is true.
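Here is a quick numerical check of parts (b) and (c), using the study's summary numbers (n = 200, mean gap 2.35 kg/m2, σ = 2.5 kg/m2). It assumes the one-sided alternative Ha: µ > 0, which matches the direction of the suspicion. The upper-tail P-value uses `math.erfc`, because a plain 1 − Φ(z) would round to exactly 0 in floating point.

```python
from math import erfc, sqrt

# Muscle-gap study summary: n = 200 men, mean gap 2.35 kg/m2, sigma = 2.5 kg/m2.
n, xbar, sigma, mu0 = 200, 2.35, 2.5, 0.0
z = (xbar - mu0) / (sigma / sqrt(n))
p_value = 0.5 * erfc(z / sqrt(2))  # upper-tail area for Ha: mu > 0
print(f"z = {z:.2f}, P = {p_value:.1e}")
```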
ANSWERS (a) H0: µ = 0; Ha: µ > 0. (b) z = 13.29. (c) If H0 were true, z would follow the standard Normal distribution N(0, 1), which almost never produces values more than 3 or 4 standard deviations from its mean of 0. A value of z = 13.29 lies more than 13 standard deviations above 0, so the P-value is essentially 0.

Pulling wood apart. How heavy a load (pounds) is needed to pull apart pieces of Douglas fir 4 inches long and 1.5 inches square? Here are data from students doing a laboratory exercise:
(a) We are willing to regard the wood pieces prepared for the lab session as an SRS of all similar pieces of Douglas fir. Engineers also commonly assume that characteristics of materials vary Normally. Make a graph to show the shape of the distribution for these data. Does the Normality condition appear safe? Suppose that the strength of pieces of wood like these follows a Normal distribution with standard deviation 3000 pounds.
(b) Give a 90% confidence interval for the mean load required to pull the wood apart.
Pulling wood apart. Exercise 14.50 gives data on the pounds of load needed to pull apart pieces of Douglas fir. The data are a random sample from a Normal distribution with standard deviation 3000 pounds.
(a) Is there significant evidence at the α = 0.10 level against the hypothesis that the mean is 32,000 pounds for the two-sided alternative?
(b) Is there significant evidence at the α = 0.10 level against the hypothesis that the mean is 31,500 pounds for the two-sided alternative?
Pulling wood apart. You want to estimate the mean load needed to pull apart the pieces of wood in Exercise 14.50 (page 389) to within ±1000 pounds with 95% confidence. How large a sample is needed?

Tests from confidence intervals. A confidence interval for the population mean µ tells us which values of µ are plausible (those inside the interval) and which values are not plausible (those outside the interval) at the chosen level of confidence. You can use this idea to carry out a test of any null hypothesis H0: µ = µ0 starting with a confidence interval: reject H0 if µ0 is outside the interval and fail to reject H0 if µ0 is inside the interval.
The alternative hypothesis is always two-sided, Ha: µ ≠ µ0, because the confidence interval extends in both directions from x̄. A 95% confidence interval leads to a test at the 5% significance level because, in repeated sampling, such intervals fail to contain the true µ 5% of the time. In general, confidence level C leads to a test at significance level α = 1 − C.
(a) In Example 14.9, a medical director found mean blood pressure x̄ = 126.07 for an SRS of 72 executives. The standard deviation of the blood pressures of all executives is σ = 15. Give a 90% confidence interval for the mean blood pressure µ of all executives.
(b) The hypothesized value µ0 = 128 falls inside this confidence interval. Carry out the z test for H0: µ = 128 against the two-sided alternative. Show that the test is not significant at the 10% level.
(c) The hypothesized value µ0 = 129 falls outside this confidence interval. Carry out the z test for H0: µ = 129 against the two-sided alternative. Show that the test is significant at the 10% level.
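The sketch below works parts (a) through (c) and shows the link between the 90% interval and the two-sided test at α = 0.10. It uses only the summary numbers from Example 14.9 quoted above. The helper `two_sided_p` is a hypothetical name.

```python
from math import erfc, sqrt
from statistics import NormalDist

# Example 14.9 summary: xbar = 126.07, sigma = 15, n = 72.
xbar, sigma, n = 126.07, 15, 72
se = sigma / sqrt(n)

# (a) 90% z confidence interval: z* is the 0.95 quantile of N(0, 1).
z_star = NormalDist().inv_cdf(0.95)
low, high = xbar - z_star * se, xbar + z_star * se
print(f"90% CI: {low:.2f} to {high:.2f}")


def two_sided_p(mu0: float) -> float:
    """Two-sided P-value for H0: mu = mu0, i.e. P(|Z| >= |z|)."""
    z = (xbar - mu0) / se
    return erfc(abs(z) / sqrt(2))


# (b), (c): a mu0 inside the 90% interval is not rejected at alpha = 0.10;
# a mu0 outside it is.
for mu0 in (128, 129):
    p = two_sided_p(mu0)
    inside = low <= mu0 <= high
    print(f"mu0 = {mu0}: inside CI = {inside}, P = {p:.4f}, reject at 0.10 = {p < 0.10}")
```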
A test goes wrong. Software can generate samples from (almost) exactly Normal distributions. Here is a random sample of size 5 from the Normal distribution with mean 10 and standard deviation 2:
These data match the conditions for a z test better than real data will: the population is very close to Normal and has known standard deviation σ = 2, and the population mean is μ = 10. Test the hypotheses
(b) We know that the null hypothesis does not hold, but the test failed to give strong evidence against H0. Explain why this is not surprising.
ANSWERS (a) z = 1.70, P = 0.0891. (b) The sample size is small, so the test has low power.
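To see why a sample of 5 gives low power, we can compute the power of a two-sided z test directly. Under a true mean µ, the z statistic has distribution N((µ − µ0)√n/σ, 1). This is a sketch under stated assumptions. It uses σ = 2 and true mean 10 from the exercise, but µ0 = 8 is an illustrative value chosen here, because the exercise's own hypotheses are not reproduced in this excerpt. The level α = 0.05 and the name `two_sided_z_power` are also assumptions.

```python
from math import sqrt
from statistics import NormalDist

Z = NormalDist()


def two_sided_z_power(mu_true, mu0, sigma, n, alpha=0.05):
    """Power of the two-sided level-alpha z test of H0: mu = mu0 when the true mean is mu_true."""
    z_star = Z.inv_cdf(1 - alpha / 2)
    shift = (mu_true - mu0) / (sigma / sqrt(n))  # mean of the z statistic when mu = mu_true
    # Reject when z <= -z* or z >= z*, with z ~ N(shift, 1).
    return Z.cdf(-z_star - shift) + (1 - Z.cdf(z_star - shift))


# Hypothetical mu0 = 8; true mean 10 and sigma = 2 as in the exercise.
for n in (5, 20, 40):
    print(n, round(two_sided_z_power(mu_true=10, mu0=8, sigma=2, n=n), 3))
```

With n = 5, even a one-standard-deviation difference is detected only about 60% of the time. The power rises quickly as n grows.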
Pulling wood apart, revisited. Return to the Douglas fir data of Exercise 14.50. The data are a random sample from a Normal distribution, but now suppose that the standard deviation is unknown. We still want to find out whether the mean strength differs from 32,000 pounds. What should we do? Find 90%, 95%, and 99% confidence intervals for the mean load needed to pull apart a piece of Douglas fir in this example. Is the mean different from 32,000 pounds?
Draw an SRS of size n from a large population having unknown mean µ. A level C confidence interval for µ, based on the sample mean x̄ and the sample standard deviation s, is x̄ ± t* s/√n,
where t* is the critical value for the t(n − 1) density curve with area C between −t* and t*. This interval is exact when the population distribution is Normal and is approximately correct for large n in other cases. To find t critical values, use the t table or the invT function on your calculator (in the distribution menu).
Calculator command for the t distribution with n degrees of freedom, t(n): P(a ≤ X ≤ b) = tcdf(a, b, n)
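The calculator commands have library equivalents, for example `scipy.stats.t.cdf` and `scipy.stats.t.ppf` in SciPy. To keep the sketch dependency-free, the block below instead integrates the t density numerically with Simpson's rule and finds t* by bisection. This is an illustrative stand-in, not how calculators or SciPy actually compute these values. The names `tcdf` and `t_critical` are hypothetical, and `tcdf` here accepts finite limits only. For an interval with n observations, use df = n − 1.

```python
from math import exp, lgamma, log, pi


def t_pdf(x: float, df: int) -> float:
    """Density of the t distribution with df degrees of freedom."""
    log_c = lgamma((df + 1) / 2) - lgamma(df / 2) - 0.5 * log(df * pi)
    return exp(log_c - (df + 1) / 2 * log(1 + x * x / df))


def tcdf(a: float, b: float, df: int, steps: int = 2000) -> float:
    """P(a <= T <= b) for finite a, b, by Simpson's rule (analog of calculator tcdf)."""
    if steps % 2:
        steps += 1  # Simpson's rule needs an even number of subintervals
    h = (b - a) / steps
    total = t_pdf(a, df) + t_pdf(b, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(a + i * h, df)
    return total * h / 3


def t_critical(conf_level: float, df: int) -> float:
    """t* with area conf_level between -t* and t*, i.e. the (1 + C)/2 quantile of t(df)."""
    lo, hi = 0.0, 1000.0
    for _ in range(50):
        mid = (lo + hi) / 2
        # The central area grows with mid, so bisect on it.
        if tcdf(-mid, mid, df) < conf_level:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


# t* for a 95% interval from a sample of n = 11 (df = 10).
print(round(t_critical(0.95, 10), 3))
```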
CONDITIONS FOR INFERENCE ABOUT A MEAN
We can regard our data as a simple random sample (SRS) from the population. This condition is very important.
Observations from the population have a Normal distribution with mean µ and standard deviation σ. In practice, it is enough that the distribution be symmetric and single-peaked unless the sample is very small. Both μ and σ are unknown parameters.
USING THE t PROCEDURES
Except in the case of small samples, the condition that the data are an SRS from the population of interest is more important than the condition that the population distribution is Normal.
Sample size less than 15: Use t procedures if the data appear close to Normal (roughly symmetric, single peak, no outliers). If the data are clearly skewed or if outliers are present, do not use t.
Sample size at least 15: The t procedures can be used except in the presence of outliers or strong skewness.
Large samples: The t procedures can be used even for clearly skewed distributions when the sample is large, roughly n ≥ 40.