If we can calculate a mean score for one group of cases (such as the world's countries) on one variable, we can also 1) compare a group's mean scores on two or more different variables and, conversely, 2) compare the mean scores of two or more groups (e.g., countries in different regions of the world) on the same variable. The variables for which means are compared must, of course, be interval, though the groups that are compared can, and usually will, be nominal or ordinal.
In this topic, we will first examine a technique called the t-test, a measure of whether the difference between two mean scores is statistically significant. The test is also called "Student's t" because its inventor, William Gosset (1876-1937), wrote under the pseudonym "Student." One version of this test (called a paired-samples t-test) uses one group of cases and compares the group's scores on two different variables. Another version (the independent-samples t-test) compares the scores of two different groups on the same variable. The t-test is similar in concept to the chi-square test of statistical significance discussed in an earlier topic, except that it is more powerful because it uses data at a higher level of measurement. T-tests are often employed in experimental research, with paired-samples t-tests commonly used to compare pre- and post-experiment scores, and independent-samples t-tests used to compare two groups of subjects, such as a control group and an experimental group, on the same measure.
We will end the topic with a discussion of a simple form of a very powerful method called analysis of variance. Using one-way analysis of variance, we will compare several groups in terms of the same variable. By partitioning the distributions of scores into between-group variance and within-group variance, we will be able to measure the strength of the differences between groups using a proportional reduction in error measure called eta² (η²), and also determine whether the differences are statistically significant.
Paired-Samples t-Tests (comparing scores on two different variables for one group of cases)
How do people feel about the two major political parties? In the 2008 American National Election Study, respondents were asked to rate the Democratic and Republican parties on a "feeling thermometer," on which 100 represented the warmest, or most favorable, feelings, and 0 the coldest, or least favorable. When we run a t-test to compare the means of the two variables (the data are weighted using the "weight" variable), the result shows that the Democratic Party comes out somewhat better. The first table below shows a mean score for the Democratic Party of 56.87, compared to 48.15 for the Republican Party. Is this difference sufficiently large that we can reject the null hypothesis that it is simply due to random sampling error (that is, chance)? The figure in the last column of the second table below helps us answer this question. The value of t with 2046 degrees of freedom (one less than the number of cases) has a significance level of .000; that is, a difference this large would occur by chance less than one time in a thousand. The difference is clearly statistically significant.
Note: a two-tailed test of significance is used for "non-directional" hypotheses, in which we suspect that there will be a difference in scores, but don't know in advance of examining our results which score will be higher. Normally, hypotheses are "directional," and we have reason to predict not just that there will be a difference, but also which score will be the higher one. To calculate the one-tailed probability, simply divide the two-tailed result by two. For example, if a relationship were significant at the .04 level using a two-tailed test, it would be significant at the .02 level using a one-tailed test. (Of course, if the difference is not in the direction we predicted, then our hypothesis is not confirmed regardless of the level of significance.)
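The paired-samples calculation can be sketched in a few lines of Python. This is a minimal illustration rather than a reproduction of the SPSS output: the ratings are made-up numbers (not ANES data), the cases are unweighted, and the function names paired_t and one_tailed_p are hypothetical. The t statistic is the mean of the case-by-case differences divided by its standard error, with degrees of freedom one less than the number of cases.

```python
import math
from statistics import mean, stdev


def paired_t(first, second):
    """Return (t, degrees_of_freedom) for a paired-samples t-test."""
    if len(first) != len(second):
        raise ValueError("paired samples must have the same number of cases")
    # Work with the difference between the two scores for each case.
    diffs = [a - b for a, b in zip(first, second)]
    n = len(diffs)
    # Standard error of the mean difference (stdev uses the n - 1 denominator).
    standard_error = stdev(diffs) / math.sqrt(n)
    return mean(diffs) / standard_error, n - 1


def one_tailed_p(two_tailed_p, t, predicted_sign):
    """Convert a two-tailed p to a one-tailed p for a directional hypothesis.

    predicted_sign is +1 if we predicted first > second, -1 otherwise.
    """
    if t * predicted_sign > 0:
        # Difference is in the predicted direction: halve the two-tailed value.
        return two_tailed_p / 2
    # Difference is in the wrong direction: the hypothesis is not supported.
    return 1 - two_tailed_p / 2


# Hypothetical thermometer ratings from five respondents (assumed data).
dem_ratings = [60, 70, 55, 80, 65]
rep_ratings = [50, 65, 50, 70, 60]

t_value, df = paired_t(dem_ratings, rep_ratings)
print(f"t = {t_value:.3f} with {df} degrees of freedom")
print(f"one-tailed p for a two-tailed .04: {one_tailed_p(0.04, t_value, +1):.2f}")
```

Looking up the significance level for a given t and degrees of freedom requires the t distribution, which a statistics package such as SPSS supplies; the sketch stops at the statistic itself.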
Independent-Samples t-Tests (comparing scores on one variable for two different groups of cases)
In 2000, Ralph Nader, running as the Green party candidate for president, won only about 97,000 votes in Florida (less than two percent of the total), but these votes almost certainly cost Democrat Al Gore Florida's 25 Electoral College votes and, with them, the election. By 2004, when Nader again ran for president, many Democrats had developed bitter feelings toward him. The 2004 American National Election Study asked respondents their feelings about a number of prominent politicians, including Nader. On the one hand, we might expect that most Democrats would be closer to Nader philosophically than would most Republicans, and so would have warmer feelings about him. On the other hand, there were the memories of the 2000 election. An independent-samples t-test will enable us to compare Nader's scores from respondents of both major parties.
Again weighting by the "weight" variable, we can see, first of all, that respondents of neither party had particularly warm feelings for Nader, with Republicans averaging 40.88 and Democrats 39.47 (see first table below). For the independent-samples t-test, there are two versions of the computational formula, depending on whether we assume that the variances of the two groups' scores are equal. (The technical name for equal variance is homoscedasticity.) Before deciding which version to use, we need to determine whether there is a statistically significant difference (p<.05) between the variances of the two groups. The test for this uses an F ratio (a measure of statistical significance in the same family of measures as t and chi-square). In this case, we can see from the second table below that the F ratio is not statistically significant (p=.326). We can therefore proceed to use the version of t that assumes equal variances (though in this case, the results are almost identical regardless of which version we use). Because we could in advance have made a case either way as to whether Democrats or Republicans would have warmer feelings about Nader, we will use a two-tailed test. We find that the difference between Democrats and Republicans could easily have been due to chance (p=.419). The relationship is not statistically significant.
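The two versions of the independent-samples formula can be compared side by side. The sketch below is illustrative only: the scores are invented (not ANES data), cases are unweighted, and the function names pooled_t and welch_t are hypothetical. The equal-variances version pools the two group variances; the unequal-variances version (commonly known as Welch's t-test) keeps them separate and adjusts the degrees of freedom downward. The preliminary test for equal variances is not implemented here.

```python
import math
from statistics import mean, variance


def pooled_t(group_a, group_b):
    """t and df assuming the two groups have equal variances."""
    n_a, n_b = len(group_a), len(group_b)
    df = n_a + n_b - 2
    # Pool the two sample variances, weighting each by its degrees of freedom.
    pooled_var = ((n_a - 1) * variance(group_a)
                  + (n_b - 1) * variance(group_b)) / df
    standard_error = math.sqrt(pooled_var * (1 / n_a + 1 / n_b))
    return (mean(group_a) - mean(group_b)) / standard_error, df


def welch_t(group_a, group_b):
    """t and approximate df without assuming equal variances."""
    n_a, n_b = len(group_a), len(group_b)
    # Each group's contribution to the squared standard error.
    se2_a = variance(group_a) / n_a
    se2_b = variance(group_b) / n_b
    t = (mean(group_a) - mean(group_b)) / math.sqrt(se2_a + se2_b)
    # Welch-Satterthwaite approximation to the degrees of freedom.
    df = (se2_a + se2_b) ** 2 / (se2_a ** 2 / (n_a - 1) + se2_b ** 2 / (n_b - 1))
    return t, df


# Hypothetical thermometer scores for two groups of respondents (assumed data).
republicans = [40, 45, 35, 50, 30]
democrats = [38, 42, 36, 44, 35]

t_equal, df_equal = pooled_t(republicans, democrats)
t_unequal, df_unequal = welch_t(republicans, democrats)
print(f"equal variances assumed:     t = {t_equal:.3f}, df = {df_equal}")
print(f"equal variances not assumed: t = {t_unequal:.3f}, df = {df_unequal:.2f}")
```

With equal group sizes, as in this example, the two versions give the same t and differ only in their degrees of freedom.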
One-way Analysis of Variance and Eta² (η²) (comparing scores on one variable for several different groups of cases)
The independent-samples t-test is a special case of a more general method that allows comparisons among more than two groups of cases. If we think of group membership as an independent variable, and the interval or ratio variable as a dependent variable, we might then ask whether the differences between the groups are statistically significant, and how strong a predictor group membership is of the value of the dependent variable. We can answer these questions with one-way "analysis of variance" (ANOVA) and a related proportional reduction in error measure of association called eta² (η²).
We will illustrate these ideas by comparing the Gross Domestic Product per capita of countries in different regions of the world. Boxplots displaying this relationship are shown in the following figure:
Obviously, there are major wealth differences between regions. At the same time, there are important differences within some regions. European and North American countries, while the most affluent overall, vary considerably in their wealth. While most Asian countries are poor, there are a few outliers in this region that are at least as affluent as most in Europe and North America. But just how good a predictor of wealth is region? Put another way, how much of the variance in wealth is between-region variance, and how much is within-region variance?
For an interval or ratio variable, our best guess as to the score of an individual case, if we knew nothing else about that case, would be the mean. The variance gives us a measure of the error we make in guessing the mean, since the greater the variance, the less reliable a predictor the mean will be. For GDP per capita, we obtain the following parameters for all countries taken together:
How much less will our error be in guessing the value of the dependent variable (in this case, GDP per capita) if we know the value of the independent variable (region)? We can calculate the within-group variance in the same way that total variance is calculated, except that, instead of subtracting each score from the overall mean, we subtract it from the group mean (that is, the mean for the region in which the country is located). We can then determine how much less variance there is about the group means than about the overall mean.
The ratio (total variance - within-group variance) / total variance provides us with the familiar proportional reduction in error. Eta² thus belongs to the same "PRE" family of measures of association as Lambda, Gamma, Kendall's tau, and others.
Recall that variance is the sum of squared deviations from the mean divided by N (the number of cases). The "Sum of Squares" numbers in the ANOVA table refer to the sums of squared deviations from the mean. They are, in other words, the same as the between-groups, within-groups, and total variances, except that they have not been divided by N. Since N is the same for each, we can omit this last step. Eta² is then calculated as the between-groups sum of squares divided by the total sum of squares, as follows:
In other words, by knowing the region in which a country is located, we can reduce the error we make in guessing its GDP/capita by 32.8 percent.
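The partitioning of the sums of squares can be sketched as follows. This is an illustration under stated assumptions: the region labels and scores are invented rather than taken from the "countries" file, and the function name eta_squared is hypothetical. The total sum of squares measures deviations from the overall mean, the within-groups sum of squares measures deviations from each group's own mean, and the between-groups sum of squares is what remains.

```python
from statistics import mean


def eta_squared(groups):
    """Return (eta2, between_ss, within_ss, total_ss) for a dict of groups."""
    all_scores = [score for scores in groups.values() for score in scores]
    grand_mean = mean(all_scores)
    # Total: squared deviations of every score from the overall mean.
    total_ss = sum((score - grand_mean) ** 2 for score in all_scores)
    # Within groups: squared deviations of each score from its own group mean.
    within_ss = sum(
        (score - mean(scores)) ** 2
        for scores in groups.values()
        for score in scores
    )
    # Between groups: the part of the total not left over within groups.
    between_ss = total_ss - within_ss
    return between_ss / total_ss, between_ss, within_ss, total_ss


# Hypothetical scores for three made-up regions (assumed data).
regions = {
    "region_a": [1, 2, 3],
    "region_b": [4, 5, 6],
    "region_c": [7, 8, 9],
}

eta2, between_ss, within_ss, total_ss = eta_squared(regions)
print(f"between = {between_ss}, within = {within_ss}, total = {total_ss}")
print(f"eta squared = {eta2:.3f}")
```

In this toy example the groups barely overlap, so knowing the group removes most of the error; real data such as GDP per capita by region will usually give a much smaller value.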
We can also perform a test for the statistical significance of this measure using the F ratio (assuming that we wish to calculate statistical significance for population data such as those in the "countries" file). In this case, the differences between the regions would occur by chance less than one time in a thousand (see the last column of the first table above). If there are only two groups being compared, the F ratio is mathematically equivalent to the equal-variances version of the t-test: F is simply the square of t.
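The equivalence between F and t for two groups can be checked numerically. The sketch below uses invented scores and hypothetical function names (f_ratio, pooled_t). It computes F as the between-groups mean square divided by the within-groups mean square, where each mean square is a sum of squares divided by its degrees of freedom, and compares the result with the square of the equal-variances t.

```python
import math
from statistics import mean, variance


def f_ratio(groups):
    """One-way ANOVA F ratio for a list of groups of scores."""
    all_scores = [score for scores in groups for score in scores]
    grand_mean = mean(all_scores)
    k, n = len(groups), len(all_scores)
    # Between groups: group means around the overall mean, weighted by size.
    between_ss = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups)
    # Within groups: scores around their own group mean.
    within_ss = sum((score - mean(g)) ** 2 for g in groups for score in g)
    # Mean squares divide each sum of squares by its degrees of freedom.
    return (between_ss / (k - 1)) / (within_ss / (n - k))


def pooled_t(group_a, group_b):
    """Independent-samples t assuming equal variances."""
    n_a, n_b = len(group_a), len(group_b)
    pooled_var = ((n_a - 1) * variance(group_a)
                  + (n_b - 1) * variance(group_b)) / (n_a + n_b - 2)
    standard_error = math.sqrt(pooled_var * (1 / n_a + 1 / n_b))
    return (mean(group_a) - mean(group_b)) / standard_error


# Hypothetical scores for two groups (assumed data).
group_one = [40, 45, 35, 50, 30]
group_two = [38, 42, 36, 44, 35]

f_value = f_ratio([group_one, group_two])
t_value = pooled_t(group_one, group_two)
print(f"F = {f_value:.6f}, t squared = {t_value ** 2:.6f}")
```

The same f_ratio function accepts three or more groups, which is the case the t-test cannot handle.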
The following topic, regression (or ordinary least squares), is another, even more powerful way to analyze variance.
1. Start SPSS. Open the anes08s.sav file and the 2008 American National Election Study Subset codebook. Do paired-samples t-tests to compare the feeling thermometers toward "people on welfare" with "poor people." Weight cases by the weight variable.
2. Using the same dataset, and again weighting cases by the "weight" variable, do independent-samples t-tests to compare the feelings of Democrats and Republicans toward Joe Biden and Sarah Palin. Using boxplots, display these same relationships graphically.
3. Using the same dataset, and again weighting cases by the "weight" variable, do comparison of means tests, requesting ANOVA and eta² to see how well party (use "partyid7") and ideology explain respondents' scores on the various "feeling thermometers" included in the file. Using boxplots, display these same relationships graphically.
4. Open the senate.sav file and the senate codebook. Do a comparison of means test between the voting records of Democrats and Republicans, requesting eta². Repeat with senators' gender and with the region of the state they represent as your independent variables. Which independent variable does the best job of explaining voting record? Using boxplots, display these same relationships graphically.
For Further Study
Fiddler, Linda, Laura Hecht, Edward E. Nelson, Elizabeth Ness Nelson, and James Ross, SPSS for Windows 19.0: A Basic Tutorial (N.Y.: McGraw-Hill, 2011): chapter 6; http://ssric.org/files/chapter6_v19.pdf. Accessed March 22, 2013.
Stockburger, David W., "One and Two Tailed T Tests," in Introductory Statistics: Concepts, Models, and Applications, revised February 19, 1998, http://www.psychstat.missouristate.edu/introbook/sbk25.htm. Accessed March 22, 2013.
Zhang, Ying (Joy), "Confidence Interval and the Student's T Test," http://projectile.sv.cmu.edu/research/public/talks/t-test.htm. Accessed March 22, 2013.
For the various formulas used to compute t-tests, see Ying (Joy) Zhang, "Confidence Interval and the Student's T Test," http://projectile.sv.cmu.edu/research/public/talks/t-test.htm#types. Note: what we have, using SPSS's terminology, called "paired-samples t-tests," Zhang calls "paired t-tests," and what we have called "independent-samples t-tests," Zhang calls "unpaired t-tests." Zhang also describes "one sample" t-tests, a subject not covered in POWERMUTT.