We want to see whether high school GPA causes college GPA. We first run a univariate regression:
1 Can you interpret the b coefficient?
First we need to look at the units for colGPA and hsGPA. In this case both are measured in GPA points. Thus, a one-point increase in high school GPA is associated with a 0.482-point increase in college GPA.
We then realized that other factors could also be affecting college GPA, so we built the following multivariate model where
2 What happened to the coefficient of high school GPA? Why did it decrease?
Since we are now including two more variables in our model (ACT exam scores and the average number of skipped classes per week), the coefficient for high school GPA has changed. It decreased because high school GPA is correlated with these variables, so part of the effect that the univariate model attributed to high school GPA is now attributed to them. Even so, the coefficient has decreased relatively little. Moreover, in this multivariate model high school GPA is still the most practically significant determinant of college GPA.
3 Interpret the b2 coefficient.
Once again, the first step in interpreting a b coefficient is to look at the units of measure for both the x and the y. In this case, the unit of measure for ACT is one point on the ACT exam. Thus, holding the other variables constant, a one-point increase on the ACT exam is associated with a 0.015-point increase in college GPA.
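As a small worked illustration of this kind of interpretation, the sketch below multiplies each reported slope by a chosen change in its regressor. The coefficients 0.482 and 0.015 are the ones quoted in this handout; the function name and the 10-point ACT difference are hypothetical choices made for the example.

```python
def predicted_change(coefficient, delta_x):
    """Predicted change in y for a change of delta_x in x, other regressors held fixed."""
    return coefficient * delta_x


B_HSGPA_UNIVARIATE = 0.482  # slope on hsGPA in the univariate model (from the text)
B_ACT = 0.015               # slope on ACT in the multivariate model (from the text)

# One extra GPA point in high school
change_hs = predicted_change(B_HSGPA_UNIVARIATE, 1.0)

# Hypothetical comparison: two students whose ACT scores differ by 10 points
change_act = predicted_change(B_ACT, 10.0)

print(f"hsGPA +1 point  -> colGPA {change_hs:+.3f}")
print(f"ACT  +10 points -> colGPA {change_act:+.3f}")
```

The same ACT coefficient that looks tiny for a one-point change corresponds to a more noticeable difference once a realistic gap in scores is considered, which is why the units matter.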
4 How do we interpret the standard error of b1?
The standard error of b1 is the standard deviation of the sampling distribution of b1. In statistics we rarely work with complete certainty, and the same holds here: b1 is computed from one sample, and a different sample would give a somewhat different value. The standard error measures how much the possible values of b1 vary across samples, given a set of assumptions. It allows us to construct a confidence interval for the coefficient, just as we did with the sample mean. Think of b1 and the sample mean as estimators of the population slope and the population mean, respectively. Each estimator comes with a margin of error; neither is the true parameter.
5 What t-score should we compare the t statistic of each b to if we want to know whether it is significant at the 95% confidence level?
To find the critical t-value to compare our t statistics against, we first need the degrees of freedom. Since four variables are involved in our analysis (three independent variables and one dependent variable), our degrees of freedom are n - 4 = 137. Because the degrees of freedom exceed 100, we can use the infinite-degrees-of-freedom row of the t table, which gives t = 1.96 at the 95% confidence level. Thus, we should compare each of our t statistics to 1.96 to decide whether each b is significant at the 95% confidence level.
6 Which coefficients are statistically significant?
The coefficient for high school GPA and the coefficient for the average number of classes skipped per week are both statistically significant. Note that the p-value for ACT scores is higher than 0.05, and its t statistic (t = b / se(b)) is 1.39, which is smaller than 1.65, 1.96, and 2.58, the critical values at the 90%, 95%, and 99% confidence levels. Thus, the coefficient for ACT scores is not statistically significant at any of these levels.
7 How can the p-value tell us if a coefficient is statistically significant at the 95% confidence level?
To reject the null hypothesis at the 95% confidence level, the p-value must be smaller than 0.05. Note that the cutoff is 0.05 and not 0.025: the p-value in the regression output is already computed for a two-tailed test, so it counts both tails together.
8 Can you think of other factors that could be causing college GPA?
For example, a student who went through depression or a traumatic experience could see their grades suffer. Whether they picked a course of study in college related to the subjects they had studied in high school could also be a factor. Having to take a job to pay for college could matter as well, since the student would have fewer hours to study. Lastly, in some cases college GPA could be affected by whether the student switched from studying in their mother tongue in high school to studying in another language in college.
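Returning briefly to the significance checks in questions 5 to 7, they can be reproduced with a few lines of code. The sketch below uses only the ACT numbers quoted above (b = 0.015, t = 1.39) and the large-sample normal approximation that justifies the 1.96 cutoff. The standard error is backed out from t = b / se(b) rather than read from the regression output, and all function and variable names are illustrative.

```python
from statistics import NormalDist

# Large-sample stand-in for the t distribution (an approximation; df = 137 here)
STD_NORMAL = NormalDist()


def critical_value(confidence):
    """Two-tailed critical value, e.g. about 1.96 for confidence = 0.95."""
    alpha = 1.0 - confidence
    return STD_NORMAL.inv_cdf(1.0 - alpha / 2.0)


def two_tailed_p(t_stat):
    """Two-tailed p-value: probability mass in both tails beyond |t|."""
    return 2.0 * (1.0 - STD_NORMAL.cdf(abs(t_stat)))


b_act = 0.015            # ACT coefficient quoted in the text
t_act = 1.39             # its t statistic quoted in the text
se_act = b_act / t_act   # implied standard error, since t = b / se(b)

crit_95 = critical_value(0.95)
p_act = two_tailed_p(t_act)
ci_low = b_act - crit_95 * se_act
ci_high = b_act + crit_95 * se_act

print(f"critical values: 90% {critical_value(0.90):.3f}, "
      f"95% {crit_95:.3f}, 99% {critical_value(0.99):.3f}")
print(f"ACT: t = {t_act:.2f}, two-tailed p = {p_act:.3f}")
print(f"95% CI for the ACT coefficient: [{ci_low:.4f}, {ci_high:.4f}]")
print("significant at 95%:", abs(t_act) > crit_95)
```

Under this approximation the p-value exceeds 0.05 and the confidence interval contains zero, which agrees with the conclusion that the ACT coefficient is not significant at the 95% level. With 137 degrees of freedom the exact t critical value is slightly larger than 1.96, which does not change that conclusion.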
9 Do you think you can make the case that this is a spurious correlation, partly spurious, or chain relationship?
(Remember two examples: the shoe size and literacy relationship in which changing x would have no impact on y because it is really something else that is affecting both x and y, and the funding and student test score relationship in which introducing teacher quality into the model wipes out the effect of funding).
Before turning to our GPA example, consider the examples we saw in class. In the shoe size and literacy example we could not make a causal argument linking the two. We found that age was affecting both shoe size and literacy, and that once age was accounted for, the link between shoe size and literacy disappeared. If we think about it, we can clearly see that as children get older, their reading ability improves dramatically. Changing a child's shoe size by one unit has no effect on their literacy; it is all about their age and brain development. (This is an example of a spurious relationship. Had a link between X and Y remained even after accounting for age as a variable affecting both, it would have been an example of a partly spurious relationship.)
Another example we saw in class was that of a chain relationship: the relationship between funding and student test scores. Researchers have found that if we account for teacher quality, the effect of funding on student test scores completely disappears. This is called a chain relationship because funding determines teacher quality, which in turn determines student test scores. It is a chain.
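A spurious relationship like the shoe size and literacy example can be illustrated with simulated data. The sketch below does not use the class data: the sample size, the age range, and every coefficient are made-up assumptions, chosen so that age drives both variables and shoe size has no effect of its own. It estimates the shoe-size slope first without and then with age controlled for, using the residual-on-residual form of a two-variable regression.

```python
import random


def ols(x, y):
    """Simple least-squares fit of y on x; returns (intercept, slope)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return my - slope * mx, slope


def residuals(x, y):
    """Part of y that is not linearly explained by x."""
    a, b = ols(x, y)
    return [yi - (a + b * xi) for xi, yi in zip(x, y)]


rng = random.Random(0)  # fixed seed so the run is reproducible
n = 2000

# Hypothetical data-generating process: age causes both variables,
# and shoe size has no direct effect on literacy.
age = [rng.uniform(5, 15) for _ in range(n)]
shoe = [0.5 * a + rng.gauss(0, 1) for a in age]
literacy = [10 * a + rng.gauss(0, 10) for a in age]

# Naive model: literacy regressed on shoe size only.
_, naive_slope = ols(shoe, literacy)

# Controlling for age: regress the age-residuals of literacy on the
# age-residuals of shoe size. This equals the shoe-size coefficient in
# the multiple regression of literacy on shoe size and age.
_, controlled_slope = ols(residuals(age, shoe), residuals(age, literacy))

print(f"shoe-size slope, age omitted:    {naive_slope:.2f}")
print(f"shoe-size slope, age controlled: {controlled_slope:.2f}")
```

Under these assumptions the first slope is large and the second is close to zero, which is the signature of a spurious relationship. A partly spurious relationship would leave a clearly nonzero slope after controlling for age.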
Going back to our GPA example: even though we can think of other factors affecting college GPA, we can still make quite a good argument that your high school GPA does have an impact on your college GPA. We have good reason to believe this is not a spurious, partly spurious, or chain relationship. Nevertheless, our model is far from perfect, and other control variables should of course be added were we to publish these results anywhere.
The reasoning behind our causal argument is that high school GPA measures your understanding of a series of elementary subjects that you will need in order to understand and excel in your assignments at the college level. If you have a proper command of essay writing, a strong foundation in math, and a record of working hard on difficult subjects during high school, it is understandable that you will be better prepared, and thus much more likely to maintain a higher GPA, in college.
Now consider an alternative model in which we take our dependent variable to be high school GPA, and college GPA our explanatory variable.
10 Would this make sense even though the colGPA coefficient is statistically significant? Why or why not?
Note that even though it would not make any sense to claim that college GPA causes high school GPA, since you go to high school before you go to college, our results are still statistically significant. This is an important illustration of the fact that statistical significance is not tied to practical significance. Stata is simply a piece of software that performs the calculations you tell it to do. The key to performing correct statistical analyses is to know the relationships between your variables well, so that you can make a strong argument that there is an actual causal mechanism beyond the statistical significance reported in your regression output.
Here, as I explained above, the direction of causation is immediately clear (which one is the Y and which one is the X). Nevertheless, there will be many times when this is not as clear, and social scientists are still debating the direction of causation between many variables. To make the case that you have identified the right direction of causality, focus on whether manipulating X directly affects Y. In this case, changing college GPA could hardly lead to a change in high school GPA, because high school GPA was set before you arrived at college. In the shoe size and literacy example, it is hard to argue that a change in shoe size leads to higher literacy.
The crime example is somewhat more complex. I could see that a reduction in poverty leads to a reduction in crime. Nevertheless, it could also be the case that creating a program that helps street youth reintegrate into society leads to a reduction in crime and subsequently to a reduction in poverty.
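The point that the software reports significance in either direction can be checked directly: in a regression with a single explanatory variable, the t statistic of the slope is the same whether we regress Y on X or X on Y. The sketch below uses simulated data, not the GPA data; the variable names, sample size, and coefficients are hypothetical.

```python
import math
import random


def slope_t_stat(x, y):
    """t statistic of the slope in a simple regression of y on x."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = my - slope * mx
    # Residual sum of squares, with n - 2 degrees of freedom
    rss = sum((yi - intercept - slope * xi) ** 2 for xi, yi in zip(x, y))
    se_slope = math.sqrt(rss / (n - 2) / sxx)
    return slope / se_slope


rng = random.Random(1)  # fixed seed so the run is reproducible
n = 141                 # hypothetical sample size

# Hypothetical process in which hs causes col, never the reverse.
hs = [rng.uniform(2.0, 4.0) for _ in range(n)]
col = [1.4 + 0.5 * h + rng.gauss(0, 0.3) for h in hs]

t_forward = slope_t_stat(hs, col)  # col regressed on hs
t_reverse = slope_t_stat(col, hs)  # hs regressed on col

print(f"t (col on hs): {t_forward:.3f}")
print(f"t (hs on col): {t_reverse:.3f}")
```

The two t statistics coincide, so the regression output alone cannot tell us which variable is the cause. That argument has to come from what we know about the variables.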