We are grateful for support from the University of Kentucky Poverty Research Center and from the National Institute of Child Health and Human Development. We have benefited from excellent research assistance from Ryan Brown, and from comments by seminar participants at UC-Berkeley, Stanford, University of Chicago, University of Kentucky, Chicago Federal Reserve, and IZA. All errors are our own.
Abstract
The Great Migration, the early twentieth-century migration of millions of African Americans out of the South to locations with better social and economic opportunities, is understood to be a key element in the progress of African Americans in the U.S. This paper evaluates the effect of the Great Migration on an essential dimension of lifetime wellbeing: longevity. Using data on precise place of birth, place of death, and age at death for individuals born in the Deep South in the early twentieth century, we seek to identify the impact of migration on mortality among older African Americans. To sort out causal effects we rely on the fact that proximity of birthplace to early twentieth-century railroad lines had a powerful effect on migration, and thus serves as a viable instrument for migration. We find that there was positive selection into migration, in terms of human capital and physical health. Estimates indicate that migration reduced longevity, perhaps because of relatively poor health conditions in the urban areas to which this population migrated.
1. Introduction
Two inextricably linked phenomena lie at the heart of African American social history in the twentieth century: The first is the Great Migration—the movement of millions of African Americans from the South to the North, Midwest, and West. The second is a painstakingly slow, but extremely important, reduction in disparity between blacks and whites along many measures of personal well-being, e.g., in political enfranchisement, human capital accumulation, labor market success, and health.
A tremendous flow of intellectual energy has been devoted to the task of understanding economic forces surrounding the Great Migration—including efforts to ascertain the forces that led to the Great Migration in the first place, to evaluate the impact of migration on labor markets in places to which African Americans moved, and to assess how migration out of the South affected migrants’ economic outcomes.1 On this latter issue, prominent accounts—from Lemann’s (1991) influential The Promised Land: The Great Black Migration and How It Changed America through Wilkerson’s (2010) recent award-winning monograph, The Warmth of Other Suns: The Epic Story of America’s Great Migration—emphasize that the Great Migration was indeed a flight from poverty and oppression toward the promise of opportunity and freedom, but one whose promise was unmet for many migrating African Americans. Still, while there are extensive debates about many details, our reading suggests a general consensus, expressed in Smith and Welch (1989), that the Great Migration from low-wage to generally higher-wage labor markets contributed substantially to the significant black-white economic convergence in the mid twentieth century.2
To our knowledge, though, no previous research evaluates the causal impact of the Great Migration on health outcomes of migrating individuals. Our paper tackles that important question, focusing on mortality. Essentially we compare mortality rates, at older ages, for two groups of African Americans born in the “Deep South”—South Carolina, Georgia, Alabama, Mississippi, and Louisiana—during the early twentieth century: those who remained in the South and those who migrated to States outside of the South.3 To make sense of our exercise we draw on observations from four strands of literature:
A first literature focuses on health outcomes of African Americans in the twentieth century, especially the disparity in outcomes compared to whites. Measured in terms of life expectancy, racial disparity has decreased over the century, but remains high. According to recent life tables produced at the Division of Vital Statistics (Arias, 2010), the gap in life expectancy at birth between whites and blacks born in the U.S. declined from 10.4 years for cohorts born 1919-1921 (with life expectancies of 57.4 for whites and 47.0 for blacks) to 5.0 years for the cohort born in 2006 (78.2 for whites and 73.2 for blacks), a disparity that is at a historic low, but that, obviously, is still large.4
There are many proximate medical causes for the mortality gap, including black disadvantages relative to whites in mortality due to diseases of the heart, cancer, cerebrovascular disease, diabetes mellitus, and pneumonia and influenza (e.g., Levine, et al., 2001). Importantly, for our purposes, the incidence of life-threatening disease (and other threats, such as violence) varies substantially across local areas in the U.S. For example, in a seminal paper, McCord and Freeman (1990) estimated the rate of survival beyond the age of 40 for black men in Harlem, circa 1960-1980, to be lower than for men in Bangladesh. Geronimus, Bound, and Colen (2011) provide more recent location-specific statistics, by race, for a geographically diverse set of locations, and similarly demonstrate high variation in mortality rates, and in black-white differences in mortality rates, across locations.
A second important literature examines the links between income, education, and mortality. As Preston (1975, 1980) famously demonstrated, mortality is much higher in the world’s poorest countries than in wealthier countries.5 Also, income and health are correlated within countries. For instance, Sorlie, et al. (1992) show that the life expectancy in 1980 of Americans in the bottom 5 percent of the income distribution was 25 percent lower than for those in the top 5 percent of the income distribution. Elo and Preston (1996), as another example, document an inverse relationship between income and mortality, and also between education and mortality. As for causality, Lleras-Muney’s (2005) work shows that increases in the education of whites in the U.S. due to expanded compulsory schooling laws in the early twentieth century resulted in lower mortality. There is a possibility that post-school labor market success also directly improves health.6 In general, though, causal relationships between income, education, and mortality are very complex and difficult to sort out.7
The black-white gap in human capital and income is doubtless a major contributor to racial differences in health outcomes. This issue has been studied in many papers. For example, Sorlie, et al. (1992) find that increased income is associated with lower mortality rates generally, but blacks have higher mortality than whites at every level of income.8 Again, causal relationships are tough to unravel; we hope that our work on the Great Migration—one potential mechanism through which economic success and mortality might be linked—will shed some light on the matter.
The third literature focuses on the “long reach” of health threats in early childhood and in utero (Barker, 1990 and 1995), particularly conditions of nutritional deficiency during these crucial periods of human physical development. This idea plays an important role, for example, in Fogel’s (2004) analysis of the long-run decline in mortality. Importantly, for our study, even using a relatively small sample of 582 older African Americans, Preston, Hill, and Drevenstedt (1998) were able to show that “children who were exposed to the most unhealthy childhood environments were far less likely to reach age 85 than those living in more favorable environments.” In their study, mortality risks at young ages and mortality risks at older ages are shown to be positively correlated for this population, suggesting that assaults on health early in life adversely affect mortality at all subsequent ages for the population.9
The fourth, and final, strand of literature upon which we draw focuses on the migration decision itself. Migration is a form of investment; a possibly very high cost is paid by the migrant—direct migration costs, but also often a loss of present-day economic security and diminished contact with community and family—in the hope that life will be better elsewhere. Those who migrate plausibly have disproportionately high aspirations and motivation, and thus tend to invest more heavily in human capital generally (both schooling and investments in health). As Norman, Boyle, and Rees (2005) point out, for more than a century (at least since Farr, 1864), analysts have understood that migrants are a select group, and many papers examine this selection process. Halliday and Kimmitt (2008), for instance, show that among men younger than age 60 there is far lower geographic mobility at the bottom of the health distribution than among healthy men.10 Clearly, any effort to estimate the causal impact of the Great Migration on mortality must include careful consideration of issues related to selective migration.
Against this rather complex background, we focus on our narrow, but important, question: How did migration out of the Deep South affect older-age mortality of black men and women born in the early twentieth century?
As noted above, to our knowledge there has been no scholarly work on this topic. A key reason for this dearth of research, no doubt, is the lack of data. The key problem is that the primary sources of data for the study of mortality—Census records, vital statistics, and historic panel survey data—either provide far too little detail on place of birth (at best, typically, State of birth, and often only region or country of birth) or have sample sizes that are far too small to be definitive.
An innovative feature of our work is the use of administrative data from the Medicare Part B program, which accurately records date of birth (used for eligibility determination) and date of death (for the purpose of terminating benefits). While these data do not include place of birth, as a federal administrative dataset they contain Social Security numbers, and these in turn can be used to match to the Numerical Identification Files of the Social Security Administration, which have “town or county of birth.” With permission from the Centers for Medicare and Medicaid Services and the Social Security Administration (and with extensive confidentiality protection), these two files were matched. The resulting data cover almost the entire older population of the U.S.; our dataset includes more than 70 million observations. Because our data include place of birth and place of residence or death at old age, the data allow us to investigate the relationship between lifetime migration and mortality. However, the data are largely limited to individuals aged 65 and older, because 65 is the typical age of eligibility for Medicare, so for the most part our analysis is limited to mortality for men and women living to age 65. We also study mortality using coarser data from the vital statistics and the U.S. Census, which are useful for evaluating mortality at younger ages and for examining robustness more generally.
The remainder of our paper proceeds as follows:
In Section 2 we give a simple model in which young individuals treat both schooling and migration choices as investment decisions. The model presents the logic behind a pattern of selective migration—the possibility that those who migrate disproportionately have high levels of human capital, along observed dimensions such as years of schooling and unobserved dimensions such as initiative or ability. Our model shows that to identify the causal impact of migration out of the South on mortality, it is necessary to have an instrument that affects the likelihood of migration but is otherwise statistically independent of mortality. We suggest that proximity to early twentieth-century railway lines is helpful in this regard.
Sections 3 through 5 report the empirical exercise. In Section 3 we describe our data sources. In Section 4 we present our basic findings about migration patterns of African Americans born between 1916 and 1932 in the Deep South, with a focus on the ages at which migration occurs, and the labor market outcomes of migrants and non-migrants. We document that most migration out of the South (which we generally call “migrating North”) occurs when individuals are in the early part of their prime earnings years (ages 18-40). Men who migrate North earn substantially more than those who remain in the South. Section 5 proceeds with our analysis of mortality. Our central finding is that survival rates declined among older African Americans as a consequence of migrating North.
In the concluding section we discuss implications of our findings.
2. A Model of Human Capital Investment and Migration
To fix basic ideas and set the stage for the empirical analysis to come, in this section we provide a simple model. At the core of our model is the assumption that an individual’s lifetime indirect utility, U, depends on human capital, H, which in turn is a function of an endowed latent ability, α, and the level of schooling, E, chosen by the individual (or by parents), typically when the individual is young. We thus write
$$H = h(\alpha, E), \tag{1}$$
assuming $h$ to be strictly concave, with $h_\alpha > 0$ and $h_E > 0$. We also expect that $h_{\alpha E} > 0$.
Let κ be the marginal cost of education, and w be the market return to human capital. Individuals then maximize indirect utility, which is simply
$$U = w\,h(\alpha, E) - \kappa E. \tag{2}$$
The necessary condition for optimization is
$$w\,h_E(\alpha, E^*) - \kappa = 0, \tag{3}$$
and the second order condition is
$$w\,h_{EE}(\alpha, E^*) < 0. \tag{4}$$
Our model predicts that individuals respond to market incentives in an intuitively sensible way; an increase in the anticipated wage induces higher investment in education,
$$\frac{\partial E^*}{\partial w} = -\frac{h_E}{w\,h_{EE}} > 0. \tag{5}$$
In our model, individuals with relatively high levels of innate ability acquire more schooling than those with lower levels of ability,
$$\frac{\partial E^*}{\partial \alpha} = -\frac{h_{\alpha E}}{h_{EE}} > 0. \tag{6}$$
With this basic model of human capital accumulation in mind, consider a decision to migrate, which, like schooling, constitutes a form of investment.11 In particular, consider an individual living in a Southern State who anticipates earning a higher return on human capital in the North: $w_N > w_S$. To obtain the higher return, he incurs a migration cost, $M$. Lifetime welfare maximization then requires that the individual compare the optimal outcome given a move North to the optimal outcome in the absence of migration, i.e., to compare
$$w_N\,h(\alpha, E_N^*) - \kappa E_N^* - M \quad \text{and} \quad w_S\,h(\alpha, E_S^*) - \kappa E_S^*, \tag{7}$$
where educational attainment is chosen optimally in each case (which for any α gives $E_N^* > E_S^*$, since the return to human capital found in the North is higher than in the South).
Given this set up, it is easy to see the emergence of a pattern of “selective migration” for individuals born in the South. As the model is set up, the cost of migration is the same for all individuals, but the lifetime return to migration is higher for those who are endowed with relatively higher levels of innate ability.12 Let $\alpha^*$ be the level of ability such that an individual is indifferent between migrating North and remaining in the South, i.e., the level that solves
$$w_N\,h(\alpha^*, E_N^*) - \kappa E_N^* - M = w_S\,h(\alpha^*, E_S^*) - \kappa E_S^*. \tag{8}$$
Individuals with ability lower than $\alpha^*$ remain in the South, while those with higher ability migrate to the North.
The selection pattern we describe potentially bedevils attempts to empirically evaluate the effect of migration on individual outcomes. Any observed differences in outcomes between migrants and those who remain in the South—increased education, higher income, better health, etc.—can be the consequence of improved conditions in the North relative to the South, but can also be due to systematic unobserved differences in traits of migrants and non-migrants.
We argue below that making headway in the identification of causal effects depends on observing Southern-born individuals who face differing costs of migration. So, for future reference, we give an intuitively sensible comparative static here:
$$\frac{\partial \alpha^*}{\partial M} > 0. \tag{9}$$
An increase in the migration cost, $M$, shifts upward $\alpha^*$, the threshold level of ability necessary to induce migration.
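The comparative static can be checked numerically under an assumed functional form. The Python sketch below is an illustration only: it assumes a Cobb-Douglas human capital function, h(α, E) = α^a E^b with a + b < 1, and the parameter values and function names are hypothetical choices that are not part of the model above.

```python
"""Illustrative sketch of the schooling and migration model (assumed functional form)."""

A_EXP = 0.3  # assumed exponent on ability (hypothetical value)
B_EXP = 0.5  # assumed exponent on schooling; A_EXP + B_EXP < 1 gives strict concavity


def optimal_schooling(alpha, wage, kappa):
    """Schooling level E* solving the first-order condition w * h_E = kappa."""
    return (wage * B_EXP * alpha ** A_EXP / kappa) ** (1.0 / (1.0 - B_EXP))


def max_utility(alpha, wage, kappa):
    """Indirect utility w * h(alpha, E*) - kappa * E* at the optimal schooling level."""
    e_star = optimal_schooling(alpha, wage, kappa)
    return wage * alpha ** A_EXP * e_star ** B_EXP - kappa * e_star


def migration_gain(alpha, wage_north, wage_south, kappa, cost):
    """Net gain from migrating North; a positive value means the individual migrates."""
    return (max_utility(alpha, wage_north, kappa) - cost
            - max_utility(alpha, wage_south, kappa))


def ability_threshold(wage_north, wage_south, kappa, cost, lo=1e-9, hi=1e6):
    """Bisection for the ability level at which the net gain from migration is zero."""
    if (migration_gain(lo, wage_north, wage_south, kappa, cost) > 0
            or migration_gain(hi, wage_north, wage_south, kappa, cost) < 0):
        raise ValueError("threshold is not bracketed by [lo, hi]")
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        if migration_gain(mid, wage_north, wage_south, kappa, cost) < 0:
            lo = mid  # gain still negative: the threshold lies above mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


if __name__ == "__main__":
    # Hypothetical wages and costs: the threshold rises with the migration cost.
    for cost in (0.5, 1.0):
        print(cost, ability_threshold(1.5, 1.0, 1.0, cost))
```

Under these assumptions the net gain from migration is increasing in ability, so a simple bisection finds the threshold, and raising the cost raises the threshold.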
Within this framework, we are interested in estimating an effect that might reasonably be thought of as “the causal impact on longevity of migrating North” for African Americans. Toward that end, let $D = 1$ indicate that the individual migrates North and $D = 0$ indicate non-migration. Let Y be the outcome of interest; our focus in this paper is longevity, but one might be interested in other outcomes like income, education, or literacy. We designate $Y_1$ to be the outcome if an individual migrates and $Y_0$ to be the outcome if the individual does not migrate; of course we actually observe only $Y = D\,Y_1 + (1 - D)\,Y_0$.
Now let $Z$ be an indicator for birthplace in a town on a railroad line (1) or not on a railroad (0). We assume, crucially, that $(Y_0, Y_1)$ ╨ $Z$, where, following Dawid (1979), the symbol ╨ indicates statistical independence. Proximity of birthplace to a railroad line is thereby assumed to have no impact on longevity, but it does affect the cost of migration, M. Letting X be relevant covariates, we assume that the probability of migration, $P(D = 1 \mid Z, X)$, is increasing in $Z$:
$$P(D = 1 \mid Z = 1, X = x) \ge P(D = 1 \mid Z = 0, X = x) \tag{10}$$
for all $x$, with strict inequality holding for some $x$.13 To conserve notation, we ignore the covariate notation for the remainder of the discussion.
Arguments presented in the literature surveyed in the Introduction give us reason to believe that longevity (Y) is increasing in human capital (which is, of course, unobservable here).14 If so, our theory has crisp predictions:
Let $\alpha^*_1$ be the threshold level of innate ability necessary to induce migration North for individuals whose birthplace is on a railway line, and let $\alpha^*_0$ be the comparable threshold for those whose birthplace is not on a railway line. Because birth on a railway line lowers the cost of migration, $\alpha^*_1 < \alpha^*_0$. Notice that there are some individuals who will migrate if they are born in a railway town, but not if they are born in a non-railway town. Now we can divide individuals into three groups:
First, designate all individuals whose latent ability level is below $\alpha^*_1$ to be in Set N, the set of “never movers.” Second, let those whose latent ability is greater than $\alpha^*_0$ be assigned to Set A, the set of “always movers,” i.e., individuals who move North regardless of where they are born. Finally, we have Set C, the set of “compliers,” so named because conceptually they are people who “comply” with the instrument, moving North if born in a railway town and remaining in the South if born in a non-railway town. These, obviously, are individuals for whom $\alpha^*_1 \le \alpha \le \alpha^*_0$.
Our predictions, then, are
$$E(Y_0 \mid N) < E(Y_0 \mid C) < E(Y_0 \mid A) \quad \text{and} \quad E(Y_1 \mid N) < E(Y_1 \mid C) < E(Y_1 \mid A). \tag{11}$$
These predictions follow from the fact that longevity is monotonically increasing in $\alpha$ for a given location.
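We cannot observe the relationships in (11) directly in either location. We can, however, estimate some of the objects in the two sets of inequalities. The mean longevity for the non-migrating individuals belonging to Set N (the “never movers”) is found by evaluating non-migrants who were born in a railway town:
$$E(Y_0 \mid N) = E(Y \mid D = 0, Z = 1). \tag{12}$$
Since no one in Set N migrates to the North we cannot estimate the counterfactual $E(Y_1 \mid N)$. Conversely, the mean longevity of migrating individuals in Set A (the “always movers”) is found by evaluating migrants to the North who were born in towns not on the railway line:
$$E(Y_1 \mid A) = E(Y \mid D = 1, Z = 0). \tag{13}$$
And since no one in Set A remains in the South we never observe the counterfactual $E(Y_0 \mid A)$.
For our purposes (finding the causal impact of migration on mortality) Set C is key. For these people we can estimate both $E(Y_0 \mid C)$ and $E(Y_1 \mid C)$. To see how, let $P(N)$, $P(C)$, and $P(A)$ denote the population shares of the three sets, and notice first that expected longevity among non-migrants born in non-railway towns, who collectively make up Sets N and C, is simply
$$E(Y \mid D = 0, Z = 0) = \frac{P(N)\,E(Y_0 \mid N) + P(C)\,E(Y_0 \mid C)}{P(N) + P(C)}, \tag{14}$$
which we can rearrange to get
$$E(Y_0 \mid C) = \frac{[P(N) + P(C)]\,E(Y \mid D = 0, Z = 0) - P(N)\,E(Y_0 \mid N)}{P(C)}. \tag{15}$$
We can easily estimate each element of the right-hand side. $E(Y \mid D = 0, Z = 0)$ is estimated by the corresponding sample mean, $\hat{E}(Y \mid D = 0, Z = 0)$, and $E(Y_0 \mid N)$ is estimated by $\hat{E}(Y \mid D = 0, Z = 1)$. The term $P(N) + P(C)$ is estimated by $\hat{P}(D = 0 \mid Z = 0)$, i.e., by the proportion not migrating among individuals born in non-railway towns. Similarly, $P(N)$ is estimated by $\hat{P}(D = 0 \mid Z = 1)$, i.e., by the proportion not migrating among those born in railway towns. Finally, $P(C)$ is simply $P(N) + P(C)$ minus $P(N)$.
Similarly, longevity among migrants born in railway towns, who collectively make up Sets A and C, is
$$E(Y \mid D = 1, Z = 1) = \frac{P(A)\,E(Y_1 \mid A) + P(C)\,E(Y_1 \mid C)}{P(A) + P(C)}, \tag{16}$$
and we can rearrange to get the desired expression
$$E(Y_1 \mid C) = \frac{[P(A) + P(C)]\,E(Y \mid D = 1, Z = 1) - P(A)\,E(Y_1 \mid A)}{P(C)}, \tag{17}$$
where the term $E(Y_1 \mid A)$ is estimated directly by $\hat{E}(Y \mid D = 1, Z = 0)$ and the term $P(A)$ is estimated by $\hat{P}(D = 1 \mid Z = 0)$; the sum $P(A) + P(C)$ is estimated by $\hat{P}(D = 1 \mid Z = 1)$.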
In sum, we can directly test the following subset from our predicted inequalities (11):
$$E(Y_0 \mid N) < E(Y_0 \mid C) \quad \text{and} \quad E(Y_1 \mid C) < E(Y_1 \mid A). \tag{18}$$
These latter inequalities should prevail if (a) migration is positively selected and (b) higher levels of human capital lead to improved longevity.
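The recovery of the complier means described above can be sketched in a few lines of Python. This is a minimal illustration rather than our estimation code: it assumes records of the form (y, d, z), where y is longevity, d = 1 for migrants, and z = 1 for birth in a railway town; the function names and the six toy records are invented for the example.

```python
def cell_mean_and_share(records, d, z):
    """Mean outcome for migration status d, and its share, among those with instrument z."""
    in_z = [(y, di) for y, di, zi in records if zi == z]
    ys = [y for y, di in in_z if di == d]
    return sum(ys) / len(ys), len(ys) / len(in_z)


def set_means(records):
    """Estimate E(Y0|N), E(Y0|C), E(Y1|C), and E(Y1|A) from (y, d, z) records."""
    records = list(records)
    y_stay_rail, p_stay_rail = cell_mean_and_share(records, 0, 1)  # Set N only
    y_stay_norail, p_stay_norail = cell_mean_and_share(records, 0, 0)  # Sets N and C
    y_move_norail, p_move_norail = cell_mean_and_share(records, 1, 0)  # Set A only
    y_move_rail, p_move_rail = cell_mean_and_share(records, 1, 1)  # Sets A and C

    # Share of compliers, computed from stayers and from movers respectively.
    share_c_stay = p_stay_norail - p_stay_rail
    share_c_move = p_move_rail - p_move_norail

    # Back out complier means from the mixed cells.
    y0_c = (p_stay_norail * y_stay_norail - p_stay_rail * y_stay_rail) / share_c_stay
    y1_c = (p_move_rail * y_move_rail - p_move_norail * y_move_norail) / share_c_move
    return {"Y0_N": y_stay_rail, "Y0_C": y0_c, "Y1_C": y1_c, "Y1_A": y_move_norail}


# Toy records (hypothetical values, not data): one never mover, one complier,
# and one always mover in each type of town.
TOY = [(70, 0, 1), (73, 1, 1), (78, 1, 1),
       (70, 0, 0), (75, 0, 0), (78, 1, 0)]

if __name__ == "__main__":
    means = set_means(TOY)
    print(means)
    # Selection check: never movers below compliers; compliers below always movers.
    print(means["Y0_N"] < means["Y0_C"], means["Y1_C"] < means["Y1_A"])
```

With the toy records both selection inequalities hold by construction; with real data, each comparison would of course require standard errors.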
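As for the causal impact of migration, our set-up leads in a natural way to a standard Wald estimator,
$$\hat{\beta}_{W} = \frac{\hat{E}(Y \mid Z = 1) - \hat{E}(Y \mid Z = 0)}{\hat{P}(D = 1 \mid Z = 1) - \hat{P}(D = 1 \mid Z = 0)}. \tag{19}$$
Using the relationships given above, asymptotically this estimator is equivalent to
$$E(Y_1 \mid C) - E(Y_0 \mid C). \tag{20}$$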
This estimator identifies the impact of moving North for individuals in Set C, the “compliers.” These are individuals in the middle of the ability distribution—people whose characteristics lead them to migrate North if they are born on a railway line, but to remain in the South if they are born in a non-railway town.
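A Wald estimate of this kind takes only a few lines to compute. The Python sketch below is illustrative and omits covariates and standard errors: it assumes records of the form (y, d, z), with y longevity, d = 1 for migrants, and z = 1 for birth in a railway town, and both the function name and the toy records are hypothetical.

```python
def wald_estimate(records):
    """Difference in mean outcomes by z divided by the difference in migration rates by z."""
    records = list(records)

    def mean_outcome(z):
        ys = [y for y, _, zi in records if zi == z]
        return sum(ys) / len(ys)

    def migration_rate(z):
        ds = [d for _, d, zi in records if zi == z]
        return sum(ds) / len(ds)

    first_stage = migration_rate(1) - migration_rate(0)
    if first_stage == 0:
        raise ValueError("instrument does not shift migration; estimate undefined")
    return (mean_outcome(1) - mean_outcome(0)) / first_stage


# Toy records (hypothetical values, not data). The single complier lives to 73
# if born in a railway town (migrates) and to 75 otherwise (stays).
TOY_RECORDS = [(70, 0, 1), (73, 1, 1), (78, 1, 1),
               (70, 0, 0), (75, 0, 0), (78, 1, 0)]

if __name__ == "__main__":
    print(wald_estimate(TOY_RECORDS))
```

In the toy records the never mover and the always mover have the same outcome in both town types, so the estimate reflects only the complier's change in longevity (73 versus 75).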
In setting up our estimation, we have closely followed Imbens and Angrist (1994) and Angrist, Imbens, and Rubin (1996), proposing to estimate a “local average treatment effect” (to use their expression), where the term “local” emphasizes the fact that the estimate pertains to a particular subset of the population, and the term “treatment effect” refers to the impact of migration. Three useful points are clarified by our discussion. First, the estimated effect applies to the middle-ability group only; the impact of migration on longevity might differ for higher- or lower-ability individuals. Second, the estimated effect includes the impact of behavioral responses made in anticipation of migration. Thus, for example, individuals who plan to move North might acquire more education in anticipation of higher returns in the North, and if so, the “treatment effect” of migration includes this behavioral change as part of the causal pathway whereby health might improve due to migration. Third, if individuals are positively selected into migration, the LATE estimate will of course be smaller than the corresponding OLS coefficient. In addition, we can evaluate the selection process directly using (18): Among those who remain in the South, those who stay regardless of birthplace (Set N) will have lower longevity than those born in non-railway towns who would have migrated had they been born in railway towns (Set C). And among migrants North, those who migrate regardless of birthplace (Set A) will have greater longevity than those who migrate only because their birthplace was in a railway town (Set C, the “compliers”).
3. Data
As we have noted, our ability to study the impact of the Great Migration on mortality hinges on access to a unique data source, the Duke SSA/Medicare Dataset. We also use additional data sources, described below.
3.1. The Duke SSA/Medicare Dataset
Our primary data source is the Duke SSA/Medicare Dataset. These data consist of the Master Beneficiary Records from the Supplementary Medical Insurance Program (Medicare Part B) merged by Social Security Number to records from the Numerical Identification Files (NUMIDENT) of the Social Security Administration (SSA). The data are complete for the period 1976-2001. There are over 70 million records in the data, covering a very high proportion of the population aged 65 years and older. Because enrollment requires proof of age, the age validity of the records is high compared with other data sources for the U.S. elderly population. In addition to race, sex and age, information includes entitlement status (primary versus auxiliary beneficiary), zip code of the place of residence, exact date of death, and, importantly, detailed place of birth information. Specifically, the data include either town and State of birth or town, county and State of birth for all U.S.-born respondents.
To our knowledge, this is the only data source that provides detailed place of birth and detailed place of current residence in a very large sample. The data are therefore ideal for answering this question: Which “sending communities” in the South sent people to which “receiving communities” in the North? A further advantage of these data is that death and population counts are based on the same data source.15
Before the SSA/Medicare data could be used for our purposes, there was a technical hurdle to overcome concerning location of birth. The SSA provides a 12-character text field for the place of birth as well as a two-character abbreviation for the State of birth. The State of birth abbreviations follow the Postal Service abbreviations and pose only minor issues to convert to Census State FIPS codes. However, the research strategy outlined above requires that we establish birthplace at a detailed level, so that we can determine precise longitude and latitude coordinates, and then determine proximity to railway lines using appropriate historical records.
In order to establish the birthplace from the 12-character text field, we developed an algorithm that matches this object to place names recorded in the U.S. Geological Survey’s Geographic Names Information System (GNIS). The GNIS is the master list of all place names in the U.S., both current and historic, and records geographic attributes, including the longitude and latitude of each place. Our algorithm essentially classifies places according to the strength of the match between the write-in place of birth on the SSA NUMIDENT file and the GNIS list. We were able to match places at very high rates, and, we believe, with modest error. An Appendix Table shows that our data seem to have quite high coverage rates (typically 0.80 or above) for the 1916-1932 cohorts, but coverage rates are much lower for earlier cohorts. Thus we restrict attention to only the 1916-1932 cohorts. Additional details about the process and match quality are in an unpublished appendix available from the authors.
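To convey the flavor of classifying matches by strength, the following Python sketch assigns a write-in birthplace to one of four tiers. It is a hypothetical illustration and not the algorithm we actually used: the tier names, the similarity cutoff, the treatment of truncation at the 12-character width, and the three-name list standing in for the GNIS are all assumptions made for the example.

```python
import difflib


def normalize(name):
    """Upper-case and collapse whitespace so spelling variants compare cleanly."""
    return " ".join(name.upper().split())


def match_birthplace(raw_field, gazetteer_names, field_width=12, cutoff=0.85):
    """Return (matched_name, tier); tier is 'exact', 'truncated', 'fuzzy', or 'unmatched'."""
    raw = normalize(raw_field)
    names = [normalize(n) for n in gazetteer_names]

    # Tier 1: the write-in field equals a gazetteer name.
    if raw in names:
        return raw, "exact"

    # Tier 2: a field that fills the full width may be a cut-off longer name;
    # accept it only when exactly one gazetteer name starts with it.
    if len(raw) >= field_width:
        candidates = [n for n in names if n.startswith(raw)]
        if len(candidates) == 1:
            return candidates[0], "truncated"

    # Tier 3: close spelling match, scored by difflib's similarity ratio.
    close = difflib.get_close_matches(raw, names, n=1, cutoff=cutoff)
    if close:
        return close[0], "fuzzy"

    return None, "unmatched"


# Stand-in list of place names (illustrative only; the real list is far larger).
PLACES = ["Birmingham", "Tuscaloosa", "Mount Pleasant"]

if __name__ == "__main__":
    for field in ("Birmingham", "MOUNT PLEASA", "BIRMINGHM", "XYZ"):
        print(field, match_birthplace(field, PLACES))
```

In practice one would match within State of birth and review the weaker tiers by hand; the sketch ignores both steps.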
3.2. Vital Statistics and Census Data
In our analysis below we also use the Detailed Mortality Files (DMF) of the U.S. Vital Statistics registry. These files contain all deaths in the U.S. and include State of death and State of birth. Using these data we can calculate the number of deaths at each age by State of birth for African Americans. In order to estimate age-specific death rates by State of birth, we need to form estimates of the number of African Americans alive in specific years by State and birth cohort. We form these estimates by combining data from the DMF with data from the 5 percent samples of the Decennial U.S. Censuses in the Integrated Public Use Microdata Series (IPUMS). In addition, we make use of the IPUMS Decennial Census files for 1920-1990 to trace out the age of migration for the early twentieth-century cohorts.
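One simple way to combine the two sources can be sketched as follows. The Python code is an assumption-laden illustration, not a description of our actual procedure: it scales a census sample count for a State-of-birth cohort to a population estimate in a base year, subtracts subsequent deaths to obtain the population at risk, and divides deaths by that figure. The function names and all numbers are hypothetical.

```python
def population_at_risk(sample_count, sample_fraction, deaths_by_year, base_year, year):
    """Estimated cohort members alive at the start of `year`.

    The census sample count is scaled up by the sampling fraction, and deaths
    recorded from the base year up to (but excluding) `year` are subtracted.
    """
    base_population = sample_count / sample_fraction
    deaths_so_far = sum(n for y, n in deaths_by_year.items() if base_year <= y < year)
    return base_population - deaths_so_far


def death_rate(sample_count, sample_fraction, deaths_by_year, base_year, year):
    """Deaths in `year` divided by the estimated population at risk in that year."""
    at_risk = population_at_risk(sample_count, sample_fraction,
                                 deaths_by_year, base_year, year)
    if at_risk <= 0:
        raise ValueError("estimated population at risk is not positive")
    return deaths_by_year[year] / at_risk


# Hypothetical cohort: 500 persons in a 5 percent sample, so about 10,000 alive
# in the base year; the death counts are invented for illustration.
TOY_DEATHS = {1980: 200, 1981: 196}

if __name__ == "__main__":
    for yr in (1980, 1981):
        print(yr, death_rate(500, 0.05, TOY_DEATHS, 1980, yr))
```

The sketch ignores international migration and sampling error in the census count, both of which would matter in a careful application.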