ON RECLASSIFYING INDUSTRIES FROM THE STANDARD INDUSTRIAL CLASSIFICATION SYSTEM TO THE NORTH AMERICAN INDUSTRY CLASSIFICATION SYSTEM [1]
Shawn D. Klimek and David R. Merrell, Center for Economic Studies, U.S. Bureau of the Census

Shawn D. Klimek, U.S. Bureau of the Census, 4700 Silver Hill Road, Stop 6300, Washington, DC 20233-6300

sklimek@ces.census.gov
ABSTRACT
The North American Industry Classification System (NAICS) presents significant challenges to users of Census Bureau economic data by limiting the time series dimension of the data. We develop an algorithm to reclassify the 1992 Retail and Wholesale Economic Censuses on a NAICS basis. First, we use a SIC-NAICS concordance to assign establishments in Standard Industrial Classification (SIC) industries that match uniquely to a NAICS industry in 1997. Second, using establishment identifiers, we link establishments in operation in both 1992 and 1997. If the five-digit SIC codes match in the two censuses, we apply the 1997 NAICS code. The remaining establishments are classified in SIC industries that match to multiple NAICS industries. We construct the proportion of 1997 establishments migrating from an SIC to each NAICS code. Using these proportions as weights, the algorithm draws from the uniform distribution and randomly assigns the remaining establishments in a 1992 SIC industry to a 1997 NAICS industry.

Key Words: NAICS, SIC, Random Assignments





1. INTRODUCTION

The North American Industry Classification System (NAICS) was introduced to classify industries more accurately by focusing on the processes of production rather than on the products themselves. The emergence of new technologies, new service industries, and new products had posed significant challenges to the proper treatment of industrial classification. For a detailed history of the events surrounding the creation and introduction of the NAICS taxonomy, see North American Industry Classification System, United States 1997. While accurate and relevant industry coding is a laudable goal, the introduction of NAICS also poses significant challenges to users of economic data by limiting the data’s time series comparability; hence there is a compelling interest in maintaining the time series dimension of economic data. This paper describes efforts underway at the Census Bureau’s Center for Economic Studies to rectify the problem of comparing newly collected economic data (published under the NAICS system) to historical economic data (published under the SIC system).

This paper proceeds as follows. Section 2 details the algorithm developed to re-classify establishments included in the 1992 census from SIC industries to NAICS industries and presents some descriptive statistics that illustrate some of the salient features of the algorithm. Section 3 presents tables with preliminary aggregate tabulations for numbers of establishments, employment, and revenue for NAICS industries in 1992 and compares them to 1997. Section 4 discusses some refinements and extensions. Section 5 concludes, and Section 6 presents our list of referenced materials.

2. THE ALGORITHM


In this section, we describe the algorithm used to convert the 1992 Economic Census (EC) from the SIC basis to the NAICS basis. The algorithm was designed to create aggregate tabulations for the NAICS sectors 42, 44-45, and 72: wholesale, retail, and accommodation and food service, respectively. These sectors were chosen because the annual and monthly survey programs need to benchmark their data over a longer period than just 1997 and forward. However, the algorithm may be used more generally for non-manufacturing sectors.

The key to implementing NAICS is the SIC bridge code. Consider the following example. The four-digit 1992 SIC 542100 was divided into three five-digit 1997 SIC bridge codes: 542110, 542120, and 542130. These bridge codes map to NAICS codes 44521010, 44522000, and 45439032, respectively. We use this concordance and the 1997 EC to construct the empirical distribution of 1992 SIC codes to 1997 NAICS codes. In this example, 74.6% of the SIC 542100 establishments are in NAICS 44521010, 23.1% in NAICS 44522000, and the remaining 2.3% in NAICS 45439032. We then integrate this distribution into the 1992 EC for the economic sectors mentioned above.
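The construction of the empirical distribution can be illustrated with a minimal Python sketch. It assumes each 1997 record has already been traced back through its bridge code to its parent 1992 SIC, so the input is simply a list of (SIC, NAICS) pairs; the function name and the toy counts (chosen to reproduce the 74.6%, 23.1%, and 2.3% shares of the example) are illustrative and not taken from the Census files.

```python
from collections import Counter, defaultdict


def sic_to_naics_shares(records_1997):
    """Return {sic: {naics: share}} from (sic, naics) pairs observed in 1997."""
    counts = defaultdict(Counter)
    for sic, naics in records_1997:
        counts[sic][naics] += 1
    shares = {}
    for sic, naics_counts in counts.items():
        total = sum(naics_counts.values())
        shares[sic] = {naics: n / total for naics, n in naics_counts.items()}
    return shares


# Toy data (hypothetical counts) mimicking the SIC 542100 example in the text.
toy_records = (
    [("542100", "44521010")] * 746
    + [("542100", "44522000")] * 231
    + [("542100", "45439032")] * 23
)
shares = sic_to_naics_shares(toy_records)
```

The resulting dictionary plays the role of the distribution that is merged into the 1992 file.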


Table 1Distribution of 1992 Wholesale and Retail SIC industries to 1997 NAICS industries

Number of 1997 NAICS codes matching to each 1992 SIC

Number of occurrences

Percent


1

218

78.98

2

47

17.03

3

10

3.62

4

1

0.004

Total

276

n/a

As Table 1 shows, almost 80% of 1992 SIC codes (six-digit level) are matched to a single NAICS code (eight-digit level). The first step of the algorithm assigns each establishment in these SIC codes the NAICS code in the concordance. We regard this method of assignment as the best, since it relies only on the industry classification of the establishment in 1992 and on the one-to-one industry correspondences in the concordance. Simply put, it requires the least a priori structure from us.
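A minimal sketch of this first step follows, assuming the concordance is held as a mapping from each 1992 SIC to the set of NAICS codes it feeds. The function name and the placeholder codes `SIC_A` and `NAICS_X` are hypothetical; only SIC 542100 and its three NAICS targets come from the example above.

```python
def assign_one_to_one(establishments_1992, concordance):
    """Step 1: assign a NAICS code when the 1992 SIC maps to exactly one code.

    establishments_1992: iterable of (establishment_id, sic_1992) pairs.
    concordance: dict mapping each 1992 SIC to the set of NAICS codes it feeds.
    Returns (assigned, unassigned): a dict of id -> NAICS and the leftover pairs.
    """
    assigned, unassigned = {}, []
    for est_id, sic in establishments_1992:
        targets = concordance.get(sic, set())
        if len(targets) == 1:
            assigned[est_id] = next(iter(targets))
        else:
            unassigned.append((est_id, sic))
    return assigned, unassigned


# Placeholder codes, except SIC 542100 (the three-way split from the text).
concordance = {
    "SIC_A": {"NAICS_X"},
    "542100": {"44521010", "44522000", "45439032"},
}
establishments = [("e1", "SIC_A"), ("e2", "542100"), ("e3", "SIC_A")]
assigned, unassigned = assign_one_to_one(establishments, concordance)
```

Establishments left in `unassigned` are the ones passed on to the later steps.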

The second step in the algorithm links the remainder of the 1992 EC and 1997 EC at the establishment level using the Permanent Plant Number (PPN). The PPN is assigned to each establishment’s physical location and remains unchanged, regardless of any changes in ownership or firm structure. Using the PPN, we identify establishments surviving from 1992 to 1997. If we observe the establishment in both 1992 and 1997 operating in the same five-digit SIC, then we assign the establishment in 1992 the same NAICS code it was assigned in 1997. We assume that consistent classification at the five-digit SIC level in both 1992 and 1997 implies that there are no changes in the type of industrial activity at the establishment level. Given this assumption, we consider this method a second best alternative to the one-to-one SIC-NAICS assignments described above.
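The PPN link can be sketched as a dictionary lookup. This is an illustration under stated assumptions, not the production code: the record layouts, the function name, and the placeholder codes are hypothetical, and the sketch assumes the 1992 and 1997 SIC codes are already expressed at the same five-digit level so that they can be compared directly.

```python
def assign_by_ppn(unassigned_1992, census_1997):
    """Step 2: carry the 1997 NAICS code back to survivors whose SIC is unchanged.

    unassigned_1992: dict of PPN -> 1992 SIC for establishments left after step 1.
    census_1997: dict of PPN -> (1997 SIC, 1997 NAICS).
    Returns (assigned, still_unassigned).
    """
    assigned, still_unassigned = {}, {}
    for ppn, sic_1992 in unassigned_1992.items():
        match = census_1997.get(ppn)
        if match is not None and match[0] == sic_1992:
            assigned[ppn] = match[1]            # survivor, same SIC in both years
        else:
            still_unassigned[ppn] = sic_1992    # exited or changed SIC
    return assigned, still_unassigned


# Hypothetical records: p1 survives unchanged, p2 switches SIC, p3 exits.
left_after_step1 = {"p1": "SIC_B", "p2": "SIC_B", "p3": "SIC_B"}
census_1997 = {"p1": ("SIC_B", "NAICS_Y"), "p2": ("SIC_C", "NAICS_Z")}
ppn_assigned, left_after_step2 = assign_by_ppn(left_after_step1, census_1997)
```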

The first two steps assign the majority of the 1992 establishments; however, a significant number of establishments still lack a NAICS code—even after step two. In the third (and final) step, we randomly assign the remaining establishments a NAICS code. When we merged the empirical distribution of SIC to NAICS codes, we included information about the proportion of establishments in the 1997 EC that are being classified from each SIC to a particular NAICS. For each establishment, the algorithm makes a uniform random draw and uses the proportions discussed above to weight each NAICS code with the appropriate mass of the distribution. Using this method we assign the remaining establishments a NAICS code. Table 2 below describes how the establishments in 1992 are assigned a NAICS code.
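The weighted draw in this third step can be sketched by cutting the unit interval into segments whose lengths equal the 1997 shares and seeing where a uniform draw lands. The function names and the seed are illustrative assumptions; the shares are those of the SIC 542100 example.

```python
import random
from itertools import accumulate


def draw_naics(shares_for_sic, u):
    """Map a uniform draw u in [0, 1) to a NAICS code via cumulative shares."""
    codes = list(shares_for_sic)
    upper_bounds = list(accumulate(shares_for_sic[code] for code in codes))
    for code, upper in zip(codes, upper_bounds):
        if u < upper:
            return code
    return codes[-1]  # guard against floating-point shortfall in the last bound


def assign_randomly(unassigned_1992, shares, rng):
    """Step 3: one uniform draw per remaining establishment."""
    return {
        est_id: draw_naics(shares[sic], rng.random())
        for est_id, sic in unassigned_1992
    }


shares = {"542100": {"44521010": 0.746, "44522000": 0.231, "45439032": 0.023}}
unassigned = [(f"e{i}", "542100") for i in range(1000)]
result = assign_randomly(unassigned, shares, random.Random(1992))
```

Fixing the seed, as in the last line, makes a given run reproducible, although a different seed would still yield different establishment-level assignments.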


Table 2 Number and Percent of Establishments in 1992, by Method of Assignment


Method of Assignment

Number of Establishments in 1992

Percent of Establishments in 1992

One-to-One Matches

1,282,603

61.84%

PPN Matches

184,072

8.88%

Random Assignment

607,246

29.28%

Clearly, random assignment is the least reliable method of assigning a NAICS code to an establishment, since it will provide different assignments each time the algorithm is executed. However, we think that this method is nevertheless reasonable; Table 3 shows why. Most of the establishments that require random assignment are in SIC industries where more than 90% of the establishments migrate to a single NAICS industry. From Table 1, we observe 58 SIC industries that do not match uniquely to a NAICS code; from Table 2, the establishments in those 58 SIC industries that require random assignment number 607,246. Recalling the example at the beginning of the section, SIC 542100 and all of its establishments would be included in the “80% to 70%” row of Table 3, since 74.6% of the establishments are in NAICS 44521010 rather than the other two. Table 3 is supportive of the random assignment method, since over two-thirds of the establishments requiring random assignment are in SIC industries where the lion’s share (over 90%) of establishments in 1997 are classified into a single NAICS industry.

The main assumption underlying the random assignment portion of the algorithm is that of consistent industry composition. We assume that the probability that an establishment in a given SIC industry belongs to a particular NAICS industry is the same in 1992 as in 1997. There are at least two reasons why this assumption may not hold. First, rapid economic growth that differs across sectors and industries in the U.S. economy during the late 1990s could undercut this assumption. Second, the SIC industries most likely to be split up into several NAICS industries are those most likely to be experiencing rapid changes in industry structure. We feel that these two possibilities provide the basis for more work in providing even better aggregate tabulations for 1992 on a NAICS basis.


Table 3Distribution of NAICS Industry Largest Shares for SIC Industries and Establishments


NAICS Largest Share

Number of SIC industries

Number of establishments

90% or more

5

410,962

90% to 80%

3

40,497

80% to 70%

3

64,202

70% or less

47

91,585

Total

58

607,246
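The binning behind Table 3 can be sketched as follows. The function name and the treatment of shares that fall exactly on a boundary are assumptions; the establishment counts are copied from Table 3.

```python
def largest_share_bin(shares_for_sic):
    """Place an SIC industry in a Table 3 row by its largest NAICS share."""
    top = max(shares_for_sic.values())
    if top >= 0.90:
        return "90% or more"
    if top >= 0.80:
        return "90% to 80%"
    if top >= 0.70:
        return "80% to 70%"
    return "70% or less"


# The SIC 542100 example from Section 2.
example_bin = largest_share_bin(
    {"44521010": 0.746, "44522000": 0.231, "45439032": 0.023}
)

# Establishment counts by row, copied from Table 3.
table3 = {
    "90% or more": 410_962,
    "90% to 80%": 40_497,
    "80% to 70%": 64_202,
    "70% or less": 91_585,
}
share_in_top_row = table3["90% or more"] / sum(table3.values())
```

Here `share_in_top_row` is the fraction of randomly assigned establishments that sit in SIC industries dominated by a single NAICS code.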

3. RESULTS AND DISCUSSION

In this section, we present data (at the U.S. aggregate level) from three sources. The 1997 tabulations are from Census publications. The 1992 tabulations are constructed using the algorithm described in section 2. The SIC totals are from the Census Bureau’s Advance Report.

We believe our numbers for the wholesale sector are quite reasonable. First, the number of wholesale establishments grew by 4.59% from 1992 to 1997 on an SIC basis; comparing the numbers on a NAICS basis, we compute 4.42% growth. Second, employment on an SIC basis increases by 12.36%, whereas the numbers on a NAICS basis indicate that employment increases by 11.08%.



The primary change implemented by NAICS was moving establishments “open to the general public” from the wholesale sector to the retail sector. This manifests clearly in the cross-sectional differences between SIC and NAICS in 1997. In 1997, there are 518,215 SIC wholesale establishments, but only 453,470 NAICS wholesale establishments. Differences between the establishments that remain in wholesale and those that moved to retail could explain the differences in the growth rates on an SIC and NAICS basis. Table 4 presents tabulations for the Wholesale Sector (NAICS 42) at the four-digit NAICS level.
Table 4Four-digit Wholesale NAICS Industry Tabulations for 1992 and 1997


NAICS Industry

Establishments in 1992

Establishments in 1997

Employees in 1992

Employees in 1997

Revenue in 1992

Revenue in 1997

4211

30,942

29,328

343,749

375,761

368,575,847

553,352,124

4212

13,835

15,246

135,065

157,462

53,420,802

75,003,478

4213

12,772

14,267

137,554

155,535

69,792,059

89,175,875

4214

43,282

45,351

652,917

716,113

255,980,672

357,383,550

4215

11,248

12,583

138,042

174,029

118,321,902

150,493,610

4216

33,224

38,234

367,428

475,766

208,920,514

357,691,888

4217

19,517

21,194

190,776

219,233

63,869,994

92,189,762

4218

72,991

76,643

681,232

772,550

228,364,711

328,968,331

4219

34,737

37,783

306,163

351,839

140,387,696

185,455,758

4221

16,139

15,848

220,439

214,35 0

99,508,473

117,062,485

4222

6,069

8,053

157,855

190,127

129,306,287

203,147,771

4223

18,776

20,707

188,228

207,574

103,957,220

124,104,420

4224

42,622

41,760

805,929

854,919

499,946,049

588,970,062

4225

11,551

10,343

108,710

97,251

136,869,416

166,786,245

4226

14,193

15,920

147,010

165,768

132,471,184

128,923,496

4227

14,181

11,297

151,030

137,829

274,197,575

267,623,942

4228

5,259

4,850

141,821

151,677

59,487,322

69,703,203

4229

32,949

34,063

344,244

378,531

161,919,348

213,618,778

NAICS Totals

434,287

453,470

5,218,192

5,796,557

3,105,297,071

4,059,657,778

SIC Totals

495,457

518,215

5,791,264

6,506,992

3,238,520,447

4,212,312,128
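The wholesale growth rates quoted in the text can be recomputed from the two totals rows of Table 4. The short Python sketch below does this; the helper name is an illustrative choice, and the figures are copied from the table.

```python
def pct_growth(value_1992, value_1997):
    """Percentage change from 1992 to 1997."""
    return 100.0 * (value_1997 - value_1992) / value_1992


# (1992, 1997) totals copied from the last two rows of Table 4.
totals = {
    ("establishments", "SIC"): (495_457, 518_215),
    ("establishments", "NAICS"): (434_287, 453_470),
    ("employees", "SIC"): (5_791_264, 6_506_992),
    ("employees", "NAICS"): (5_218_192, 5_796_557),
}
growth = {key: round(pct_growth(*pair), 2) for key, pair in totals.items()}
```

The same helper applies unchanged to the totals rows of the retail and accommodation and food service tables.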

For the retail sector, we make similar comparisons regarding the growth in the number of establishments and employment. There are two interesting things to note. First, growth in the number of retail establishments on an SIC basis was 2.61%, while we calculate only 0.08% growth on a NAICS basis (from 1,117,597 to 1,118,447 establishments). Second, employment on an SIC basis increased by 15.98%, but on a NAICS basis we compute that employment increased by only 14.78%.



In the 1997 cross-section, the 1,118,447 establishments in retail NAICS are dramatically fewer than the 1,566,049 establishments in retail SIC. Given the transfer of establishments from wholesale to retail, this seems surprising. The large decline in the number of establishments in retail is primarily due to the creation of Sector 72, Accommodation and Food Service. However, in addition to NAICS Sector 72, some establishments move to other new NAICS service sectors and even to manufacturing. Table 5 shows our industry aggregate tabulations for the Retail Sector (NAICS 44-45) at the four-digit NAICS level.
Table 5Four-digit Retail NAICS Industry Tabulations for 1992 and 1997


NAICS Industry

Establishments in 1992

Establishments in 1997

Employees in 1992

Employees in 1997

Revenue in 1992

Revenue in 1997

4411

43,052

49,237

922,932

1,138,995

349,832,714

553,652,292

4412

12,013

13,589

75,532

102,766

16,749,848

28,890,506

4413

57,231

59,807

413,518

477,200

54,532,471

62,824,978

4421

29,414

29,461

222,105

251,300

30,165,753

40,968,335

4422

32,667

35,264

187,244

231,545

22,278,040

30,722,478

4431

38,150

43,373

239,048

345,042

40,449,241

68,561,331

4441

73,190

71,916

767,805

952,296

138,659,107

195,888,196

4442

21,060

21,201

148,610

165,616

25,298,051

31,677,905

4451

107,404

96,542

2,525,407

2,643,608

332,215,630

368,250,471

4452

24,156

22,373

117,515

118,831

10,135,275

10,829,908

4453

31,386

29,613

132,989

130,635

20,319,081

22,684,120

4461

80,416

82,941

737,811

903,694

90,003,574

117,700,863

4471

128,369

126,889

817,263

922,062

154,043,396

198,165,786

4481

108,284

94,740

960,172

927,930

83,831,107

95,918,083

4482

37,206

31,399

184,415

185,803

17,883,367

20,543,252

4483

29,984

30,462

158,572

166,420

15,009,827

19,936,310

4511

46,929

46,315

319,956

362,973

31,456,522

41,415,227

4512

23,071

22,834

161,614

197,866

14,579,400

20,595,699

4521

10,346

10,366

1,585,742

1,795,577

168,370,441

220,108,157

4529

26,453

25,805

507,316

711,963

78,752,256

110,336,303

4531

27,341

26,200

122,114

125,195

5,719,237

6,555,088

4532

42,760

44,615

238,240

306,492

19,830,122

31573,035

4533

15,390

17,990

75,913

97,965

4,348,136

6,043,642

4539

32,460

41,033

142,677

223,334

16,630,362

33,937,396

4541

7,773

10,013

150,089

218,406

34,579,632

79,018,305

4542

6,391

7,070

69,628

66,348

6,330,079

6,884,497

4543

24,701

27,399

205,689

221,239

30,794,934

37,203,849

NAICS Totals

1,117,597

1,118,447

12,189,916

13,991,103

1,812,797,603

2,460,886,012

SIC Totals

1,526,215

1,566,049

18,407,453

21,349,109

1,894,880,209

2,562,093,519

The Accommodation and Food Service Sector (NAICS 72) is a new service sector created by NAICS; so, unlike the previous two sectors, we can make no comparisons of the SIC versus NAICS regimes. The sector is primarily composed of the two-digit retail major group SIC 58, Eating and Drinking Places, and a major group in services, SIC 70, Hotels, Rooming Houses, Camps, and Other Lodging Places. For the sector as a whole, we find that the number of establishments grew from 496,137 to 545,060, or 9.86%. The numbers were more dramatic for employment. Employment grew from 8,132,399 to 9,451,056, or 16.21%. Table 6 shows our industry aggregate tabulations for the Accommodation and Food Service sector at the four-digit NAICS level. Most industries show growth along all three dimensions with two exceptions. NAICS 7213, Rooming and Boarding Houses, shows small declines in both the number of establishments and employment. NAICS 7224, Drinking Places, shows a decline only in the number of establishments.



Table 6Four-digit Accommodation and Food Service NAICS Industry Tabulations for 1992 and 1997


NAICS Industry

Establishments in 1992

Establishments in 1997

Employees in 1992

Employees in 1997

Revenue in 1992

Revenue in 1997

7211

41,736

47,079

1,456,093

1,645,666

67,200,771

94,965,838

7212

6,520

7,598

33,069

35,331

2,090,253

2,734,918

7213

3,561

3,484

17,530

15,597

720,302

754,105

7221

170,030

191,245

2,983,807

3,641,402

85,011,492

112,450,172

7222

191,481

214,767

2,908,000

3,326,543

85,823,575

107,780,513

7223

26,961

28,062

429,854

464,870

16,203,609

19,407,810

7224

55,848

52,852

304,046

321,294

11,113,777

12,292,709

NAICS Totals

496,137

545,060

8,132,399

9,451,056

268,163,779

350,389,065

4. REFINEMENTS AND EXTENSIONS


In this section, we propose three refinements to our algorithm to increase accuracy and provide better measures of measurement error due to random assignment.

We currently use only plant-level information to assign NAICS codes to establishments. This can create a problem illustrated by the following hypothetical example. In 1992, a multi-unit firm with 100 establishments operates in SIC 541140, Grocery Stores. In 1997, the same firm operates 110 establishments in NAICS 44511020. Under our current method of PPN matching this is not a problem if the 100 establishments from 1992 continue to operate in 1997. However, suppose that the firm closes 20 establishments (i.e., the PPN doesn’t appear in the 1997 EC), and then opens 30 new establishments (i.e., 30 new PPNs appear in the 1997 EC). Under our current algorithm, we assign 80 establishments to 44511020, but we randomly assign NAICS codes to the 20 plants that close between 1992 and 1997. Currently, even when all of the establishments of the firm remain in the same five-digit SIC from 1992 to 1997 (and just one NAICS in 1997), it is possible that the algorithm assigns these exiting establishments to an inappropriate NAICS code in 1992—simply because they will fall into the class of establishments requiring random assignment. To correct this potential problem with the algorithm, we generate firm level data in 1992 and 1997. We restrict the sample only to firm records with more than one establishment appearing in only one SIC (five-digit) in 1992 and only one pair of SIC (five-digit)-NAICS (eight-digit) codes in 1997. Matching across the two years at the five-digit SIC level and keeping only extant firms will generate a dataset at the firm level with the appropriate firm level NAICS code. We add this step into the algorithm after the one-to-one and PPN matching steps, but before the final step of random assignment. The effect, we believe, will be to reduce the number of establishments that require random assignment.
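The firm-level step could be sketched as below. This is a simplified illustration under stated assumptions: the record layouts, firm identifiers, and function name are hypothetical, the firm in the toy data has 5 rather than 100 establishments, and the SIC and NAICS codes are those of the hypothetical grocery example above.

```python
from collections import defaultdict


def firm_level_naics(est_1992, est_1997):
    """Return {firm_id: naics} for firms that qualify for firm-level assignment.

    est_1992: iterable of (firm_id, ppn, sic) records.
    est_1997: iterable of (firm_id, ppn, sic, naics) records.
    """
    ppns_92, sics_92 = defaultdict(set), defaultdict(set)
    for firm, ppn, sic in est_1992:
        ppns_92[firm].add(ppn)
        sics_92[firm].add(sic)
    pairs_97 = defaultdict(set)
    for firm, _ppn, sic, naics in est_1997:
        pairs_97[firm].add((sic, naics))

    firm_naics = {}
    for firm, sics in sics_92.items():
        if len(ppns_92[firm]) < 2 or len(sics) != 1:
            continue  # not a multi-unit, single-SIC firm in 1992
        pairs = pairs_97.get(firm, set())
        if len(pairs) != 1:
            continue  # firm not extant, or several SIC-NAICS pairs in 1997
        sic_97, naics_97 = next(iter(pairs))
        if sic_97 in sics:
            firm_naics[firm] = naics_97
    return firm_naics


# Toy firm F1: plant p5 closes and p6, p7 open. F2 is a single-unit firm.
est_1992 = [("F1", f"p{i}", "541140") for i in range(1, 6)] + [("F2", "q1", "541140")]
est_1997 = [("F1", f"p{i}", "541140", "44511020") for i in (1, 2, 3, 4, 6, 7)]
firm_naics = firm_level_naics(est_1992, est_1997)

# Closed plants of qualifying firms inherit the firm-level NAICS code.
ppns_1997 = {ppn for _firm, ppn, _sic, _naics in est_1997}
closed_assignments = {
    ppn: firm_naics[firm]
    for firm, ppn, _sic in est_1992
    if firm in firm_naics and ppn not in ppns_1997
}
```

In this toy case the closed plant `p5` receives the firm's NAICS code instead of entering the random assignment pool.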

We currently use only the percent of establishments in 1997 that move from an SIC to a NAICS code as the probability that an establishment in 1992 moves to that same NAICS code. In order to use all of the information available, we plan to estimate the probability of a 1997 establishment moving from an SIC to a particular NAICS code, using traditional limited dependent variable techniques such as multinomial logistic regression. Using the parameter estimates from the model on 1997 data, we then generate a revised probability of being assigned to a particular NAICS code for each establishment based on its 1992 characteristics. This approach assumes that establishment and firm characteristics have the same effect on the probabilities across the two census years. We then make a random draw from the uniform distribution (weighted by the share of the probability mass estimated from the regression models) for each establishment and assign the appropriate NAICS code.
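One standard way to write such a model is sketched below. The notation is illustrative and is not taken from the paper: $x_i$ denotes the observed characteristics of establishment $i$, $J(s)$ the set of NAICS codes fed by SIC industry $s$, $j_0$ a reference code in that set, and $\beta_j$ the coefficient vector for code $j$.

```latex
% Multinomial logit estimated on 1997 establishments in SIC industry s
\Pr(\text{NAICS}_i = j \mid x_i^{97})
  = \frac{\exp\!\left(x_i^{97\prime}\beta_j\right)}
         {\sum_{k \in J(s)} \exp\!\left(x_i^{97\prime}\beta_k\right)},
  \qquad \beta_{j_0} = 0 .

% Revised assignment probability for a 1992 establishment,
% using the 1997 estimates \hat{\beta}_j and 1992 characteristics
\hat{p}_{ij}^{92}
  = \frac{\exp\!\left(x_i^{92\prime}\hat{\beta}_j\right)}
         {\sum_{k \in J(s)} \exp\!\left(x_i^{92\prime}\hat{\beta}_k\right)} .
```

Under this formulation, the fitted probabilities $\hat{p}_{ij}^{92}$ would replace the simple 1997 shares as weights in the uniform draw, and they sum to one over $j \in J(s)$ by construction.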

One additional weakness of the algorithm in section 2 is that each time the algorithm is executed an establishment can be assigned a different NAICS code depending on its random draw. This is true for all establishments assigned by the random assignment method. We propose implementing a bootstrapping method to simulate the true aggregate tabulations. To do this, we simply repeat the random assignment process (with the multinomial logit estimates mentioned above) a large number of times, each time generating the aggregate tabulations of interest. Our final estimate of the “true” tabulation is the mean of this distribution. In addition to the mean, we also will have estimates of the variance and other higher moments of the distribution. We expect this bootstrapping method to make very little improvement in assigning the number of establishments for NAICS industries in the 1992 EC. Given the assumption of the uniform distribution and identical weights on each NAICS code for each establishment across iterations, we believe that there should be low variance with respect to the number of establishments. However, depending on the heterogeneity of employment and revenue across establishments in 1992, the random assignment step could have potentially large effects on these aggregate tabulations. We expect our estimates of the variance and other higher moments to provide some insight on the severity of measurement error for these variables.
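The repeated-assignment idea can be sketched in a few lines of Python. This is a simplified illustration: it uses the simple 1997 shares rather than model-based probabilities, and the function name, seed, replication count, and employment figures are hypothetical.

```python
import random
from collections import defaultdict
from statistics import mean, variance


def simulate_employment(establishments, shares, n_reps, seed=0):
    """Repeat the random assignment and summarize employment by NAICS code.

    establishments: list of (sic, employment) for randomly assigned plants.
    shares: {sic: {naics: probability}}.
    Returns {naics: (mean, sample variance)} across n_reps replications.
    """
    rng = random.Random(seed)
    all_codes = sorted({code for dist in shares.values() for code in dist})
    draws = {code: [] for code in all_codes}
    for _ in range(n_reps):
        tab = defaultdict(int)
        for sic, employment in establishments:
            codes = list(shares[sic])
            weights = [shares[sic][code] for code in codes]
            tab[rng.choices(codes, weights=weights, k=1)[0]] += employment
        for code in all_codes:
            draws[code].append(tab[code])
    return {code: (mean(vals), variance(vals)) for code, vals in draws.items()}


shares = {"542100": {"44521010": 0.746, "44522000": 0.231, "45439032": 0.023}}
# Hypothetical plants: many small establishments and two large ones.
plants = [("542100", 5)] * 50 + [("542100", 400)] * 2
summary = simulate_employment(plants, shares, n_reps=200)
```

With heterogeneous plant sizes like these, where the two large plants land drives much of the spread in employment totals, which is the concern raised above for employment and revenue tabulations.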

Current efforts focus entirely on the 1992-1997 reclassification in the wholesale, retail, and accommodation and food service sectors. We anticipate that future efforts will focus not only on the refinements discussed above but also on extending this work to other sectors such as manufacturing, services, and finance, insurance, and real estate. We think there is a compelling interest in extending the breadth of the NAICS conversion efforts to as many economic sectors as possible. Additionally, we anticipate that future efforts will aim toward maximizing the number of historical data files that are converted. We feel that the algorithm developed to convert the 1992 Economic Census is sufficiently general to allow us to apply it to historical economic census data sets in many economic sectors.

5. CONCLUSION

In this paper, we present a methodology to reclassify industries from the Standard Industrial Classification system to the newly introduced North American Industry Classification System. This algorithm, designed solely for use in generating aggregate tabulations, uses three components to classify 1992 SIC codes to NAICS codes. First, we identify cases where SIC codes have unique correspondences to NAICS codes; in these cases, we merely assign the unique correspondence backward to 1992. Second, for establishments that are extant in both 1992 and 1997 and that we observe producing in the same five-digit SIC in both years, we assign the 1997 NAICS code to the 1992 establishment observation. The clear majority of industry assignments are made using these first two steps. Finally, for the remaining establishments (those in SIC industries with no unique NAICS correspondence that either did not exist in 1997 or switched industries), we assign a NAICS code to the 1992 establishment observation based on a random draw from a uniform probability distribution weighted by the proportion of 1997 establishments that migrated from the 1992 SIC to a given NAICS industry. Only about 29% of all 1992 establishments in NAICS 42, 44-45, and 72 are assigned a NAICS code using this method.

Recognizing that no reassignment algorithm will be perfect, we propose a number of refinements and extensions. These extensions range from modeling the probability that an establishment will migrate to one NAICS industry versus another, based on establishment and firm observables (e.g., product lines and class of customer information), to generating standard errors associated with multiple iterations on the random assignment cases. Additionally, we think that we could refine our estimates by reducing the number of random assignment cases. This can be done by aggregating to the firm level (for establishments that belong to multi-unit firms and are subject to random assignment) and imposing NAICS codes based on the firm’s NAICS code. Finally, we propose to extend our work to include more historical non-manufacturing Economic Census years and to extend our efforts toward reclassifying manufacturing data as well.


6. REFERENCES
Executive Office of the President (1998), North American Industry Classification System, United States 1997, Washington, D.C.: U.S. Office of Management and Budget.
Bureau of the Census (1999), “Accommodation and Food Services—Geographic Area Series,” 1997 Economic Census, Washington, D.C.: U.S. Department of Commerce.
Bureau of the Census (1999), “Advance Report,” 1997 Economic Census, Washington, D.C.: U.S. Department of Commerce.
Bureau of the Census (1999), “Retail Trade—Geographic Area Series,” 1997 Economic Census, Washington, D.C.: U.S. Department of Commerce.
Bureau of the Census (1999), “Wholesale Trade—Geographic Area Series,” 1997 Economic Census, Washington, D.C.: U.S. Department of Commerce.


[1] Note: This paper reports the results of research and analysis undertaken by Census Bureau staff. It has undergone a more limited review by the Census Bureau than its official publications. This report is released to inform interested parties and to encourage discussion. Further, none of the tabulations for 1992 presented in this paper are official tabulations endorsed by the U.S. Bureau of the Census.


