SPECIAL THEME: THE OTHER-RACE EFFECT AND CONTEMPORARY CRIMINAL JUSTICE: EYEWITNESS IDENTIFICATION AND JURY DECISION MAKING: Eyewitness Identification:
Thirty Years of Investigating the Own-Race Bias in Memory for Faces: A Meta-Analytic Review
Christian A. Meissner and John C. Brigham, Florida State University
Christian A. Meissner and John C. Brigham, Department of Psychology, Florida State University.
She based her identification on Smith's eyes, which she said were greenish-blue and upon his hands which she said were "light and slender" like the holdup man's. Mrs. McCormick testified that Smith's eyes were "different from most colored people . . . bright and piercing." Smith's defense attorneys then attempted to parry the state's first thrust in the trial. Mrs. McCormick was handed a picture of a man she couldn't identify. It was a picture of David Charles, with shorter hair, taken while he was in Vietnam. Assistant defense attorney Kitchen asked Mrs. McCormick if she had ever made the statement that all Black people look alike. "Yes, I made that statement," Mrs. McCormick said, "and they do to a certain extent, but there's a difference here" (Lickson, 1974, p. 66).
In 1971, five Black men, who became known as the "Quincy Five," were wrongfully indicted for the murder of Khomas Revels during a robbery in Tallahassee, Florida. Although no forensic evidence obtained from the crime scene was ever linked to the men, five White eyewitnesses positively identified them as among the perpetrators. In each of three trials the state argued, "What better evidence can there be than, 'I saw him,' from unprejudiced witnesses? This has been used since time immemorial. This is proof beyond a reasonable doubt. Five eyewitnesses!" (Lickson, 1974, p. 87). Despite the lack of physical evidence against these men, two of the defendants, Dave Roby Keaton and Johnny Frederick, were found guilty on the basis of eyewitness testimony and coerced confessions obtained by investigators. During the third trial involving David Charles Smith, hired investigators on the defense team located the three actual perpetrators of the robbery and murder, who became known as the "Jacksonville Three." The Jacksonville men were later brought to trial and convicted based on latent fingerprint evidence and identification of the automobile used in the murder. The Quincy Five were finally exonerated.
At the trial of David Charles Smith, social psychologist Dr. William Haythorn of Florida State University (a colleague of John C. Brigham) was called as an expert witness in rebuttal of the eyewitness misidentifications. Because the only evidence against the Quincy Five was in the form of cross-racial identifications, Haythorn and Brigham set out to locate empirical evidence on the often purported claim that "they [other-race persons] all look alike." However, at the time of this case (c. 1971) only a handful of studies had examined the phenomenon (Berger, 1969; Horowitz & Horowitz, 1938), and only one study had been published in the previous decade (Malpass & Kravitz, 1969). Due in part to this lack of scientific evidence on cross-racial identification, the court prohibited the expert testimony of Haythorn.
Today, three decades later, a plethora of researchers have studied the own-race bias (ORB) in memory for human faces (also referred to as the cross-race effect or other-race effect). Although most now agree that the phenomenon is reliable across cultural and racial groups (Kassin, Ellsworth, & Smith, 1989), there is less consensus about the social and cognitive mechanisms that may govern the effect. Furthermore, little is known regarding variables that might moderate the effect, including those applicable to the eyewitness scenario, such as study time and retention interval. Thus, the goal of the current review and meta-analysis is not only to reconsider the reliability and generalizability of the ORB, but also to evaluate the validity of various theoretical mechanisms previously discussed in the literature and to propose a framework that might best account for the pattern of results across studies. Finally, we discuss the various practical implications of our findings for the legal and criminal justice systems.
Literature reviews of the ORB have noted the robustness of the phenomenon (Brigham & Malpass, 1985; Chance & Goldstein, 1996), and researchers have endorsed the importance and reliability of the effect in several surveys (Kassin et al., 1989; Yarmey & Jones, 1983). Furthermore, expert witnesses have cited the effect in cases involving disputed cross-race identification (Brigham, Wasserman, & Meissner, 1999; Leippe, 1995), and attorneys have acknowledged the importance of racial interactions in eyewitness identifications (Brigham, 1981; Brigham & Wolfskeil, 1983). Given the source of such endorsements, one might be quick to concede the robust and generalizable nature of the ORB effect. However, it is important to further investigate the particular levels at which reliability might be assessed. For example, (a) Is the effect generally replicable across studies? (b) Is the effect consistent across various racial/ethnic groups? (c) Is the effect significant across different types of memory tasks? and (d) Is the effect reliable across individuals and testing occasions?
Replicability Across Studies
The first issue has been explored in several previous meta-analytic reviews of the effect. Bothwell, Brigham, and Malpass (1989) found that roughly 80% of the samples they reviewed demonstrated a significant ORB effect. Overall, effect size estimates from several previous meta-analyses (Anthony, Copper, & Mullen, 1992; Bothwell et al., 1989; Shapiro & Penrod, 1986) have indicated a significant weak-to-moderate effect, accounting for 6% to 11% of the variability across studies. R. C. Lindsay and Wells (1983) also examined the reliability of the effect across 13 studies by way of a vote-counting procedure. Although they asserted that fewer than half of the studies (6 of 13) demonstrated a true ORB effect, their criterion requiring a complete crossover interaction of White and Black participants may have been overly stringent. As Chance and Goldstein (1996) later noted, a large majority of the studies (11 of 13) reviewed by R. C. Lindsay and Wells showed at least some evidence of the effect.
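To give a sense of scale for the "6% to 11%" figure, the following minimal sketch converts a proportion of variance accounted for into a correlation and a standardized mean difference. It assumes the reported percentages can be read as squared correlations (r squared) and that the two compared groups are of equal size; neither assumption is stated in the reviews cited above, and the function names are illustrative only.

```python
import math

def r_from_variance_explained(r_squared):
    """Correlation implied by a proportion of variance accounted for (assumed to be r squared)."""
    return math.sqrt(r_squared)

def d_from_r(r):
    """Standardized mean difference implied by r, assuming equal group sizes."""
    return 2.0 * r / math.sqrt(1.0 - r * r)

# Endpoints of the range quoted in the text (6% and 11%).
for proportion in (0.06, 0.11):
    r = r_from_variance_explained(proportion)
    print(f"{proportion:.0%} of variance -> r = {r:.2f}, d = {d_from_r(r):.2f}")
```

Under these assumptions the quoted range corresponds to correlations of roughly .24 to .33, which is consistent with the "weak-to-moderate" description.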
Consistency Across Racial/Ethnic Groups
Several of these reviews have also examined the consistency of the ORB effect across racial/ethnic groups. Whereas Bothwell et al. (1989) found relatively equivalent estimates for both White and Black individuals, Anthony et al. (1992) found that the ORB effect among White participants accounted for 2.5 times as much variance as that among Black participants. These inconsistencies could be due to the analysis of slightly different groups of studies. Moreover, both reviews relied on moderately small samples, in meta-analytic terms (number of independent samples: ks = 28 and 44; number of participants: ns = 1,445 and 1,725, respectively), increasing the likelihood of significant fluctuations in moderator effects due to the influence of one or more studies.
Generalizability Across Memory Tasks
Most studies documenting the ORB effect have used a standard recognition paradigm in which participants are tested on their ability to discriminate between a subset of faces shown previously (targets) and a subset of novel faces (distractors). Although a handful of studies have utilized some variant of this basic task (Cross, Cross, & Daly, 1971; D. S. Lindsay, Jack, & Christian, 1991; Luce, 1974; Malpass, 1974), some reviewers, such as R. C. Lindsay and Wells (1983), have criticized the literature for not examining performance on other memory tasks, including more applied identification tasks. More recently, however, researchers have responded to this criticism by documenting the effect across a variety of paradigms, including matching tasks (Malpass, Erskine, & Vaughn, 1988) and lineup identification paradigms (Berger, 1969; Brigham, Maass, Snyder, & Spaulding, 1982; Doty, 1998; Fallshore & Schooler, 1995; Platz & Hosch, 1988). In addition, researchers have shown the presence of the effect across other measures of performance such as reaction time (Chance & Goldstein, 1987; Valentine, 1991) and other tasks of forensic relevance including facial reconstruction tasks (Ellis, Davies, & McMurran, 1979) and photo lineup construction by law enforcement officers (Brigham & Ready, 1985).
Reliability Across Individuals and Testing Occasions
A fourth level of reliability, namely consistency of the ORB effect across individual participants, has only recently been examined. In general, memory for human faces has been shown to demonstrate reliable properties when assessed by tools designed to investigate cognitive maturation and/or neurological impairment. For example, Malina, Bowers, Millis, and Uekert (1998) found that the Faces subtest of the Recognition Memory Test (Warrington, 1984) had sufficient internal consistency and reliability (Cronbach's [alpha] = .77) for clinical use (see also Soukop, Bimbela, & Schiess, 1999). Similarly, the Benton Facial Recognition Test (Benton, Hamsher, Varney, & Spreen, 1983), the Faces subtest of the Wechsler Memory Scale--III (1997), and the Face Recognition subtest of the Kaufman Assessment Battery for Children (Kamphaus, Beres, Kaufman, & Kaufman, 1996) have all produced sizeable reliability estimates (rs > .75). Interestingly, and pertinent to the current investigation, several laboratory efforts at demonstrating reliability in a face-recognition task have yielded only moderate reliability estimates (Chance & Goldstein, 1979; Goldstein & Chance, 1980; Malpass et al., 1998; Prospero, Corey, Malpass, Parada, & Schreiber, 1996). Although researchers had taken care to randomly assign faces to recognition sets in controlling for item effects (see Chance & Goldstein, 1979), more deliberate standardization in controlling the memorability of materials and test sets may provide for better estimates of facial memory reliability in future studies.
Although it has largely been assumed that the ORB effect would follow a similar pattern of reliability across testing occasions (namely, moderate-to-large reliability estimates), little research has been available to test this assumption. In a recent study, we (Slone, Brigham, & Meissner, 2000) sought to test the reliability of the ORB effect across an immediate and (2-day) delayed testing occasion. Our results indicated that although participants performed reliably on both own-race and other-race faces, rs(127) = .56 and .44, ps < .001, respectively, the magnitude of the difference between own-race and other-race performance (i.e., the ORB) was only somewhat reliable across the delay, r(127) = .21, p < .05. Malpass et al. (1998) also recently investigated the reliability of other-race face recognition across two separate testing occasions. Although they found no reliability in performance on other-race faces, r(11) = .08, ns, this may have been due, in part, to the small sample of participants (n = 13). Their estimate of reliability for same-race recognition was significant, but of moderate size, r(59) = .36, p < .01. Although left unaddressed by the current meta-analysis, the issue of test-retest reliability in the ORB merits further investigation. Once again, greater care in the standardization of materials across race of face may provide more reasonable estimates of reliability in future studies.
The Search for Social-Cognitive Mechanisms
Thus far, theoretical notions for the ORB have spanned the realms of both social and cognitive mechanisms. Whereas early candidates included the effect of social attitudes and the notion of physiognomic differences between races, more recent hypotheses have involved the potential influence of interracial contact and the notion of a perceptual learning mechanism. Unfortunately, inconsistency has often plagued the literature seeking to verify each theory. Because previous reviews of the ORB effect have given much attention to the various theoretical positions (Brigham & Malpass, 1985; Chance & Goldstein, 1996; Shepherd, 1981), we provide only a cursory updated description of each approach.
One initial explanation for the ORB effect was that individuals with less prejudiced racial attitudes would be more motivated to differentiate other-race members, when compared with more prejudiced persons. Early research indicated that racial attitudes appeared to influence the degree of stereotypic likeness assigned to other-race members (Secord, Bevan, & Katz, 1956). In addition, early studies examining participants' performance on identification of race/ethnicity (e.g., Jewish vs. non-Jewish) demonstrated that more-prejudiced individuals often performed better than less-prejudiced individuals (Allport & Kramer, 1946; Lindzey & Rogolsky, 1950). However, other studies were not always supportive of the findings (Carter, 1948), and subsequent researchers noted that the performance of high-prejudiced individuals was likely influenced by a response bias to label more faces as out-group members (Elliott & Wittenberg, 1955).
Within the ORB literature, several early studies demonstrated a small relationship between attitudes toward other-race persons and recognition memory performance (Berger, 1969; Galper, 1973). However, when response bias was taken into account, Dowdle and Settler (cited in Yarmey, 1979) found that racial attitudes were unrelated to memory performance. Similarly, more recent studies have consistently failed to find a relationship between racial attitudes and memory for other-race faces (Brigham & Barkowitz, 1978; Lavrakas, Buri, & Mayzner, 1976; Platz & Hosch, 1988; Slone et al., 2000; Swope, 1994). However, racial attitudes are related to another factor thought relevant to recognition of other-race faces, namely, amount of interracial contact. A number of studies have found that those with more prejudiced attitudes report less contact with other-race members (Brigham, 1993; Brigham & Barkowitz, 1978; Brigham & Meissner, 2000; Brigham & Ready, 1985; Slone et al., 2000; Swope, 1994).
A second possibility for the ORB effect involves possible group differences in the inherent memorability of faces, such that faces of some races might show less physiognomic variability among group members when compared with other races. However, researchers examining this hypothesis have generally found little support for its validity. For example, Goldstein (1979) found no differences in physiognomic variability among Japanese, Black, and White faces. Additionally, several studies have demonstrated that latency and accuracy of same-different judgments do not differ across race of participant or race of face (Goldstein & Chance, 1976, 1978). Finally, within-race rated similarity has shown, at best, only an inconsistent relationship to perception by own-race and other-race individuals, leading Goldstein and Chance (1979) to conclude that, overall, there is little "compelling evidence for the homogeneity hypothesis" (p. 111). We should note that although physiognomic homogeneity may not be responsible for the ORB memory effect, a number of studies have indicated that different physiognomic facial features may be more appropriate for discriminating between faces of certain races (Ellis, Deregowski, & Shepherd, 1975; Shepherd, 1981; Shepherd & Deregowski, 1981).
A number of researchers have posited that the quality or quantity of interracial contact may play a vital role in the degree of ORB demonstrated by any particular individual. For example, researchers have proposed that increased contact with other-race individuals may increase memory performance by (a) reducing the likelihood of stereotypic responses and increasing the likelihood that individuals may look for more individuating information (Malpass, 1981; Shepherd, 1981), (b) influencing individuals' motivation to accurately recognize other-race persons through associated social rewards and punishments (Malpass, 1990), or (c) reducing the perceived complexity of unfamiliar other-race faces (Goldstein & Chance, 1971). Two major approaches to investigating contact are to examine groups of individuals differing in their degree of other-race contact or to assess individuals' self-reported contact with other-race persons.
With regard to the former approach, several early studies demonstrated that adolescents and children living in integrated neighborhoods better recognized novel other-race faces than did those living in segregated neighborhoods (Cross et al., 1971; Feinman & Entwisle, 1976). Other more recent studies have also shown evidence of the influence of contact in samples of White and Black individuals from Great Britain and Africa (Carroo, 1986; Chiroro & Valentine, 1995; Wright, Boyd, & Tredoux, 1999). Finally, a novel application of the contact hypothesis was recently conducted by Li, Dunning, and Malpass (1998) who demonstrated that White "basketball fans" were superior to White "basketball novices" in recognizing Black faces. Given that the majority of professional basketball players are Black, this effect was predicted on the basis of the fans' experience in differentiating individual players. It is interesting to note that not all studies have found the predicted relationship between high-contact and low-contact groups. Burgess (1997) found only a small effect of contact on the performance of Southern (Florida) and Northern (Maine) American samples of White individuals. Similarly, Ng and Lindsay (1994) found little support for the influence of contact on the performance of Canadian and Singapore samples.
In a number of other studies, researchers have assessed the relationship between memory for other-race faces and individuals' self-reported experience with other-race persons. Whereas early studies generally failed to find a significant relationship (Berger, 1969; Brigham & Barkowitz, 1978; Cross et al., 1971; Malpass & Kravitz, 1969), numerous studies over the past several decades have found at least some evidence of the relationship in both recognition tasks (Byatt & Rhodes, 1998; Carroo, 1986, 1987; Lavrakas et al., 1976; Li et al., 1998; D. S. Lindsay et al., 1991; Slone et al., 2000; Swope, 1994; Wright et al., 1999) and more applied lineup identification paradigms (Brigham et al., 1982; Platz & Hosch, 1988). This curious pattern of results over time will be further examined in the current meta-analysis. It is possible that the precision and validity of measures used to assess interracial contact have improved over the years. Alternatively, as Chance and Goldstein (1996) posited, a cohort effect may exist such that opportunities for interracial contact have increased following the desegregation and civil rights movements of the 1960s and 1970s, allowing for a greater range in the degree of interracial contact in recent years.
As reviewed in the previous section, a fair degree of empirical support exists for the notion that interracial contact has some influence on the magnitude of the ORB. However, researchers are still attempting to elucidate the specific cognitive mechanisms through which contact might actuate this influence, and to model their effects in more formal ways. The most popular general approach is likely that of perceptual learning. As historically defined by Gibson (1969), perceptual learning involves "an increase in the ability to extract information from the environment, as a result of practice and experience with stimulation coming from it" (p. 3). Numerous reviews have been written concerning the various mechanisms likely to underlie the phenomenon (Ahissar & Hochstein, 1998; Proctor & Dutta, 1995; Walk, 1978), and most note the important role of Gibson's notion of differentiation, defined as focused attention directed toward invariant cues that provide the best bases for discriminations within a given stimulus set. More recent work by Haider and Frensch (1996, 1999) has furthered Gibson's notion by demonstrating that perceptual skill involves learning to distinguish between "task-relevant" and "task-redundant" information. Thus, increases in accuracy and speed of processing appear to reflect the extent to which individuals have knowledge of, and provide attention to, the appropriate (invariant) features of the stimulus.
Such an encoding-based effect has been documented in a variety of perceptual skill domains, including chess (Reingold, Charness, Pomplun, & Stampe, in press), bird watching (K. E. Johnson & Mervis, 1997, 1998), sports (Helsen & Pauwels, 1993; Shea & Paull, 1996), radiology (Christensen et al., 1981; Lesgold et al., 1988; Myles-Worsley, Johnston, & Simons, 1988), and even chicken sexing (Biederman & Shiffrar, 1987). It is possible that perceptual learning might also be responsible for the ORB phenomenon. For example, individuals may be able to discriminate own-race faces more accurately due to their use of appropriate (invariant) aspects of the face. On the other hand, cues used for own-race faces may not be appropriate when attempting to remember other-race faces, and thus performance would worsen when attempting to discriminate such unfamiliar stimuli. A handful of studies have investigated this notion of perceptual learning from a discrimination training perspective. Other research within this general framework has attempted to identify various aspects of the face that might be deemed "task-relevant" when recognizing own-race versus other-race faces and to provide evidence in support of more formal models of the ORB.
Discrimination training. Some researchers in the face memory domain have directly investigated the perceptual learning hypothesis by providing individuals with discrimination training on own-race and other-race faces. Although training seems to have no effect on improving own-race recognition (Malpass, 1981), there is some evidence that training may reduce the ORB, at least in the short run. For example, Malpass, Lavigueur, and Weldon (1973) attempted to improve recognition memory for own-race and other-race faces by either verbal or visual training tasks. Although verbal training showed no effect on recognition, a relatively short visual training task (1 hr) produced a significant reduction in the magnitude of the ORB. Lavrakas et al. (1976) also investigated the effects of training by presenting participants with a concept learning task. Post-training recognition performance demonstrated significant improvement on other-race faces for individuals in the concept learning conditions compared with the unchanged performance of individuals in a control condition. However, when participants in all conditions were tested again 1 week later, the performance of trained and untrained participants on other-race faces was no different. Finally, E. S. Elliott, Wills, and Goldstein (1973) investigated the influence of paired associate discrimination training in reducing the magnitude of the ORB. Whereas participants in the no-training and own-race training conditions displayed the typical ORB effect, those in the other-race training condition demonstrated significant improvement in recognition accuracy for other-race faces.
Configural-featural hypothesis. Although relatively short-lived effects of discrimination training have been found, other researchers have sought to identify the various cognitive processes that might differentiate own-race and other-race face recognition. One notable advance in the face memory literature has involved work on the face inversion effect, the finding that inverted (upside-down) photos of faces are identified more poorly than inverted photos of other objects. In early work on this effect, Yin (1969) concluded that face recognition was the product of a unique system, different from systems responsible for recognizing other kinds of visual stimuli. In contrast to this "neural specialization" hypothesis, Diamond and Carey (1986) proposed that perceptual learning might be operating in face recognition. In several experiments they showed that the inversion effect was not unique to faces, but rather occurred when participants had a great deal of experience with the stimulus materials. Inversion appeared to disrupt the effectiveness with which individuals were able to encode stimuli that were highly familiar to them. This, they claimed, stemmed from experienced participants' reliance on configural (or relational) properties of the stimulus. Novice participants, on the other hand, relied on only the featural (or isolated) aspects of the face that were less influenced by inversion. A number of subsequent studies have supported this general configural-featural hypothesis (see Farah, Wilson, Drain, & Tanaka, 1998).
The notion of expertise and configural processing has also been applied to the ORB effect. In particular, Rhodes, Brake, Taylor, and Tan (1989) proposed that greater experience with own-race faces would lead to a larger inversion effect, due to an increased reliance on configural information. The encoding of other-race faces, on the other hand, should not be as influenced by inversion due to the featural aspects that are relied on. As hypothesized, Rhodes et al. observed that own-race faces were significantly more susceptible to inversion than other-race faces for measures of both reaction time and accuracy. However, several other studies have observed either no interaction of inversion with the ORB (Buckhout & Regan, 1988) or larger inversion effects on other-race faces (Valentine & Bruce, 1986). Given the various methodological differences across studies, further empirical and theoretical work on the significance of inversion effects in the ORB would be valuable.
Finally, Fallshore and Schooler (1995) examined whether such perceptual expertise might also be involved in the verbal overshadowing effect, the finding that generating a verbal description of a face significantly impairs subsequent identification accuracy (see Meissner & Brigham, in press, for a meta-analytic review). Specifically, they hypothesized that requesting participants to provide a description of a same-race face might cause significant declines in recognition performance by (a) forcing participants to rely on the featural (more verbalizable) aspects of the face and (b) disrupting the configural (less verbalizable) memory trace that was originally encoded. Performance in cross-race identification, however, was predicted not to show the overshadowing effect due to individuals' reliance on featural aspects when encoding other-race faces. Consistent with their hypotheses, Fallshore and Schooler found that although participants' recognition performance on same-race faces demonstrated the overshadowing effect (a 47% decrement in performance when verbal descriptions were given), other-race faces showed no such decline in performance.
"Face space" models. Although the configural-featural hypothesis has received much attention, other researchers have examined the particular manner in which faces might be represented in memory. Likely the most ambitious work involves that of Valentine and his colleagues (Valentine, 1991; Valentine & Bruce, 1986; Valentine & Endo, 1992) in the development of an exemplar-based model of facial memory. Although Valentine and colleagues conceded the notion of a configural-featural distinction in the type of facial features that individuals may encode, they disputed Diamond and Carey's (1986) proposal that a fundamental change in the underlying processing strategy occurs under inversion (Valentine, 1988; Valentine & Bruce, 1988). Rather, Valentine (1988) proposed that, in conjunction with the notion of schema theory pioneered by Goldstein and Chance (1980), an exemplar-model reflecting "the acquisition of knowledge of how faces vary" may account for the effects of inversion, race, and distinctiveness (Valentine, 1988, p. 485).
Generally speaking, Valentine's (1991) multi-dimensional space (MDS) framework holds that the representational system may be thought of as a hypothetical space in which faces are stored based on various dimensions representing features or sets of features. The model posits that these dimensions are based on an individual's prior experience with the stimulus set and thus are best suited for representation of own-race faces, due to a reliance on appropriate featural and/or configural information. As a result of this encoding, own-race faces are spread more evenly throughout the MDS and are better individuated from one another at retrieval. Conversely, other-race faces are poorly represented (and, thus, more tightly clustered in the MDS) due to the encoding of less appropriate featural and/or configural information. Valentine's (1991) model also posits, however, that with increasing experience, other-race faces may be better represented once the relevant (invariant) aspects of other-race faces are learned.
In a test of the MDS framework, Chiroro and Valentine (1995) examined the effects of race, typicality, and level of perceptual experience within the cross-race paradigm. Although the influence of rated distinctiveness on recognition of own-race faces had been widely known (Brigham, 1990; Hosie & Milne, 1995), the manner in which it might interact with race and perceptual experience had not been investigated. Based on the assumptions of the MDS model, Chiroro and Valentine predicted that only individuals who had considerable previous experience with other-race faces (high-contact) would demonstrate distinctiveness effects for both own-race and other-race faces. This was due largely to the notion that such individuals should be able to distinguish between typical and distinctive other-race faces based on features they had extracted through prior experience. In contrast, low-contact individuals were predicted to demonstrate no differences in performance on the distinctiveness dimensions of other-race faces. Overall, their results indicated the predicted four-way interaction such that distinctiveness effects for low-contact individuals were confined to own-race faces. On the other hand, high-contact individuals demonstrated significant effects of distinctiveness regardless of the race of the face.
Race-feature hypothesis. An alternative to Valentine's (1991) MDS model was proposed by Levin (1996) in explaining the paradoxical effect that individuals are slower at classifying the race of an own-race face compared with that of an other-race face. This other-race classification advantage (ORCA) was observed by Valentine and Endo (1992) and was explained as resulting from strong activation due to the high-density cluster of other-race faces in the representational system (MDS). Levin (1996) proposed an alternative to this explanation in which the ORCA was said to arise from a "facilitated classification process" (p. 1366). In particular, Levin suggested that other-race faces were more quickly classified due to an automated process in which race-specific coding is performed without regard for other individuating information, which is largely ignored.
In testing this race-feature hypothesis, Levin (1996) observed that participants demonstrating a large ORB in recognition memory also demonstrated a large ORCA when compared with other individuals (see also Levin & Lacruz, 1999). Given that the ORB observed was driven largely by false alarm responses to other-race faces, Levin argued that participants' coding of race alone was insufficient to discriminate between other-race faces, leading to a tendency to respond "seen before" during test. Levin further proposed that individuals having greater experience with other-race persons would be less likely to generate the race-feature response, but instead would initially seek out individuating information for later use. Although he did not test this possibility, Levin's observation is analogous to that of skill differences in the "basic level" categorization effect (K. E. Johnson & Mervis, 1997, 1998; Tanaka & Taylor, 1991). Namely, whereas novices respond to stimuli most quickly based on a basic level categorization (e.g., bird), experts respond just as quickly at the basic, subordinate (e.g., wren), and even sub-subordinate levels (e.g., Carolina wren). Thus, experts' conceptual knowledge of domain-relevant features appears to allow them faster access to multiple levels of identification. Similarly, individuals with more experience with other-race faces may have faster access to identity information by way of their conceptual knowledge of individuating features.
Taken together, a perceptual learning approach to understanding the ORB has considerable potential for explaining its cognitive origins. The focus on encoding-based processes within the configural-featural and race-feature hypotheses may stimulate future empirical and theoretical progress. In addition, the representational model put forth by Valentine and colleagues (Valentine, 1991; Valentine & Endo, 1992) has provided a testable framework within which both general and effect-specific approaches to memory for faces may interact. The current meta-analysis was designed to aid researchers in further exploring perceptual learning [*13] aspects of the ORB by providing aggregate estimates of the effect across several performance measures.
The present review of the ORB paradigm has yielded many testable hypotheses concerning both the general reliability of the effect and the various mechanisms posited for its occurrence. Our meta-analysis took the approach advocated by Hedges and Olkin (1985) in which a mean weighted effect size for the sample of studies was initially calculated, followed by prediction of effect size based on moderating variables (see B. T. Johnson, Mullen, & Salas, 1995, for a discussion of various approaches). First, we were interested in examining ORB effect size estimates for basic measures of hits (correctly identifying a face as "old") and false alarms (incorrectly identifying a face as "old"), as well as aggregate signal detection estimates of discrimination accuracy (the standardized distance between the means of the "new" and "old" distributions) and response criterion (the level of familiarity necessary for an individual to categorize a given stimulus as "old" vs. "new"; for a review of signal detection theory, see Green & Swets, 1966). Second, in testing the validity of several theoretical mechanisms posited in the literature, we also provide estimates of the influence of racial attitudes and self-rated interracial contact on other-race memory performance, as well as an estimate of the correlation between attitudes and contact as measured across studies. Finally, in addition to overall effect size analyses, eight moderating variables (described below) are examined across the four performance measures.
A total of 91 independent effect sizes described in 39 research articles were located, representing the responses of 4,996 participants. Of the 39 research articles, 6 (15%) were unpublished manuscripts or theses/dissertations. Studies were obtained using several methods, including (a) searches of the PsycINFO, Sociofile, and Dissertation Abstracts databases using the key words "face memory," "face recognition," and "face identification" along with the key words "race" and "ethnicity"; (b) cross-referencing with the three previous meta-analyses (Anthony et al., 1992; Bothwell et al., 1989; Shapiro & Penrod, 1986) and various reviews on the effect (Brigham & Malpass, 1985; Chance & Goldstein, 1996; R. C. Lindsay & Wells, 1983); and (c) contact with colleagues in the field who may have had knowledge of fugitive literature that had neither been published nor presented at a conference.
To be included in the analysis, studies must have involved a within-subjects test of participants' memory for own-race and other-race faces. The statistical difference in performance on these two sets of stimuli for each participant is defined as the ORB. Note that, in contrast to several previous meta-analyses (Anthony et al., 1992; Bothwell et al., 1989), studies that involved only a single race of participants were included in addition to studies that involved races other than Whites and Blacks. Reasons for excluding studies involved (a) the lack of sufficient data from which to compute an effect size (Bruce, Beard, & Tedford, 1997; Caroo, 1988; Horowitz & Horowitz, 1938; Luce, 1974; Malpass, [*14] 1988), (b) the use of a between-subjects design and analysis (Caroo, 1986; E. S. Elliot, Wills, & Goldstein, 1973), or (c) the implementation of various methodological procedures that might obscure interpretation of the effect size estimate, such as unequal presentation rates for own-race and other-race faces (Byatt & Rhodes, 1998; Doty, 1998; Goldstein & Chance, 1985; Lavrakas et al., 1976; Padgett, 1997; Valentine & Bruce, 1986).
Based on the suggestions of Lipsey (1994), moderator variables were selected by way of three general categories of study descriptors. n1 First, we examined variables that were of substantive experimental and applied interest in characterizing the reliability and generality of the ORB effect, including the race of the participant and the type of memory task used. Fifty-six percent of the samples were reported as White, and 32% were reported as Black. The remaining 12% of samples included individuals of Arab/Turkish, Asian, and Hispanic origin. The majority (91%) of studies used a recognition paradigm, whereas 9% of studies used a (simultaneous and target-present) lineup identification task. Briefly, recognition paradigms involve presenting participants with a set of faces that they must later recognize from a group of "old" and "new" faces. Identification paradigms are generally more applicable to the eyewitness situation and involve presenting participants with a single face (either from a photograph or a short video) that they must later identify from a group (or photo lineup) of 6-8 similar faces.
Second, we assessed methodological or procedural aspects of studies such that we might identify possible sources of distortion involving boundary conditions under which the ORB might be observed. Such variables included (a) whether test stimuli were identical (72%) or different (28%) from those used at study, (b) whether races of face were presented and tested in a blocked (19%) or mixed (81%) fashion, (c) the amount of time participants were permitted to study individual faces (minimum = 0.12 s; maximum = 4 min; median = 3 s), and (d) the length of the retention interval between study and test phases (minimum = immediate; maximum = 3 weeks; median = 2 min).
Finally, we also considered other extrinsic study characteristics, including the date of publication or presentation and whether the effect size estimate was taken from a published or unpublished manuscript. Of the studies included for analysis, 27% were published in the 1970s, 33% in the 1980s, and 40% in the 1990s. Fifteen percent of these studies were unpublished and took the form of a conference presentation or a thesis/dissertation.
Measure of Effect Size
Our measure of effect size for the performance variables (i.e., hits, false alarms, and discrimination accuracy) was a single-sample estimate equivalent to Hedges's g. This effect size was computed simply as the mean difference between own-race and other-race performance divided by the sample standard deviation, or
$$g = \frac{\mu_{\text{own}} - \mu_{\text{other}}}{S_D} \qquad (1)$$
To control for skewness in estimating the true population parameter, g was transformed to $g^{U}$ by way of Equation 2:
[*15]
$$g^{U} = c(m) \times g, \qquad (2)$$
$$c(m) = 1 - \frac{3}{(4 \times \mathit{df}) - 1}. \qquad (3)$$
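As an illustration of Equations 1 through 3, the following Python sketch computes the within-subjects effect size and its small-sample correction for a single sample. It assumes that S[D] is the standard deviation of the own-minus-other difference scores and that df = n - 1; neither detail is stated in the text, and the function names are hypothetical.

```python
from statistics import mean, stdev


def small_sample_correction(df):
    """Correction factor c(m) from Equation 3."""
    return 1 - 3 / (4 * df - 1)


def within_subject_g(own, other):
    """Equation 1 for paired scores.

    Assumption: S_D is taken to be the sample SD of the
    own-minus-other difference scores.
    """
    diffs = [o - t for o, t in zip(own, other)]
    return mean(diffs) / stdev(diffs)


def corrected_g(own, other):
    """Equation 2: c(m) times g, assuming df = n - 1."""
    df = len(own) - 1
    return small_sample_correction(df) * within_subject_g(own, other)
```

Because c(m) is always below 1 and approaches 1 as df grows, the correction shrinks the estimate noticeably only for small samples.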
To assess the influence of both attitudes and contact on the ORB, as well as the correlation between the two measures across studies, r coefficients were recorded for each independent sample, after which r was transformed to Fisher's $Z_r$ by way of Equation 4:
$$Z_r = \tfrac{1}{2} \log_e\!\left[\frac{1 + r}{1 - r}\right]. \qquad (4)$$
All formulae were obtained from Rosenthal (1994). Effect sizes demonstrating the ORB will be positive for measures of hits, discrimination accuracy, and response criterion, and negative for the measure of false alarms. Likewise, positive estimates for the racial attitude and interracial contact measures indicate that more positive attitudes toward, and increased contact with, other-race individuals lead to better performance on other-race faces.
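The transformation in Equation 4 and its inverse can be sketched in a few lines of Python. The function names are hypothetical; the inverse uses the standard identity that Fisher's transformation is the inverse hyperbolic tangent of r.

```python
import math


def fisher_z(r):
    """Equation 4: Fisher's Z for a correlation r with -1 < r < 1."""
    return 0.5 * math.log((1 + r) / (1 - r))


def inverse_fisher_z(z):
    """Back-transform a Fisher Z to the correlation metric."""
    return math.tanh(z)
```

For small correlations the transformed and raw values are nearly identical, so the transformation matters mainly when r is large.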
Weighted Effect Size Analyses
To examine the pattern of effect sizes for each measure, estimates were weighted as a function of their independent sample sizes, after which the results were analyzed across studies. For each measure, the mean weighted effect size (g) is presented, in addition to a test of the significance of the estimate (Z), and the associated 95% confidence intervals.
Hits and false alarms. The mean weighted effect size for the proportion of hit responses across studies (k = 74) demonstrated a significant ORB, g = .24, Z = 15.43, p < .001, with 95% confidence intervals of .21 and .27. In practical terms, an odds-ratio analysis indicated that participants were 1.4 times more likely to correctly identify a previously viewed own-race face when compared with performance on other-race faces. For false alarm responses, the mean weighted effect size across studies (k = 53) also indicated a significant ORB, g = -.39, Z = 22.24, p < .001, with 95% confidence intervals of -.42 and -.35. Participants were 1.56 times more likely to falsely identify a novel other-race face when compared with performance on own-race faces.
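To make the relation among the weighted mean, the Z test, and the confidence limits concrete, the following Python sketch may be helpful. It assumes simple sample-size weights for the mean (the text says only that estimates were weighted as a function of sample size) and assumes that Z is the estimate divided by its standard error, so the standard error can be recovered from a reported estimate and Z. Function names are hypothetical.

```python
from statistics import NormalDist


def weighted_mean_effect(effects, weights):
    """Weighted mean of effect sizes (weights assumed to be sample sizes)."""
    total = sum(weights)
    return sum(w * g for g, w in zip(effects, weights)) / total


def ci_from_estimate_and_z(estimate, z, level=0.95):
    """Confidence limits implied by an estimate and its Z statistic.

    Assumption: Z = estimate / SE, so SE = |estimate / Z|.
    """
    se = abs(estimate / z)
    crit = NormalDist().inv_cdf(0.5 + level / 2)
    return estimate - crit * se, estimate + crit * se
```

Applied to the hit-rate result reported above (g = .24, Z = 15.43), this calculation gives limits that round to .21 and .27, in agreement with the reported interval.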
Taken together, these results illustrate a "mirror effect" pattern in which other-race faces receive a lower proportion of hits and a higher proportion of false alarms when compared with own-race faces (Figure 1). The mirror effect has been termed a "regularity" of recognition memory and has been demonstrated for such variables as frequency, distinctiveness, and study time (see Glanzer & Adams, 1985, 1990). Although the theoretical mechanisms of this effect are often debated between models (Glanzer & Adams, 1990; Hintzman, 1988; Hirshman, 1995; McClelland & Chappell, 1998; Shiffrin & Steyvers, 1997), many studies have shown that the aggregate measure of discrimination accuracy is generally influenced when mirror effects are observed. Other researchers have noted changes in response criterion estimates as well; however, substantial differences in discrimination accuracy between stimuli must be present for the criterion effect to be observed (Hirshman, 1995; McClelland & Chappell, 1998). Hence, we were [*16] interested to see whether ORB differences would occur only on estimates of discrimination accuracy, or on estimates of response criterion as well.
Discrimination accuracy. The mean weighted effect size for the measures of discrimination accuracy across studies (k = 56) was g = .82, a significant ORB, Z = 42.32, p < .001, with 95% confidence intervals of .78 and .85. Overall, the ORB in discrimination accuracy accounted for 15% of the variability across studies, and participants were 2.23 times more likely to accurately discriminate an own-race face as new versus old when compared with performance on other-race faces.
Response criterion. Unfortunately, only six studies (k = 14) actually calculated a response criterion measure across participants. Of the 14 independent samples, 11 demonstrated a significant ORB effect ([alpha] = .05) such that other-race faces yielded a more liberal criterion when compared with performance on own-race faces. The remaining 3 samples demonstrated nonsignificant patterns. To further assess this effect, a studywise response criterion analysis was conducted in which the mean hit and false alarm rates for each study were used to calculate a response criterion estimate (see Macmillan & Creelman, 1990). The mean weighted effect size for the estimates of response criterion across studies (k = 49) was g = .30, a significant ORB, Z = 17.91, p < .001, with 95% confidence intervals of .26 and .33. Overall, this small effect of response criterion in the ORB accounted for only 1% of the variability across studies and indicated that own-race faces generally yielded a more conservative criterion when compared with performance on other-race faces.
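A studywise calculation of this kind can be sketched in Python using the standard equal-variance signal detection indices, d' for discrimination accuracy and c for response criterion. The specific index used in the analysis is not stated in the text, so the choice of c here is an assumption, as are the function names. Hit and false alarm rates must lie strictly between 0 and 1.

```python
from statistics import NormalDist

# Inverse of the standard normal cumulative distribution (z transform).
_z = NormalDist().inv_cdf


def d_prime(hit_rate, fa_rate):
    """Discrimination accuracy: z(hits) minus z(false alarms)."""
    return _z(hit_rate) - _z(fa_rate)


def criterion_c(hit_rate, fa_rate):
    """Response criterion c; larger values are more conservative."""
    return -0.5 * (_z(hit_rate) + _z(fa_rate))
```

Under this convention, a set of faces with more false alarms at a similar hit rate yields a lower (more liberal) value of c, which is the direction of the other-race pattern described above.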
In summary, the pattern of results for discrimination accuracy measures was consistent with the mirror effect pattern that was observed in the hit and false alarm responses. Given the significant size of the discrimination accuracy effect, the presence of a response criterion effect in the ORB was expected (Hirshman, 1995). A recent model of recognition memory proposed by McClelland and Chappell (1998) provided an account of this pattern of results by simulating the process of differentiation (Gibson, 1969). As discussed previously, differentiation has been implicated in the various perceptual learning approaches to explaining the ORB. In the Discussion section, we consider the merits of McClelland and Chappell's model and its theoretical implications for the ORB.
Racial attitudes. Researchers have long posited that attitudes toward other-race persons may be responsible for the ORB in face memory. However, as noted, empirical results have not generally supported this notion. To assess the validity of this hypothesis, we examined the pattern of correlations between racial attitudes and performance on other-race faces across studies (k = 14). The mean weighted effect size across studies indicated no significant relationship, Z[r] = -.01, Z = .25, with 95% confidence intervals of -.08 and .06. Hence, there appears to be no evidence of a direct influence of racial attitudes on the ORB.
Interracial contact. Researchers have also posited that interracial contact should influence the degree of ORB demonstrated by any given individual. To assess this relationship across studies, we examined the pattern of correlations between self-rated interracial contact and discrimination of other-race faces (k = 29). The mean weighted effect size across studies demonstrated a significant relationship, Z[r] = .13, Z = 5.34, p < .001, with 95% confidence intervals of .08 and .18. Overall, contact appears to play a small, yet reliable, mediating role in the ORB, accounting for approximately 2% of the variability across participants. This seemingly weak relationship between self-rated contact and the ORB may be due to limitations in the range of variability present in such measures. Future studies may wish to further explore alternative methods of assessing interracial contact.
Attitude-contact relationship. As noted previously, we have found evidence of a relationship between attitudes toward other-race persons and self-rated contact in our lab. It is conceivable that although individuals' attitudes have no direct influence on their memory for other-race faces, racial attitudes may yet play a mediating role by way of their relation to individuals' social experience with other-race persons. The mean weighted effect size between interracial attitudes and contact across studies (k = 10) demonstrated a significant relationship, Z[r] = .36, Z = 11.42, p < .001, with 95% confidence intervals of .30 and .42. In general, individuals with more positive attitudes toward other-race persons tend to rate themselves as experiencing more interracial contact when compared with individuals with more negative attitudes.
A test of the homogeneity of variances across the sample of weighted effect sizes (hit, false alarm, discrimination accuracy, and response criterion measures) indicated a significant degree of variability, exceeding that expected on the basis of sampling error alone, Qs > 1,000, ps < .001. Thus, the design moderators discussed earlier were used to predict the variability across the sample of effect sizes. A weighted least-squares regression analysis (Hedges, 1994) was conducted for each measure across the three sets of moderator variables (i.e., reliability and generalizability, methodological characteristics, and extrinsic study factors). Effect sizes in the analysis were weighted as a function of their sample size. Due to the sensitivity of this fixed-effects analysis, we took a more conservative approach [*18] and discuss only those moderator effects with Z[j] >/= 3.30 or [alpha] = .001. n2 Significant effects resulting from this criterion yielded semipartial correlations (r[s]) ranging in magnitude from .11 to .33. Table 1 provides a summary of moderator effects (Z[j]) across the four performance measures.
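The homogeneity statistic and the conservative cutoff can be illustrated with a short Python sketch. It assumes the conventional fixed-effects form of Q, in which each weight is the inverse of the sampling variance of its effect size and Q is referred to a chi-square distribution with k - 1 degrees of freedom; the exact weights used in the analysis are not given in the text, and the function names are hypothetical.

```python
from statistics import NormalDist


def homogeneity_q(effects, weights):
    """Return (Q, df) for a set of effect sizes.

    Assumption: weights are inverse sampling variances, so that Q is
    approximately chi-square with k - 1 degrees of freedom when the
    effects are homogeneous.
    """
    total = sum(weights)
    mean_effect = sum(w * g for g, w in zip(effects, weights)) / total
    q = sum(w * (g - mean_effect) ** 2 for g, w in zip(effects, weights))
    return q, len(effects) - 1


def two_tailed_critical_z(alpha):
    """Critical value of Z for a two-tailed test at the given alpha."""
    return NormalDist().inv_cdf(1 - alpha / 2)
```

With alpha = .001, the two-tailed critical value is approximately 3.29, which corresponds to the cutoff of 3.30 adopted for the moderator effects.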
Reliability and generalizability. The first set of moderators assessed whether the ORB was reliable across racial/ethnic groups and whether the effect was generalizable to the type of memory task. Similar to the findings of Anthony et al. (1992), results indicated that White participants demonstrated a significantly larger ORB when compared with Black participants with regard to the measure of discrimination accuracy, Z[j] = 6.91, p < .001. This effect appeared to stem largely from differences in the magnitude of false alarm responses, Z[j] = 9.50, p < .001. However, Whites and Blacks did not differ in the magnitude of the ORB on either proportion of hits or estimates of response criterion, Z[j]s </= .79. White participants also demonstrated a significantly larger ORB when compared with participants grouped in the "other" racial/ethnic category. This effect was observed reliably in hit, false alarm, and response criterion estimates, Z[j]s >/= 8.14, ps < .001. However, the analysis of discrimination accuracy was not significant, Z[j] = 1.13. Mean weighted effect sizes for each racial/ethnic group across the four performance measures are displayed in Table 2.
Analysis of the effect sizes found in recognition versus lineup identification paradigms yielded no significant difference with regard to the measure of false alarm responses, Z[j] = 1.55. However, there was a tendency for studies using an identification paradigm (g = .45) to yield a larger ORB for proportion of hits when compared with studies using a recognition paradigm (g = .22), Z[j] = 2.76, p < .01. Nevertheless, it is evident that the ORB effect is generalizable to both recognition and lineup identification tasks. As only a small proportion (9%) of the samples involved the use of an identification task, future studies utilizing the lineup paradigm would be valuable.
Methodological characteristics. The second set of moderators examined various methodological aspects that might influence the magnitude of effects observed across studies. First, studies were coded for whether they utilized the identical or different facial photographs at study and test and for whether the presentation of stimuli was mixed or blocked by race/ethnicity. Results indicated that the type of stimulus (i.e., identical vs. different) significantly influenced estimates of the ORB on the proportion of hits and estimates of response criterion, Z[j]s >/= 3.42, ps < .001. This effect of stimulus type was also apparent in the proportion of false alarms, Z[j] = 3.27, p < .01, though not at the [alpha] = .001 level. [*19]
Influence of Moderator Variables (Z[j]) Across Measures of Hits, False