Science and Technology of Forensic DNA Profiling: Current Use and Future Directions
DNA-based Human Identity Testing and its Forensic Applications
While DNA-based genetic analysis has a long-standing and important role in research, medical diagnostics, and patient care, the central role of DNA profiling in forensic investigations was underscored in the aftermath of the September 11, 2001 attacks on the U.S. DNA-based genetic profiling has been a key tool for direct and indirect identification of hundreds of victims of these attacks. Furthermore, typing of relatives of suspected terrorists has been suggested as a method for indirect identification of recovered remains of suspected terrorist operatives whose identity is uncertain.
The utility of genetic profiling based on inherited DNA variation has come of age and has now gained almost universal acceptance as a forensic tool that is central to the proper and complete investigation of many civil and criminal matters. Such applications include use in civil matters involving claims of patient sample or nursery mix-ups, paternity testing, as well as probate and immigration disputes. DNA profiling of crime scene evidence and comparison to known samples from victims or suspects provides a powerful inculpatory or exclusionary tool that has proven fundamental in resolution of many investigations of felony crimes. Study of close relatives of missing persons and unknown soldiers provides an indirect method to assign identity to recovered remains that would otherwise be unidentifiable. Various types of data banks and databases of DNA samples or DNA profiles are now in widespread use, allowing linkage of evidence to possible suspects, victim identification, and family reunification.
Application of current and future technologies offers exciting opportunities as well as some troubling dilemmas and challenges. Practitioners and society must consider the nature and extent to which technology and data sharing should be used in the interest of public safety. This chapter provides background technical information on current DNA-based laboratory methods used in human identity testing, an overview of computerized searching of forensic DNA databases and a view to their future use. Problematic areas relating to the science and applications are addressed.
History of Forensic Identity Testing
Human forensic identity testing can trace its modern origins to the late 19th century when several individuals in Argentina and in Europe, including British physician/geneticist Sir Francis Galton, recognized the utility of fingerprint ridge pattern analysis for forensic use (see Cole, S., in this volume). Galton, a cousin of Charles Darwin, was an early Mendelist who, with others, recognized that digital fingerprint analysis could be used for identification (Galton, 1892; Cole, 1998). Galton’s own studies of the various fingerprint patterns on the volar surfaces of the distal phalanges revealed that even monozygotic twins have distinctive patterns. Such fingertip ridge analysis, used by the French and British police, gained widespread adoption in Europe and Latin America in the late 19th century, and the methods rapidly spread to other police and investigative agencies around the world.
During the first decades of the twentieth century, laboratory methods were developed for discrimination of genetically determined variation in human red blood cell antigens. The central importance of these discoveries became clear with their use for serotyping those needing blood transfusions due to illness, during surgery, and especially during wartime or other armed conflicts. The human blood groups were quickly recognized as useful genetic markers for the study of human population genetics, and their forensic utility in paternity disputes, nursery mix-ups, and criminal investigations was quickly recognized.
For most of the past century, blood-group antigen typing and other laboratory techniques for detection of inherited variants in blood cell enzymes or serum proteins were widely used for paternity testing in both civil and criminal matters. While the discriminating power of blood group typing does not compare with that of DNA profiling, it was used throughout the world before the introduction of contemporary DNA-based genetic profiling. These blood-grouping techniques also were used for medical applications such as typing and cross matching for selection of suitable donors of blood and blood products, bone marrow, and whole-organ transplantation. During this same period, forensic applications of these techniques to tissue, blood, saliva, semen and other body fluids from both evidentiary and known samples, obtained as dried stains or as liquid or solid samples at crime scenes, were introduced into courtrooms throughout the world (Saferstein, 2000). It was not until the mid-1970s that methods to detect individual genetic variation at the DNA level were introduced, rapidly changing medical research, diagnostics, and forensics.
Since 1975, rapid technological changes have allowed highly discriminating DNA profiling to be accomplished using trace samples found at crime scenes. These laboratory methods include techniques for DNA amplification, fragment separation and direct sequencing. In addition, computerized storage and searching of DNA profiles from known individuals and from crime scene samples have permitted investigation of crimes across jurisdictional boundaries. These issues will be discussed more fully in the sections below.
Even as DNA has competed vigorously with digital fingerprinting for the role as the gold standard in identity testing, both methods of “fingerprinting” have come under fire in the courts. For example, challenges to certain aspects of fingerprint analysis have recently occurred. These challenges have asked whether the theory, methods and interpretation of these fingerprint patterns meet the requirements outlined in the Federal Rules of Evidence and those set forth under Daubert by the U.S. Supreme Court (see Daubert v. Merrell Dow Pharmaceuticals, 113 S. Ct. 2786 (1993)). In a recent ruling by the U.S. District Court for the Eastern District of Pennsylvania, Hon. J. Curtis Joyner denied defense motions to exclude fingerprint evidence and granted the government’s motion to exclude the defense witnesses (U.S. v. Byron Mitchell, CR No. 96-00407; see also Imwinkelried in this volume). On March 13 of this year, Senior U.S. District Judge Louis H. Pollak of the Eastern District of Pennsylvania reversed his previous decision in U.S. v. Llera Plaza, Acosta and Rodriguez that barred fingerprint experts from telling juries that two fingerprints are a match. His ruling outlined his reversal and his conclusion “… that the standards which control the opining of a competent fingerprint examiner are sufficiently widely agreed upon to satisfy Daubert’s requirements” (U.S. v. Llera Plaza, Acosta and Rodriguez).
Genetic Variation at the DNA Level and Its Utility as a Biological Identifier
DNA contains a chemical code that governs production of proteins, enzymes and other nucleic acids. DNA is divided and packaged into chromosomes that reside in the nucleus of individual cells. During each cell division these chromosomes, and the DNA contained therein, replicate and divide, assuring that the two daughter cells contain the same set of genes as the parent cell. During gametogenesis (formation of sperm and egg) half of the chromosomes (and DNA) are transmitted to each offspring, in accordance with the 19th century discoveries of Gregor Mendel. In addition to genes inherited from both parents and packaged in the chromosomes in the nucleus of all cells, there is additional genetic material inherited only from mothers or from fathers. For example, mitochondria are transmitted in the cytoplasm of a mother’s egg cell to all of her sons and daughters; they are not normally transmitted by fathers. Mitochondria are essential in cellular energy production and replicate and segregate during each cell division, assuring that daughter cells each contain multiple mitochondria. Multiple mitochondrial chromosomes (each consisting of a single DNA molecule) exist in each of the many extranuclear organelles known as mitochondria, which exist in many copies per cell. Similarly, genes located on the Y chromosome are transmitted from fathers only to their sons.
FIGURE 1. PEDIGREE SHOWING DIFFERENCE BETWEEN NUCLEAR AND MT DNA TRANSMISSION IN FAMILIES
The DNA base pair sequence is normally identical in all cells in an individual. The “genetic code,” common to all living organisms, allows the specific DNA sequence of the AT or GC pairs to encode the chemical message that will determine the specific chemical structure of gene products - either RNA molecules or proteins. Current estimates suggest that the 3 billion base pairs of DNA in humans encode approximately 30,000 genes that produce an even greater number of functional gene products. A host of additional molecules (e.g., spliceosomes, chaperones) are involved in the complex process of protein synthesis, and alternative splicing of the RNA transcripts increases dramatically the actual number of gene products encoded by a more limited number of genes. These gene products, in turn, govern most other chemical and physiological processes. Mutations and other genetic variants occur commonly in the functional genes, forming the basis for certain heritable diseases, or advantageous traits, depending upon the specific nature of the DNA alteration, the rest of the repertoire of genes, and the environmental condition of the organism.
Variation in Non-Coding DNA Sequences
In addition to DNA sequences that encode functional gene products, a large proportion of many animal and plant genomes contain sequences that do not appear to encode functional genes and may have other functions. DNA sequence variation between individuals in these non-coding regions allows comparison of the similarities and differences among members of the same (or different) species. Depending upon where in the genome they occur, DNA sequence differences in the non-coding regions between individuals may have no functional importance or may also confer selective advantage or disadvantage to the individual organism.
Geneticists have documented that while over 98% of the human genome appears to be very similar in most individuals, considerable variation exists in the DNA of the non-coding regions (see Lander et al., 2001; Venter et al. 2001). This genetic variation, transmitted from parent to child, is well studied by those interested in heritable disease, and by forensic scientists and others concerned with use of genetic markers to aid in identity testing. Inherited variation in specific regions (genetic loci) is analysed to identify the specific variant DNA lengths or sequences (known as “alleles”) present in a particular DNA sample extracted from a patient or, in the forensic context, a known individual or item of evidence containing a biological fluid or tissue. Variant regions of forensic interest exist in the DNA comprising the autosomes (22 pairs in humans), the so-called sex chromosomes (a single XX pair in females, a single XY pair in males) and in the DNA of the cellular pool of mitochondrial chromosomes (mtDNA).
Contemporary Forensic DNA Profiling
There are many uses for forensic profiling of inherited DNA variants in matters pertaining to law and the courts. In addition to the well-known use of these techniques in murder and sexual assault investigations, in which DNA is extracted from crime-scene evidence and the DNA profiles compared to those obtained from known individuals (i.e., victims or suspects), all of the methods and procedures described in this chapter can be utilized in identification of bodies from natural or accidental disasters (e.g., identification of plane-crash and other victims from the 9/11/01 attacks on the U.S.), identification of war-crime victims (e.g., Kosovo, Bosnia), reunification of family members separated by war, natural disaster, or political oppression (e.g., Argentina), and identification of “unknown soldiers”. Other applications include animal forensics (e.g., in poaching and bird-smuggling cases and endangered-species identification), and the study of human origins via population genetics (Balazs et al., 1989; Helmuth et al., 1990; Comey and Budowle, 1991; Galbraith et al., 1991; Krane et al., 1992; Alford et al., 1994; Hammond et al., 1994; Kimpton et al., 1994; Hochmeister et al., 1995; Huang et al., 1995; Lygo et al., 1996; Sensabaugh and Kaye, 1997).
Scientists interested in genetic variation at the DNA level in animal and plant populations utilize a number of methods for study of DNA samples. First, chemical and physical extraction methods are capable of removing intact DNA molecules from various tissues, even those compromised by environmental or storage conditions (see Bieber, 1998; Lee, 2001). The chemical methods utilize detergents to remove other cellular or chemical components of the tissue or fluid sample while physical methods utilize special synthetic membranes to physically bind DNA molecules from individual samples. Second, the polymerase chain reaction (PCR) allows amplification of trace amounts of extracted DNA into a quantity sufficient for analysis. Third, automated methods for direct determination of the base-pair sequence of DNA are now available and a variety of laboratory methods allow detection and comparison of length differences in certain defined regions of the genome that are known to exhibit stable inherited variation. Computerized digital storage of the resulting DNA profile (i.e., the specific length or sequence of DNA identified at one or more loci) allows rapid comparison of sample profiles from evidence or from certain individuals whose profiles have been stored in a central database.
First Publicly Recognized Use of DNA Profiling in Forensics
The first major publicly recognized forensic use of DNA-based genetic profiling involved study of DNA polymorphisms (i.e., variations) in crime scene evidence in England in the mid-1980s. Alec Jeffreys, a professor at the University of Leicester, utilized multilocus DNA probes to study DNA extracted from crime-scene evidence samples (obtained at autopsy from two teenage rape/homicide victims) and compared the patterns produced to those from DNA obtained under court order from a confessor to one of these two crimes. Jeffreys' laboratory findings appeared to exclude the (false) confessor as a source of the DNA in both of the evidence samples, and, at the same time, documented the apparent genetic similarity of the profiles from the two crime scenes, indicating a single source of both of the evidentiary seminal stains (i.e., a serial killer). The search for the perpetrator of these two sexual homicides involved a voluntary blood donation (or "blooding") of adult males within the region in which the murders occurred (Wambaugh, 1989). This process eventually identified the perpetrator of the murders.
This dramatic use of modern DNA-based methods for forensic testing led to replacement of the older serological blood typing methods in virtually all laboratories in developed countries by the late 1990s. It also drew attention to the possibility of typing large numbers of individuals for elimination as suspects, and raised the idea of computerized storage of the DNA profiles of large numbers of individuals for use in investigations of unsolved crimes. In the United States, the first apparent use of PCR-based forensic DNA typing was to confirm that two autopsy samples were derived from the same person (Pennsylvania v. Pestinikas).
Admissibility of and Challenges to Forensic DNA Evidence in the Courts
While DNA-based laboratory methods have gained widespread acceptance in most areas of biological and medical research and in medical diagnostics, vigorous (and often successful) challenges to DNA-based identity testing results have been made in courtrooms across the United States and elsewhere (Coleman and Swenson, 1994; Billings, 1997). These challenges have been based on issues surrounding the collection, transport, and preservation of evidentiary samples, chain-of-custody documentation, and matters pertaining to State and Federal rules of evidence (see Imwinkelried in this volume; Billings, 1992; Wooley and Harmon, 1992; Aldhous, 1993; Thompson, 1993; Scheck, 1994; Miles et al., 1995; Bieber, miles, mcle). In many State courts, novel scientific testing must meet the so-called "Frye test" or "Frye Standard." This refers to a landmark 1923 case (Frye v. U.S.) in which the Washington D.C. Appeals Court ruled that scientific evidence and theory must meet a general acceptance standard in the scientific community before such evidence can be admissible in court. This concept of "general acceptance" has led to hundreds of decisions in the State and Federal courts with regard to DNA-based forensic testing, the PCR-based methods, and the foundations of human population genetics in the calculation of the combined match probabilities. Since the U.S. Supreme Court's 1993 ruling (Daubert v. Merrell Dow Pharmaceuticals), many states have adopted the so-called "Daubert standard," which allows the Courts more latitude, with the judge as the "gatekeeper" in deciding which new evidence may be helpful to the finder of fact (i.e., the jury).
Admissibility, qualifications of laboratory personnel, compliance with laboratory standards, statistical interpretation of results, and laboratory proficiency testing continue to be contentious issues in many court cases in which DNA-based identity-testing results are offered into evidence (see Imwinkelried, this volume). Indeed, as this chapter was being written, admissibility challenges were in progress in Hennepin County, MN over the matter of whether DNA testing should be admissible when generated using unpublished PCR primer sequences that are held as proprietary intellectual property by the manufacturer. The rigorous challenges to DNA evidence have in a meaningful way altered the landscape of admissibility of all types of forensic evidence and have increased the scrutiny placed on the collection, transfer, analysis, and interpretation of all forensic evidence.
In spite of the so-called “DNA wars” fought in the courtrooms, it is important to underscore the role of DNA-based genetic-identity testing as an exculpatory tool in addition to its potential inculpatory value (see M. Berger in this volume). As many as 30% of all DNA-based paternity test results exclude the tested male as the biological father of the referent child. Similarly, DNA-based forensic testing frequently excludes suspects, defendants, and even persons already convicted and incarcerated as sources of important evidentiary blood, body-fluid, or tissue samples. Already, over 100 post-conviction exonerations (108 at the time of writing) have occurred in the United States using DNA typing that was performed months or years after conviction. Several of these post-conviction exonerations have involved death-row inmates (see National Institute of Justice, 1996).
Current Methods for Forensic DNA Analysis
Several laboratory methods are utilized for forensic DNA profiling. Current methods utilize the polymerase chain reaction to amplify trace amounts of DNA for later analysis using protocols to detect DNA sequence or length variation. The methods, techniques, and laboratory practices used in forensic DNA analysis are similar and in some cases identical to those used for research and for clinical diagnostic protocols. What is notable about their use in forensic identity testing is the application of the high discriminating power of the profiling systems for purposes of genetic exclusion and their central role in contemporary criminal investigations, inheritance, and immigration disputes. Also unique are the many implications for society arising from the controversial role of DNA-based profiling in matters pertaining to DNA data banks that hold computerized records of DNA profiles of convicted felons, crime scene evidence, and military personnel (Herrin, 1993; Imwinkelried, in this volume).
The Polymerase Chain Reaction (PCR)
Once intact DNA has been extracted from a known or evidentiary source, most current laboratory methods for forensic DNA analysis take advantage of a laboratory method known as the polymerase chain reaction (PCR). PCR is used for amplification of specific regions of the genome in the specific organism tested. PCR methods allow amplification (or copying) of trace amounts of evidentiary DNA isolated from any number of plant and animal species. The PCR process mimics the normal cellular processes used in the replication of DNA molecules in living cells and organisms. After 25-35 cycles of PCR-based DNA amplification, millions of copies of the original source DNA molecules have been created, yielding sufficient DNA to allow detection of length or sequence variation that existed in the original biological sample.
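The amplification arithmetic described above can be sketched in a few lines of Python. This is an idealized model only: the function name `pcr_copies` and the `efficiency` parameter are illustrative assumptions rather than part of any laboratory protocol, and real reactions fall short of perfect doubling and eventually plateau.

```python
def pcr_copies(initial_copies: int, cycles: int, efficiency: float = 1.0) -> float:
    """Idealized number of target copies after a given number of PCR cycles.

    efficiency = 1.0 assumes every template is copied in every cycle
    (perfect doubling); this is a simplifying assumption.
    """
    if not 0.0 <= efficiency <= 1.0:
        raise ValueError("efficiency must be between 0 and 1")
    return initial_copies * (1.0 + efficiency) ** cycles


# Copies produced from a single starting template under perfect doubling.
for n in (25, 30, 35):
    print(f"{n} cycles: about {pcr_copies(1, n):.2e} copies")
```

Under these assumptions even 25 cycles yields tens of millions of copies from one template, which is why trace samples become typeable and also why stray contaminating DNA is amplified just as readily.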
Validated PCR-based methods include tests for genetic variation in either the DNA sequence or in the length of certain defined DNA segments. These inherited variations are known as sequence polymorphisms and length polymorphisms, respectively.
Indirect Detection of PCR-Amplified DNA Sequence Variation - The Reverse Dot-Blot Method
An important laboratory procedure, developed in the late 1980s and used widely since the early 1990s, detects DNA sequence polymorphisms using an indirect method referred to as the “reverse dot-blot” system. In this system, DNA molecules are labeled with chemical tags during PCR amplification. These tagged PCR products are then hybridized to a membrane containing fixed probes specific for the different possible alleles at particular genetic loci. This method first was developed by scientists at the Cetus Corporation and then by Roche Molecular Systems (Saiki et al., 1989; Reynolds et al., 1991). With the commercially available kits (Applied Biosystems, CA), it is possible to amplify alleles at six distinct genetic locations (or loci) simultaneously in a single multiplex reaction. As little as 0.5 ng of DNA can be typed using this method (e.g., 40-50 cells from a single hair root, or a single drop of blood). The kits contain all the necessary reagents, including primers for PCR and the typing strips. Because PCR methodology accomplishes efficient amplification of even trace amounts of DNA, great care is needed to reduce the chance of introducing extraneous DNA from other sources into the process. Separate areas are therefore typically reserved for the pre-amplification and post-amplification steps of the PCR process (see Bieber, 1998; Holland and Bieber, 2000).
Detection of Genetic Variation in Length of PCR-amplified DNA Products
In the 1990s PCR-based systems were developed to detect DNA length polymorphisms in forensic samples. Throughout the human genome, there are DNA sequences that are repeated side-by-side (i.e., in tandem) at various locations (loci) throughout the chromosomal DNA. They are not located within coding regions of functional genes and, at a given locus, the specific number of these repeat units varies in the population, making these markers useful for forensic purposes. Some of these loci are termed Variable Number of Tandem Repeats (VNTRs) and smaller repeat units (2-7 Watson-Crick base-pair repeats) are termed Short Tandem Repeats (or STRs). STRs (and VNTRs) are stable and inherited as Mendelian traits. PCR primers can be designed to amplify STRs, such that the amplified PCR products can be separated and their size determined. At any given locus, any individual could have one or two STR alleles, depending upon whether they inherited the same or different sized repeat units from each parent. Some STR variation also occurs on the human Y chromosome and these markers can help distinguish male contributors to evidentiary stains.
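The idea that an STR allele is defined by its number of tandem repeat units, and that a person carries one or two distinct alleles per locus, can be illustrated with a short Python sketch. Everything concrete here is invented for illustration: the `GATA` motif, the flanking bases, and the function names are assumptions, and laboratories size PCR fragments rather than counting repeats in raw sequence strings like this.

```python
from typing import Tuple


def longest_tandem_run(sequence: str, motif: str) -> int:
    """Largest number of back-to-back copies of `motif` found anywhere in `sequence`."""
    if not motif:
        raise ValueError("motif must not be empty")
    sequence, motif = sequence.upper(), motif.upper()
    best = 0
    for start in range(len(sequence)):
        run, pos = 0, start
        # Count consecutive copies of the motif beginning at this position.
        while sequence.startswith(motif, pos):
            run += 1
            pos += len(motif)
        best = max(best, run)
    return best


def genotype(maternal: str, paternal: str, motif: str) -> Tuple[int, ...]:
    """Distinct repeat counts at one locus: one value if both copies match, else two."""
    counts = {longest_tandem_run(maternal, motif), longest_tandem_run(paternal, motif)}
    return tuple(sorted(counts))


# Hypothetical locus: invented flanking bases around a GATA repeat block.
FLANK_5, FLANK_3 = "CCTTAC", "TTCAGG"
allele_9 = FLANK_5 + "GATA" * 9 + FLANK_3
allele_12 = FLANK_5 + "GATA" * 12 + FLANK_3

print(genotype(allele_9, allele_12, "GATA"))  # two different alleles inherited
print(genotype(allele_9, allele_9, "GATA"))   # same allele from each parent
```

The first call models a person who inherited different repeat lengths from each parent; the second models a person who inherited the same length from both.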
In forensic applications, laboratories most commonly perform the multiplex PCR amplification of between six and 16 separate autosomal STR loci along with the X- and Y-linked amelogenin alleles. Commercial kits allow detection of inherited variation at a single genetic locus or at multiple loci simultaneously (so-called multiplexing). Because the STR PCR amplification products are relatively small (i.e., less than 400 bp in size), STR genetic typing has found widespread forensic use because degraded DNA samples often preclude use of previous methods requiring more intact DNA. Visual comparison between the allelic ladder and amplified samples of the same locus allows rapid and precise assignment of alleles. Results can also be recorded in a digitized format, allowing direct comparison with stored databases.
Detection and typing of variation in length of the PCR-amplified four and five base pair STRs and of the X- and Y-chromosome specific amelogenin alleles, is based on the migration of the PCR-amplified alleles in relation to a laboratory sizing standard used for comparison (known as an “allelic ladder”). PCR-amplified alleles can be resolved using several laboratory methods, all involving separation of the DNA fragments based on their size (i.e., the number of repeats plus flanking DNA) (see Budowle, 2001; Butler, 2001).
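Allele assignment against an allelic ladder, as described above, amounts to matching an observed fragment size to the nearest ladder rung. The sketch below assumes a hypothetical locus with 100 bp of flanking DNA and a 4 bp repeat, and an illustrative matching window of 0.5 bp. None of these numbers or function names come from a specific kit or instrument.

```python
from typing import Dict, Optional


def build_ladder(flank_bp: int, repeat_bp: int, alleles: range) -> Dict[int, float]:
    """Expected fragment size in bp for each allele: flanking DNA plus the repeats."""
    return {n: float(flank_bp + n * repeat_bp) for n in alleles}


def call_allele(size_bp: float, ladder: Dict[int, float],
                tolerance_bp: float = 0.5) -> Optional[int]:
    """Return the ladder allele nearest the observed size, or None if off-ladder."""
    allele, expected = min(ladder.items(), key=lambda item: abs(item[1] - size_bp))
    return allele if abs(expected - size_bp) <= tolerance_bp else None


# Hypothetical ladder covering alleles with 6 to 15 repeats.
ladder = build_ladder(flank_bp=100, repeat_bp=4, alleles=range(6, 16))

for observed in (136.2, 148.4, 138.0):
    print(observed, "->", call_allele(observed, ladder))
```

A fragment that falls between rungs returns `None` here; in practice such off-ladder peaks need analyst review rather than automatic rejection.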
INSERT FIGURE HERE
Additional DNA Variation only Inherited from Mothers or Fathers
While the DNA profiling methods described above apply to both males and females, two additional categories of forensic DNA profiling have widespread application to forensic identity determination. These methods are distinctive in that they identify DNA variations inherited only from mothers (mtDNA) or only inherited from fathers to sons (Y-chromosomes).
Study of Mitochondrial DNA (mtDNA) Inherited from Mothers
First, DNA known as mitochondrial DNA (mtDNA) is transmitted almost exclusively by mothers to their offspring in the cytoplasm of the egg cell. This mtDNA is distinct from, and replicates independently of, the nuclear DNA described previously. Ample mtDNA sequence variation exists that can be extremely useful in population studies and for forensic profiling. This variation is detected by direct sequencing of the DNA base pairs in variable regions of the mitochondrial chromosome (Holland, parsons, smith).
Because mtDNA is found in extranuclear cell components and is present in many copies per cell, certain tissues, including bone and hair, often contain ample mtDNA but little or no typeable nuclear DNA. This becomes very useful when such tissue is the only available tissue, or when nuclear DNA is compromised due to post-mortem decay or to other detrimental environmental effects on nuclear DNA. Thus, in some compromised biological crime scene or disaster scene evidence, mtDNA analysis is possible while traditional nuclear DNA analysis is not. Because mtDNA is maternally inherited, all maternal relatives will typically have the same mtDNA profiles. This is convenient for kinship analysis or for victim identification when a known comparison sample for a decedent is unavailable for study. While mtDNA profiling allows certain types of family reconstruction to be accomplished, mtDNA profiling (by direct sequencing of the mtDNA) would not distinguish full brothers (or sisters) with the same mother.
Genetic Variation at Y-Chromosome Loci Inherited by Sons from their Fathers
In addition to polymorphic loci located on the autosomes (non-sex determining chromosomes), length polymorphisms at numerous Y-chromosome specific STR loci have been identified (see Butler, 2001).
Considerable DNA variation exists on the Y-chromosome, which is normally transmitted by males only to their sons. This variation, in the form of “Y-STR variants”, can be detected using the same PCR-based methods described above for variants on the non-sex chromosomes. The human Y-chromosome contains gene sequences critical to normal testis differentiation (and therefore to “maleness”), in addition to non-coding sequences that can be typed by STR analysis or by direct sequencing. Once many such Y-linked STR “loci” are typed, a so-called haplotype (or “haplogroup”) can be identified that would be similar in all male-line descendants of a particular man. Because little genetic recombination occurs between the X and Y chromosomes in males, the Y-specific DNA is transmitted basically intact from fathers to sons, generation after generation. (N.B. This method of Y-chromosome STR analysis was used to define a possible link between President Thomas Jefferson and Sally Hemings. As Thomas Jefferson did not have a namesake son who survived to reproduce, it was necessary to locate male-line descendants of Jefferson’s paternal uncle, Field Jefferson, and of the sons of Thomas Jefferson’s sister, whom some consider as possible fathers of Sally Hemings’ children.)
Y-chromosome STR analysis has proven a quite useful addition to the standard panel of autosomal loci used in forensic analysis. In particular, Y-chromosome profiling is useful in investigations of cases involving sexual assault in which a DNA mixture of a female victim and one or more male contributors is found. Additional Y-specific DNA sequence variation can be examined using SNP analysis (see below) or by direct DNA sequencing. Numerous typing systems are now available for determination of allelic variation at several loci along an individual Y chromosome (known as the “Y-STR haplotype”) in a given human DNA sample (see Butler, 2001; Sinha, 2000).
There will be a limited number of Y-STR (or Y-SNP) haplotypes and, therefore, such DNA profiling cannot distinguish all males from one another, and certainly not males who are related through a common paternal line. Nevertheless, Y-STR haplotyping and mtDNA sequencing, along with routine nuclear DNA profiling, can be used alone or in combination, depending upon the particular circumstance. For example, in mass disaster investigations, Y-STR analysis can determine whether the unidentifiable deceased body (i.e., the source of a particular DNA sample) is excluded or included as the possible relative (e.g., son, father, brother) of a surviving or deceased male whose DNA reference source is available.
Statistical Interpretation of DNA Profiling Results
The objective of PCR-based forensic DNA profiling is to compare the alleles representing the genetic types present in the DNA extracted from the evidence to the genotypes of DNA extracted from known blood or tissue. After careful laboratory analysis, and assuming DNA could be extracted from the evidence, one of the following three interpretations is usually made:
Inclusion: Known or reference standard sample alleles are present in the evidentiary or questioned sample.
Exclusion: Known or reference standard sample alleles are not present in the evidentiary or questioned sample.
Inconclusive: No conclusion can be made as to possible source of the DNA extracted from the questioned sample.
Depending upon the specific evidentiary sample, mixtures may be identified, and interpretation of the evidence may be complicated by alleles that appear in one of the known or questioned samples but not in the other.
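The three interpretation categories above can be expressed as a simple comparison of allele sets, locus by locus. The following Python sketch is a deliberate simplification: the locus names and alleles are hypothetical, the `min_loci` threshold is an assumption, and real casework relies on analyst judgment about mixtures, degraded samples, and missing alleles that no short function captures.

```python
from typing import Dict, Set

Profile = Dict[str, Set[int]]  # locus name -> set of alleles observed at that locus


def interpret(reference: Profile, evidence: Profile, min_loci: int = 1) -> str:
    """Classify a comparison as 'inclusion', 'exclusion', or 'inconclusive'."""
    # Only loci that produced results in the evidence can be compared.
    shared = [locus for locus in reference if evidence.get(locus)]
    if len(shared) < min_loci:
        return "inconclusive"
    for locus in shared:
        # Every reference allele must appear among the evidence alleles.
        if not reference[locus] <= evidence[locus]:
            return "exclusion"
    return "inclusion"


# Hypothetical profiles; the evidence is a mixture with extra alleles at LOCUS_A.
suspect = {"LOCUS_A": {9, 12}, "LOCUS_B": {7}}
stain = {"LOCUS_A": {9, 11, 12}, "LOCUS_B": {7, 8}}
other_person = {"LOCUS_A": {10, 13}, "LOCUS_B": {7}}

print(interpret(suspect, stain))
print(interpret(other_person, stain))
print(interpret(suspect, {}))
```

Using a subset test rather than strict equality lets a reference profile be included in a mixed stain, which is the situation in which extra evidentiary alleles complicate interpretation.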
Once interpretation of a DNA inclusion is made, it becomes crucial to perform the appropriate statistical interpretation of the data. The “finder of fact” needs to know how common or how rare a particular DNA profile is. In practice, this calculation permits an estimate of whether the profile that matches the evidence and the victim/suspect could be frequently encountered (e.g., found in one of 10 individuals) or that it is so rare that for practical purposes it should be considered unique or individualizing.
In essence, the calculations use the so-called “product rule”, which multiplies together the expected frequencies of the DNA profile at each locus to produce an estimate known as the “point estimate of the combined match probability”. These statistical estimates are derived from knowledge of the frequency in various populations of the specific alleles found in the key evidentiary samples. Thus, a particular multi-locus DNA profile would be expected less frequently if there were rare alleles than if there were common alleles. Similarly, a DNA profile based on typing of 13 independent loci would be expected to be seen in an unrelated individual less frequently than would a 5- or 6-locus profile. Simply put, the less common the allele(s) identified and the larger the number of loci typed, the smaller the resulting “match” statistic, that is, the smaller the predicted chance of observing the particular profile in a randomly selected unrelated individual.
Thus, for example, if a sample was matched to an individual, based on two loci, where the alleles observed at each locus occurred with a frequency of one in ten individuals in the population, then the calculated random match probability would be “1 in a 100”. To change the scenario a little, if the calculation were based on 13 loci, where each allele occurred with a frequency of one in ten individuals, the calculated random match probability would be “1 in 10 trillion.” Actual estimates generated can range from “1 chance in 10” for a single locus profile to “1 chance in 700 quintillion or less” for a 16 locus profile with uncommon alleles.
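The arithmetic in this example can be reproduced with a short sketch. The function name `combined_match_probability` is hypothetical, and the per-locus frequencies are the illustrative "one in ten" values from the text rather than real population data. Exact fractions are used so that the result is not blurred by floating-point rounding. The sketch assumes the loci are independent, which is the precondition for applying the product rule.

```python
from fractions import Fraction
from math import prod
from typing import Iterable


def combined_match_probability(locus_frequencies: Iterable[Fraction]) -> Fraction:
    """Multiply per-locus profile frequencies (the product rule).

    Assumes the loci are statistically independent of one another.
    """
    freqs = list(locus_frequencies)
    if not freqs:
        raise ValueError("at least one locus frequency is required")
    for f in freqs:
        if not 0 < f <= 1:
            raise ValueError(f"frequency out of range: {f}")
    return prod(freqs)


one_in_ten = Fraction(1, 10)

# Two loci, each profile frequency 1 in 10 -> 1 in 100
two_locus = combined_match_probability([one_in_ten] * 2)

# Thirteen loci, each 1 in 10 -> 1 in 10 trillion (10**13)
thirteen_locus = combined_match_probability([one_in_ten] * 13)
```

Taking the reciprocal of the result, for example `1 / two_locus`, gives the "1 in N" form used in reports and testimony.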
These statistics provide an estimate of the expected frequency of a particular profile (seen in both an item of evidence and in a known source) in an unrelated randomly selected individual. It might be assumed that jurors would generally consider a very rare multi-locus DNA profile to be more incriminating than a common one. Because of the multiplication step using the product rule, the number of independent loci at which results are available dramatically changes the frequency estimates, from the thousands to the billions, quadrillions, or even quintillions. These estimates vary according to differences in frequencies of alleles in different human populations, and can even differ within populations (this is known as “population substructure”).
The statistical calculations of DNA profile match probabilities can be highly contentious in the courtroom, as they typically estimate the frequency of multi-locus STR profiles as being so rare that, for example, “only 1 in 7 quadrillion randomly selected individuals would be expected to have a certain profile found in both the evidence and a known source”. This essentially becomes a source identity statement (i.e., “the DNA at the crime scene comes from John Smith”) and while it has been argued that such statements are unduly prejudicial, they are based on sound statistical estimation methods using databases of allele frequencies in well-defined human populations. In fact, so-called “John Doe warrants” have been issued for the “owner” of a certain specified multi-locus STR profile in instances in which a key evidentiary DNA profile is obtained but the unidentified suspect remains at large. This strategy attempts to avoid statute-of-limitations problems in cases where the perpetrator was unseen or unidentified.
Which Methods to Use?
Considerations affecting which of the various laboratory methods to use include the nature, size, and condition of the samples and the number that will be analyzed. The PM+DQA1 test, based on dot-blot techniques, is the least technically demanding to perform, both in terms of experience and equipment requirements. A reasonably high throughput is possible (50 samples can be amplified and hybridized in a day by a single technologist) and the equipment requirements are minimal. This factor, together with its relatively high power of discrimination (Sensabaugh and Blake, 1993) as a result of using 6 different loci, made the PM+DQA1 test particularly suitable for screening large numbers of samples. The PCR products are small, ranging from ~150 to 250 bp; hence this test will also perform well when the DNA in the sample is degraded, a very common occurrence in forensic samples.
However, the traditional dot-blot methods are not ideal for analysis of DNA mixtures and the STR-based systems are more useful. Detection of low copy number samples will increase the sensitivity, but not necessarily the specificity of the analysis. Typing more loci increases the power of discrimination, thereby increasing the opportunity to exclude the falsely accused donor of a forensic sample, while also making the estimate of the combined match probability more incriminating when there is a “match” at all typed loci to a known reference DNA sample.
Although the amount of DNA template required for typing the D1S80 locus and the STR/amelogenin loci is about the same as that needed for the PM+DQA1 test, the former two tests require a higher-quality DNA and thus may not always perform as robustly when the DNA template is degraded. As the demand for greater volume of testing increases, newer technology along with more robotics has become incorporated into the forensic laboratories. STR systems (e.g., array detection of single nucleotide polymorphisms (SNPs) and mass spectroscopic analysis) have already replaced the dot-blot systems in most laboratories in the U.S., Canada, and Europe.
Critical Scientific Parameters for Reliable Forensic DNA Analysis
Two National Research Council Reports (1992, 1996), several FBI DNA Advisory Board (DAB) Guidelines (see FBI, 1998), and numerous independent validation studies (see Bing, Budowle) have detailed the general scientific practices necessary for sound forensic DNA analysis. These reports, along with individual laboratory and analyst experience, form the basis for current use and interpretation of forensic DNA evidence.
General Laboratory
Chain of custody, quality control, quality assurance, and care in sample handling and transport are particularly relevant in forensic investigations, where attention to detail in sample collection and legal chain of custody of all samples must be maintained (Lee et al., 1993; TWGDAM, 1995; FBI DAB). While small samples and sample mixtures are occasionally present in human genetics research and in diagnostic clinical genetics, these two challenges are often the rule in DNA-based forensic testing. Also, mathematical approaches are a key factor in the interpretation of results and in their acceptance in the courts (Steinberger et al., 1993; Weir, 1996).
As a general rule, no forensic sample is processed unless the technologies have been validated and performed according to the DNA Advisory Board Standards, using standard control reference material supplied by or traceable to NIST. Other laboratory considerations include maintenance and calibration of equipment, assay of the quantity of DNA amplified, use of methods to ascertain that DNA amplification has occurred, and use of appropriate controls (TWGDAM, 1995; DAB, 1998).
VALIDATION STUDIES AND POPULATION STUDIES
Adherence to high standards and use of quality equipment and reagents is clearly important (CPHG, UNITS 8.2 & 9.2). Careful validation studies of the reagents and commercially available "kits" used in forensic genetic typing have been performed by a number of groups, indicating that the results of such typing, when performed correctly according to appropriate protocols, are reliable and trustworthy (Cotton et al., 1991; Budowle et al., 1992; Lander and Budowle, 1994; Budowle et al., 1995a,b). Specific genetic loci typed in forensic DNA typing must behave in accordance with known biological rules (e.g., Mendel's laws, Hardy-Weinberg Equilibrium), and must be independent of one another in order to use the “product rule” for statistical estimation of expected profile frequencies (National Research Council 1992, 1996; also see CPHG, UNIT 1.4). The two National Research Council Reports make specific recommendations in regard to the laboratory and population-genetic aspects of forensic testing and data analysis. There is also an abundance of journal literature on these subjects (Lander, 1989; Budowle et al., 1991; Evett and Gill, 1991; Gill et al., 1991; Edwards et al., 1992; Sajantila et al., 1992; Lewontin, 1993; Richards et al., 1993; Weir and Evett, 1993; Neeser and Liechti-Gallati, 1995; Weir and Buckleton, 1995; Evett et al., 1996; Micka et al., 1996; DAB standards).
LABORATORY AND PERSONNEL ACCREDITATION
Several professional agencies have been involved in accreditation and certification of laboratories and personnel involved in genetic testing and human identity testing. These include the American Board of Medical Genetics (ABMG) and the American Board of Pathology (ABP), which certify doctoral-level scientists and physicians in several areas including clinical molecular genetics and molecular genetic pathology (involving required training and exam questions on paternity testing and forensics). Other agencies include the American Association of Blood Banks (AABB), the American Board of Criminalistics (ABC), and the American Society of Crime Laboratory Directors (ASCLD).
LABORATORY ACCREDITATION AND QUALITY ASSURANCE
Forensic laboratories in the U.S. followed guidelines promulgated by the Technical Working Group on DNA Analysis Methods (aka TWGDAM) in the early 1990s. In late 1995, in accordance with the Federal DNA Identification Act of 1994, the Director of the FBI appointed a DNA Advisory Board (DAB) and charged its members with promulgation of updated recommendations and policies on quality-assurance standards for forensic DNA testing laboratories. The DAB produced several new recommendations, approved by the FBI Director, that define minimum quality-assurance standards and place specific requirements on the forensic laboratories. These policies are mandatory for receipt of federal funding and participation in CODIS (FBI/DAB, 1998). Quality control and proficiency testing are also addressed, including minimum education and training requirements for laboratory personnel. The DAB dissolved in late 2000 in accordance with the Federal statute. Recommendations or changes dealing with quality improvement and other technical matters will, in the future, be made by the Scientific Working Group on DNA Analysis Methods (aka SWGDAM). Several proficiency-testing programs are available for forensic and paternity testing, including those administered by the College of American Pathologists (CAP). CAP offers regular surveys (external open proficiency tests) in many areas of medical diagnostics, including genetic testing, and in paternity testing and forensics.
CAVEATS IN THE INTERPRETATION OF FORENSIC DNA TESTING
Laboratory forensic geneticists are in a unique position in the search for truth in cases involving genetic identity testing results. While attorneys for the prosecution and the defense are clearly advocates both for justice and for their clients (“the people” and “the defendant”, respectively), the scientist/DNA analyst called as an expert witness has the distinct and important role of educator to the judge in the pretrial hearings and to the finders of fact (i.e., the jurors) in the jury trial. As an expert witness, the forensic scientist is asked to offer fact testimony as well as opinion testimony. This is a special role in the adversarial justice system and should, in this author’s opinion, be exercised in a cautious and conservative fashion. In reality, this does not always occur, and several forensic scientists have been investigated for allegedly falsifying or fabricating data, results, and trial testimony (see NY Times, OK City Tribune).
Several examples underscore the importance of understanding the caveats in interpretation of DNA evidence and the role of the expert witness as an unbiased participant in forensic casework. For example, there are several cases involving DNA-typing evidence linking an individual suspect or victim who happens to be a monozygotic twin (i.e., an "identical twin") to a person who is the source of crime-scene evidence. Clearly, analysis of nuclear DNA polymorphisms cannot distinguish between monozygotic twins (or triplets). Thus, a so-called "DNA match," while inclusionary, would not necessarily be probative. Similarly, DNA patterns from close relatives (especially those from genetic isolates or those conceived by consanguineous parents) have a greater chance of matching than do those from randomly selected, unrelated individuals, and this is often important to consider in the statistical analyses (Budowle, 1995).
Conversely, a "non-match" (or apparent exclusion) could be misleading to the finders of fact (jurors). For example, if a suspect or victim received a bone-marrow transplant after the crime-scene evidence was deposited, the patterns/alleles in that evidence will not match those identified from DNA extracted from the person's peripheral-blood lymphocytes obtained after the marrow transplant.
It is crucial to note that the finding of a particular DNA profile at a crime scene does not provide any information about when and how it was deposited. Conversely, not finding a particular DNA profile at a scene does not indicate that the “owner” of that profile was not there at the scene. While most geneticists recognize that DNA typing cannot determine when or how the evidence was deposited at a particular location, this is not as obvious to the judge, lawyers, members of the jury, and the interested public. Evidence collection and chain-of-custody protocols are crucial to help ensure that the scientific data are both reliable and probative. Degraded samples and contamination or admixture with nonhuman DNA (e.g., contamination with bacterial, animal, or plant DNA) present special challenges to the forensic DNA analyst.
Compiling and Searching of DNA Databases
In a practical sense, banking of DNA samples and DNA profiles existed before the interest in forensic DNA registries. These repositories of human tissue or DNA include the heel-stick blood spot cards obtained in the first days of life from all live-born infants in the U.S. These blood spot cards are collected for genetic disease screening by the Departments of Health, to allow prompt identification and timely treatment of severe but treatable inherited metabolic and genetic disease (Reilly, 1997). Furthermore, hospital pathology departments around the world routinely archive paraffin-embedded tissues from surgical biopsies and from autopsy studies conducted for diagnostic and prognostic testing. These tissue blocks are often retrieved from storage for retrospective DNA-based interrogation after the DNA has been extracted from the deparaffinized tissue. Several states have offered parents the chance to prepare and keep blood spot cards on their children and other family members, storing a blood spot on filter paper, a lock of hair, etc. in a way that allows future DNA testing should the child be lost, run away, or otherwise be displaced. It is the practice of the entire U.S. military to obtain and store fingerstick blood samples on special paper for later use as military “dog tags”.
Searching Forensic DNA Databases
All 50 U.S. states have statutory legislation providing for obligatory DNA banking of blood or saliva samples from those convicted of certain felony crimes. Federal legislation has recently come into force covering U.S. Federal territories, buildings, the U.S. military and the District of Columbia. Other countries, including Canada and Great Britain, have regional or national DNA data banks containing the profiles of offenders or of crime scene evidence.
Under the provisions of the enacted legislation, blood or other tissue samples are obtained for DNA extraction and multiple genetic loci are typed (the number of loci typed in such cases varies between countries – in the U.S. 13 STR loci are typed). Typing results (multi-locus DNA profiles) are stored in a computerized database for future comparison to DNA profiles from evidentiary samples from unsolved crimes (crime scene index samples). Similarly, profiles from unsolved crimes can be compared to those in the databank of known offenders (offender index). In the U.S., individual states search their data against those in a central national index at the Federal Bureau of Investigation (FBI). In the U.S., this whole system, known as CODIS (Combined DNA Index System), is designed to link offenders or unsolved cases to one another and thus can identify possible suspects in neighboring or distant jurisdictions (U.S. Department of Justice, 1996).
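The comparison of an evidentiary profile against an offender index can be illustrated with a minimal sketch. This is not the CODIS software or its search algorithm; the names `normalize` and `search_index`, the sample identifiers, and the data layout (each profile as a mapping from locus name to a pair of allele labels) are assumptions chosen for the example. The sketch reports a "hit" only when the genotypes are identical at every locus typed in both profiles.

```python
from typing import Dict, List, Tuple

Genotype = Tuple[str, str]          # the two allele labels at one locus
Profile = Dict[str, Genotype]       # locus name -> genotype


def normalize(profile: Profile) -> Profile:
    """Sort each allele pair so ("16", "15") and ("15", "16") compare equal."""
    return {locus: tuple(sorted(alleles)) for locus, alleles in profile.items()}


def search_index(evidence: Profile,
                 index: Dict[str, Profile],
                 min_loci: int = 1) -> List[str]:
    """Return IDs of stored profiles that match the evidence at all shared loci.

    min_loci is a hypothetical threshold: the minimum number of loci that
    must be typed in both profiles before a match is reported.
    """
    query = normalize(evidence)
    hits = []
    for sample_id, stored in index.items():
        candidate = normalize(stored)
        shared = query.keys() & candidate.keys()
        if len(shared) >= min_loci and all(
            query[locus] == candidate[locus] for locus in shared
        ):
            hits.append(sample_id)
    return sorted(hits)
```

The same function serves both directions described above: a crime scene profile can be searched against an offender index, or an offender profile against an index of profiles from unsolved cases.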
Since the inception of CODIS and the various state-operated DNA databases, hundreds of case-to-case or case-to-suspect "hits" (i.e., DNA matches) have been reported, with one state (Florida) now reporting several new "hits" each week. Given the well-known high degree of criminal recidivism, particularly in sexual-assault cases, DNA databases hold promise for identification of more perpetrators than would be possible without such coordinated efforts (McEwen and Reilly, 1994; Scheck, 1994; McEwen, 1995).
In assessing the effectiveness of DNA database searching in the criminal justice system, it is important that its value be measured not simply by the number of so-called matches or “hits”, but also by the many benefits of exonerations or DNA eliminations. Indeed, the elimination of someone as a suspect based on DNA profiling results can save hundreds if not thousands of hours of wasted investigative time and spares uninvolved parties unnecessary intrusion by law enforcement personnel.
Lower Stringency Searching and The Matter of Siblings
Genetically related siblings typically share one or both parents. Thus, in the case of full siblings, it would be expected that sharing of DNA profiles would occur much more commonly than among unrelated persons (beyond that of sharing mtDNA and Y-chromosome profiles, as discussed above). This expectation is supported by data collected on sibling and non-sibling DNA profiles in the U.S. Comparing the DNA profiles of full siblings indicates the expected higher degree of allele sharing and locus identity compared to unrelated individuals (Bourke, Ladd, Bieber). Our data demonstrate that full siblings born to unrelated parents have identical STR profiles at an average of four loci, compared to identity at less than a single locus among unrelated individuals. Our data set included a sib pair with identity at nine of the CODIS loci. In a quality control search of a DNA database, a colleague has informed us of the discovery of a pair of inmate brothers who reportedly share identical DNA profiles at ten STR loci (their parents are closely related).
These observations in siblings have important implications for forensic geneticists, as it becomes important to consider the matter of brothers in cases in which complete multi-locus DNA profiles are not obtained (e.g., due to DNA degradation). Also, in a search of a DNA data bank, a high degree of allele sharing can provide an important investigative lead (i.e., possibly implicating a brother) even in the absence of a complete profile match of crime scene evidence against a registry of convicted offenders. Thus, it is important for DNA data bank administrators to have a system to notify law enforcement when a high degree of allele sharing is found in a computer search, even if it is not a complete “match” at all loci. In reality, sibling issues are not the rule in forensic investigations, and low-stringency database searches would usually lead to too many “partial profile hits” to be of any practical use in the majority of investigations. If very large numbers of SNPs are assayed, discriminant function analyses will be helpful in further study of the utility of reduced stringency searching of forensic samples against offender profiles.
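The two measures discussed here, locus identity and allele sharing, can be counted with a short sketch, and a reduced-stringency search can then rank stored profiles by those counts. All names (`shared_allele_count`, `locus_identity_and_sharing`, `low_stringency_candidates`) and the threshold parameter are hypothetical; no particular data bank is known to use this scoring, and a practical system would need a statistically justified threshold.

```python
from collections import Counter
from typing import Dict, List, Tuple

Genotype = Tuple[str, str]
Profile = Dict[str, Genotype]


def shared_allele_count(g1: Genotype, g2: Genotype) -> int:
    """Number of alleles (0, 1, or 2) two genotypes have in common."""
    overlap = Counter(g1) & Counter(g2)   # multiset intersection
    return sum(overlap.values())


def locus_identity_and_sharing(p1: Profile, p2: Profile) -> Tuple[int, int]:
    """Return (loci with identical genotypes, total shared alleles)."""
    loci = p1.keys() & p2.keys()
    identical = sum(1 for locus in loci if sorted(p1[locus]) == sorted(p2[locus]))
    shared = sum(shared_allele_count(p1[locus], p2[locus]) for locus in loci)
    return identical, shared


def low_stringency_candidates(evidence: Profile,
                              index: Dict[str, Profile],
                              min_identical_loci: int) -> List[Tuple[str, int, int]]:
    """List stored profiles meeting the identity threshold, best first."""
    ranked = []
    for sample_id, stored in index.items():
        identical, shared = locus_identity_and_sharing(evidence, stored)
        if identical >= min_identical_loci:
            ranked.append((sample_id, identical, shared))
    # Most identical loci first, then most shared alleles, then by ID.
    ranked.sort(key=lambda row: (-row[1], -row[2], row[0]))
    return ranked
```

Lowering `min_identical_loci` widens the net, which illustrates the trade-off noted above: a permissive threshold may surface a sibling but will also return many unrelated partial matches.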
Surname Searching as a Forensic Tool?
Interestingly, males generally “inherit” yet another trait from their father, other than their nuclear or Y-chromosome DNA profile: their family name or surname. Thus, database searching of similar surnames could be considered in an investigative search of possible suspects, once the Y-chromosome DNA haplotype profile is identified in forensic evidence (see Kayser et al., 2002). For example, in a hypothetical investigation, once a specific Y-chromosome DNA haplotype is identified in key forensic evidence, a database search might identify a haplotype as one commonly found in males named “Adamsky”, or “Baker”, or “Smiley”. Individuals with the same surnames might be targeted for investigation or questioning, or even for court-mandated DNA profiling. There would certainly be civil liberties concerns about this concept, relating to the issue of lack of individualized suspicion as a basis for any involuntary DNA profiling based only on the circumstance of having a certain last name.
While surname searching based on Y-chromosome haplotype DNA profiling might seem an implausible idea, in theory it could be a highly efficient search strategy in cases of rare DNA profiles associated with uncommon surnames. However, one major limitation of this concept is evident: because of alternative name spellings or name changes, variant surnames would also have to be included in such a hypothetical search strategy. Depending upon the rarity of the profile or of the surname, the number of such potential suspects could be small or enormous. Another serious limitation of surname searching would be the common issue of non-paternity or of unknown paternity. Unknown or disputed paternity is very common in some countries, and non-paternity alone (e.g., one/both partners may not be biological parents) has been estimated to occur in up to 15% of pregnancies, regardless of religion or socioeconomic class. These factors add considerable complication to the hypothesized use of surname searching algorithms.
How our law enforcement and public safety officials, forensic scientists and concerned public would react to such theoretical, yet very possible, search strategies is difficult to predict. In the immediate aftermath of high profile crimes there is often a public outcry for use of whatever means necessary to solve a crime. Also of note in the aftermath of the September 11, 2001 attacks on the U.S. were the initial displays of support for national identification cards, increased gun control and other government intrusions on privacy and constitutional freedoms. Whether such public displays of support will be short lived is uncertain, as is the constitutionality of such database search strategies. Nevertheless, it should be noted that once legislatively mandated practices intended to increase public safety are in force, they rarely are reversed legislatively and are seldom ruled impermissible by the courts.
Database search of DNA from “consent samples” and from military personnel: One area of particular interest will be the extent to which so-called consent sample DNA profiles are searched against unsolved prior or future crime scene evidence. Consent samples (sometimes referred to as “elimination samples”) are those provided voluntarily by those questioned by peace officers investigating unsolved crimes. These volunteers could include family members, spouses, and friends of victims. Once eliminated as a contributor of the crime scene DNA profile, those providing the samples may have the understanding that their samples will be destroyed and that their DNA profiles are not searched against the database of unsolved crime scene DNA profiles.
In 1994 a sexual assault conviction was overturned (Canada; see R. v. Borden, 2 S.C.R. 145, 92 C.C.C. (3d) 404, 33 C.R. (4th) 147) on the basis of lack of informed consent from an individual contacted about providing a DNA sample as an “elimination sample” that was later found to match key evidence in a second case in which he was not identified by the victim. The Supreme Court of Canada found that in order for a person to waive their constitutional right to be secure against unreasonable seizure, the person must be possessed of the requisite informational foundation for a true relinquishment and that the consent form must make clear the scope of the investigation. More recently (1999) the Supreme Court of Canada (R. v. Arp, 3 S.C.R. 339, 129 C.C.C. (3d) 321, 20 C.R. (5th) 1) ruled that there was no violation of an individual’s constitutional right to be secure against unreasonable seizure when an individual consented to providing a DNA sample as an “elimination sample” in a homicide case and the police subsequently obtained a search warrant to seize the sample for a different investigation. The Court found that the police have an obligation only to disclose the anticipated purposes known at the time the consent is obtained. Further, if there is no limitation or restriction on the use of the evidence by the individual or the police, then it is admissible for use in another investigation.
With regard to the U.S. military, all enlisted and commissioned military personnel must provide blood samples, which are preserved on special blood spot cards and stored, as the modern “dog tags”, for use in the event that the individual is killed, injured or missing in action. The blood spot cards provide a source of a reference DNA sample to be used in identification of “the unknown soldier”, or as in the case of the 9/11 Pentagon attack, to return the remains of the victims to their families.
However, it is not widely appreciated that, under certain specified conditions (i.e., judicial court order), access to these cards could be ordered in order to obtain DNA profiles in military investigations of criminal activity. Thus, this blood card “bank” could be interrogated for forensic investigations in a manner different from the original intent of the collection. Indeed, some have quietly advocated DNA profiling and database searching of such “banks” in a mass search (or “sweep”) if a military person is a suspect in a crime. Such actions would most certainly be challenged in the military courts. Also, there would undoubtedly be harsh criticism from the more than 2 million members of the U.S. military at the idea that their “DNA dog tag” bank be searched in a way similar to that of a registry of convicted offenders. While offensive (to this author), such a search would not be that different from searching computerized records of digital fingerprints taken “voluntarily” from all military members, virtually all sworn law enforcement officers, and many other individuals employed in sensitive positions involving security clearances.
Summary Conclusions and A View to The Future of Forensic DNA Profiling: The Genetic Eyewitness
The utility of DNA variation in forensic investigations has been amply demonstrated during the past decade. Applications include civil paternity testing, medical diagnostics, and forensic testing, as well as use in identification of victims from war and mass casualties. Its use as an exculpatory tool and as powerful inclusionary evidence cannot be ignored and provides, in some cases, the make-or-break evidence freeing incarcerated inmates or sealing the fate of defendants in jury trials. Interpretational challenges involving complex DNA mixtures will continue to require expert judgment, as will development of novel methods for even more rapid and cost-effective DNA profiling.
Advances in biotechnology will continue to improve the array of laboratory methods used for forensic profiling of human and non-human DNA. Profiling of SNPs, use of microarrays for mass screening of hundreds of loci, and development of robotics will reduce the costs in time and labor needed to perform this testing. Lack of qualified personnel or funding will continue to require crime labs to send out samples to commercial contractors. Technical advances will allow DNA extraction, profiling, and searching a database of known offenders without the long delays now encountered in some jurisdictions.
New methods, along with development of miniaturized kits, have led some to speculate about applications of forensic DNA profiling for use in the field. Indeed, we are not far from the day when miniaturized crime scene kits could permit DNA extraction, analysis and computer database searching right at the crime scene. While possible, it is unclear whether such actions would be desirable or sensible, or whether such field applications would meet the DAB or SWGDAM requirements for standardized laboratory protocols.
Storage of evidence and protocols for retrospective DNA profiling of previously adjudicated cases will continue to challenge the courts and the personnel of the criminal justice system.
Despite the remarkable capabilities of modern crime laboratories, fiscal imperatives often prevent optimal use of forensic DNA profiling or searching of the existing databanks. For example, in individual cases funding shortages typically limit the number of evidentiary exemplars that can be thoroughly examined. What effect this has on the result of individual cases is unknown. In old unsolved cases, this lack of DNA extraction obviously precludes the DNA profiles from being entered into the crime scene index. Because of these funding and staff shortages, searching the DNA profiles against the profiles obtained from other solved or unsolved cases, or against the profiles of known offenders, cannot be performed. Thus, many unsolved or cold cases languish in the archives of crime labs, waiting for that tenacious investigator, committed forensic scientist, or concerned family member to reactivate the case. Fortunately, in the U.S., federal funding has allowed for some backlog reduction to be accomplished, but other factors prevent such action in many cases. This is indeed unfortunate in light of the recidivistic nature of many offenders. Moreover, the exculpatory power of DNA profiling and the possibility of exoneration remind us of the need not to ignore certain cases in which DNA profiling was never performed.
Several practical matters account for the tremendous backlog in working old unsolved cases that might benefit from modern DNA analysis. The first is a shortage of qualified examiners in the crime labs. Even though advances in computer robotics and sample tracking software have eliminated many hours of tedium in the handling and processing of samples, the initial examination of evidence, selection of which exemplars to test, and the interpretation of results require highly skilled individuals, whose work will be scrutinized in the courts. Second is the fact that difficulties with proper evidence storage may prevent successful extraction of DNA years later. Once a case is adjudicated, its crime scene evidence is very often stored properly in crime labs or police storage facilities under carefully controlled conditions. However, storage practices are highly variable and, sadly, key evidence that may have been untested may be more haphazardly stored in less than ideal conditions, or even discarded, preventing current or future technologies from being applied in retrospective analysis. Very recently, legislative proposals have been offered in several U.S. states to require that all evidence that might contain DNA be stored indefinitely.