From Spit to Screen: The Journey of a DNA Sample

One of our researchers, Paul Woodbury, describes the journey of a DNA sample from the instant the sample is taken until it is analyzed in the laboratory.  The following article is a reprint from the July-September 2020 issue of the National Genealogical Society Magazine and is published here with permission. 

23andMe Collection Kit

This 23andMe collection kit and similar AncestryDNA collection kits rely on saliva collection. Hong Chang Bum, “IMG_0901,” Attribution-NonCommercial-NoDerivs 2.0 Generic (CC BY-NC-ND 2.0) license.

How does DNA testing actually work? How can spitting into a tube result in an ethnicity estimate, a list of genetic cousins, and other DNA data? This article reviews the technology that enables genetic genealogy and the five-step process that transforms a saliva sample into a comprehensive genetic report: collection, extraction, amplification, testing, and data analysis (1).


Complete copies of the human genome are carried by most of the trillions of cells in the human body. While red blood cells and some skin, hair, and nail cells do not carry nuclear DNA, nearly any other type of cell can be sampled for DNA analysis.

In the early days of genetic genealogy testing, companies utilized blood samples. Now, most genetic genealogy testing companies collect DNA through less invasive and more convenient spit or cheek swab kits. The DNA obtained from these kits originates from white blood cells in saliva and buccal epithelial (cheek) cells.

Most DNA testing companies discourage testers from eating, smoking, drinking, chewing gum, brushing teeth, or using mouthwash in the half-hour before taking a DNA test. While foreign particles from food, liquids, toothpaste, and tobacco do not alter DNA, they can mask it or cause it to degrade (2).

Testing companies also warn against activities that might cause cross-contamination of a sample. For swab collection kits like those used by Family Tree DNA and MyHeritage, testers should be careful not to drop the swab in anything that might contaminate the sample, touch the collection swab with their hands, or brush it against other objects. When performing spit collection tests like those utilized by AncestryDNA and 23andMe, testers should try to collect all the necessary saliva at once to avoid contamination from foreign materials in the air.

DNA testers should register their kit with the corresponding testing service to ensure later access to the test results. Some labs will not process kits that have not been registered.

National Geographic Collection Kit

This now-obsolete National Geographic Genographic collection kit is similar to Family Tree DNA and MyHeritage collection kits, which rely on buccal swabs. Paulo O, “Genographic Kit andAntares info kit,” Attribution 2.0 Generic (CC BY 2.0) license.

While DNA can sometimes last a very long time in the right environment, it can degrade because of proteins that destroy DNA, foreign materials such as food, bacteria, and other chemicals, or large fluctuations in temperature. To prevent degradation and to keep DNA intact from the time it is collected to the time it is ready to be analyzed, sample collection kits typically include a liquid buffer solution.

These solutions stabilize the cells, sometimes include antibacterial elements, inhibit the activity of proteins that would degrade the DNA, and preserve the DNA in a stable pH solution which is not as easily affected by fluctuations in temperature. With such a solution, DNA can be preserved while it is prepared, mailed, stored, and eventually processed by a lab.

Isolation and Extraction of DNA

Every DNA collection sample has hundreds of thousands of cells, each carrying a copy of the tester’s DNA, but these samples also contain proteins, chemicals, fats, water, and a host of other biological materials. Before DNA can be analyzed, it must be isolated from all of these other materials.

First, cells are broken open with a detergent. Cells are held together by a membrane composed of two layers of fat called a lipid bilayer. Just as detergents interact with fats in water, they interact with the lipids in cell membranes to break them open and release the contents of the cell into a solution.

Next, certain cellular components are destroyed. Cells carry proteins that interact with DNA as well as other proteins that destroy free-floating DNA. Cells also include free-floating RNA, which is similar to DNA and can cause problems in later DNA analysis. To overcome these problems, a protease (an enzyme that destroys proteins) and an RNase (an enzyme that destroys RNA) are added to the sample.

Finally, salt is added to the mixture so that the debris from the proteins, lipids, and RNA clumps together. When the solution is centrifuged (spun at very high speeds), this debris collects at the bottom of the sample tube, leaving the DNA floating in the solution.

After most of the debris is removed from the sample, the DNA is further isolated from the detergents, proteins, salts, and reagents used in the first step. Alcohol is added to the sample, and since DNA is insoluble in alcohol, a subsequent round of centrifuging isolates the DNA in a clump at the bottom of the test tube. The DNA has been isolated.

Polymerase Chain Reaction

In order to obtain a sufficient amount of DNA for testing, companies amplify the DNA from the original sample through Polymerase Chain Reaction protocols. Enzoklop, “Polymerase chain reaction” Creative Commons Attribution-Share Alike 3.0 Unported license.


The technologies used by genetic genealogy testing companies require a large amount of DNA for successful analysis—much more DNA than what is present in the initial sample provided by a customer. For this reason, labs utilize Polymerase Chain Reaction (PCR) protocols to amplify or copy the DNA being analyzed. DNA replicating proteins, free-floating DNA bases, and DNA primer sequences are added to a sample to create an environment conducive to DNA replication.

Next, the sample is submitted to several cycles of temperature variations. During this process, the DNA denatures or “melts” into single strands, DNA primers bind to complementary strands of DNA, and DNA polymerase (a DNA-building and replicating protein) recruits free-floating bases and extends the DNA, making a new copy.

In the first temperature variation cycle, one strand of DNA duplicates into two strands. The number of copies of the DNA doubles with every cycle, and within a few hours, it is possible to obtain millions of copies of a test taker’s DNA from the initial sample.
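The doubling described above is simple exponential growth, and a few lines of Python make the arithmetic concrete. This is an idealized sketch: it assumes every cycle doubles the DNA perfectly, and the function names are illustrative rather than taken from any lab's software.

```python
def pcr_copies(initial_copies: int, cycles: int) -> int:
    """Copies present after `cycles` rounds of perfect doubling."""
    if initial_copies < 1 or cycles < 0:
        raise ValueError("need at least one starting copy and a non-negative cycle count")
    return initial_copies * 2 ** cycles


def cycles_needed(initial_copies: int, target_copies: int) -> int:
    """Smallest number of perfect-doubling cycles that reaches `target_copies`."""
    if initial_copies < 1:
        raise ValueError("need at least one starting copy")
    cycles = 0
    copies = initial_copies
    while copies < target_copies:
        copies *= 2  # one temperature cycle doubles the amount of DNA
        cycles += 1
    return cycles


print(pcr_copies(1, 20))            # copies of one starting molecule after 20 cycles
print(cycles_needed(1, 1_000_000))  # cycles required to pass one million copies
```

Under this idealization, a single starting molecule passes one million copies after 20 cycles, which is why a modest number of cycles is enough to turn a small sample into a workable quantity of DNA.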


At this point, the way in which DNA is tested depends on the type of DNA test being performed. Autosomal DNA tests, Y-DNA tests, and mtDNA tests are treated differently. Because autosomal DNA testing is the most common type of testing, this article reviews the protocols for SNP chip microarrays.

Single Nucleotide Polymorphisms (SNPs) are locations in DNA that are known to be hotspots of variation in the general population. Rather than testing all of an individual’s DNA, testing companies typically test between 400,000 and 700,000 SNP markers across the genome. Because each individual has two copies of DNA—one from the mother and one from the father—there are three possibilities for a genotype at any given SNP marker: both copies could carry one variation, they could both carry the other variation, or they might carry different variations.

For example, if an SNP marker has two typical values of C or G, it is possible that an individual could have a genotype of CC, GG, or CG. An individual with the same SNP variation on both copies of DNA is homozygous at that location. A person with different variations on the two copies of DNA is heterozygous at that location.
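The three possible genotypes and the homozygous/heterozygous distinction can be expressed in a short Python sketch. The function names here are illustrative only; they are not part of any testing company's software.

```python
def possible_genotypes(allele_a: str, allele_b: str) -> list[str]:
    """List the three genotypes possible at a marker with two known variations."""
    return [allele_a * 2, allele_a + allele_b, allele_b * 2]


def zygosity(genotype: str) -> str:
    """Classify a two-letter genotype as homozygous or heterozygous."""
    alleles = genotype.upper()
    if len(alleles) != 2 or any(base not in "ACGT" for base in alleles):
        raise ValueError(f"expected two bases from A, C, G, T; got {genotype!r}")
    return "homozygous" if alleles[0] == alleles[1] else "heterozygous"


for genotype in possible_genotypes("C", "G"):
    print(genotype, zygosity(genotype))
```

For the C/G marker in the example, this prints CC and GG as homozygous and CG as heterozygous.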

Genetic genealogy tests rely on SNP chip testing to query SNP markers for a test taker. Each testing company uses chips manufactured by Illumina, a biotechnology company. These “chips” are glass plates with microscopic silicon beads attached to predefined and indexed locations (3).

Each silicon bead, in turn, has several copies of a manufactured short single-stranded segment of DNA attached to it. These short sequences are complementary to a sequence in human DNA immediately preceding the location of a SNP.

In order to test SNP locations, a tester’s sample is treated to shear or break the DNA into smaller fragments and denatured to make it single-stranded. Next, the DNA is washed over the chip, where it binds with the complementary manufactured DNA just short of the location of the SNP. Then DNA polymerase and modified A, T, G, and C nucleotides with fluorescent tags are introduced. The sample DNA is washed away, leaving the manufactured strands with one more base and fluorescent tags indicating which base has been added.

The chip is then submitted to a laser reader, which causes the DNA strands to fluoresce red (homozygous for one variation), green (homozygous for the other variation), or yellow (heterozygous). Scanning software records and interprets the fluorescence. It determines the color of each locus, determines what the fluorescence means for that location, and then uses the index to associate the result with a corresponding location in the genome.

Finally, the software program records the values, A, T, G, or C, that have been detected for each of the several hundred thousand locations that have been analyzed into a raw data file.
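The interpretation step can be sketched in Python as a lookup from color to genotype, followed by a lookup from bead position to genome location. Everything specific here is an assumption made for illustration: the bead index entries, the marker names, which allele is read as red or green at each marker, and the `--` placeholder for an unreadable signal. Real scanning software works from measured intensities and is considerably more involved.

```python
# Hypothetical bead index: bead position on the chip -> (marker name,
# chromosome, genomic position, allele read as red, allele read as green).
# All entries are invented for illustration.
BEAD_INDEX = {
    0: ("rs_example_1", "1", 1000, "C", "G"),
    1: ("rs_example_2", "2", 2500, "A", "T"),
    2: ("rs_example_3", "7", 4200, "A", "G"),
}

NO_CALL = "--"  # assumed placeholder for a signal that cannot be interpreted


def call_genotype(color: str, red_allele: str, green_allele: str) -> str:
    """Translate a fluorescence color into a two-letter genotype."""
    color = color.lower()
    if color == "red":
        return red_allele * 2              # homozygous for one variation
    if color == "green":
        return green_allele * 2            # homozygous for the other variation
    if color in ("yellow", "orange"):
        return red_allele + green_allele   # heterozygous
    return NO_CALL


def read_chip(colors_by_bead: dict[int, str]) -> list[tuple[str, str, int, str]]:
    """Use the bead index to attach each call to its location in the genome."""
    results = []
    for bead, color in sorted(colors_by_bead.items()):
        name, chromosome, position, red_allele, green_allele = BEAD_INDEX[bead]
        genotype = call_genotype(color, red_allele, green_allele)
        results.append((name, chromosome, position, genotype))
    return results


for row in read_chip({0: "yellow", 1: "red", 2: "green"}):
    print(row)
```

Each printed row pairs a marker and its genomic position with the genotype inferred from the color seen at that bead.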

Laser Scan DNA

Once the DNA has been extended by one fluorescently labeled base, it is submitted to a laser scan. Green indicates homozygosity for one version of the SNP, red indicates homozygosity for the other version of the SNP, and yellow or orange indicates heterozygosity. Each color for each site is interpreted by software and associated with a particular location in the genome. Kat Masback, “Microarray, AV-0101-5194 Dr. Jason Kang, NCI (Lance Miller).” Creative Commons Attribution-ShareAlike 2.0 Generic license.

Data Processing

Autosomal DNA raw data results are composed of a list of several hundred thousand marker locations and two base values (A, T, G, or C) for the corresponding locations (one maternal and one paternal). In and of themselves, these values have limited usefulness for genealogical research. It is comparison against reference datasets and customer databases that generates the most useful elements of genetic genealogy tests: ethnicity admixture estimates and cousin matching.
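To make the shape of a raw data file concrete, the sketch below parses a small sample in Python. The layout is an assumption: tab-separated columns for marker name, chromosome, position, and genotype, with comment lines beginning with `#`. Companies' actual files vary in their details, and the marker names and values in the sample are invented.

```python
import csv
import io
from typing import NamedTuple


class Marker(NamedTuple):
    rsid: str
    chromosome: str
    position: int
    genotype: str  # two base values, one from each copy of the DNA


def parse_raw_data(text: str) -> list[Marker]:
    """Parse tab-separated raw data, skipping blank lines and '#' comments."""
    data_lines = (
        line for line in io.StringIO(text)
        if line.strip() and not line.startswith("#")
    )
    markers = []
    for rsid, chromosome, position, genotype in csv.reader(data_lines, delimiter="\t"):
        markers.append(Marker(rsid, chromosome, int(position), genotype))
    return markers


# Invented sample in the assumed layout.
SAMPLE = (
    "# rsid\tchromosome\tposition\tgenotype\n"
    "rs_example_1\t1\t1000\tCG\n"
    "rs_example_2\t2\t2500\tAA\n"
    "rs_example_3\t7\t4200\tGG\n"
)

for marker in parse_raw_data(SAMPLE):
    print(marker)
```

A real file would hold several hundred thousand such rows, which is why the values only become useful once software compares them against other datasets.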

Ethnicity admixture estimates for autosomal DNA tests are obtained by comparison of a raw data file against a “reference panel” of samples for individuals with known ancestry from particular regions of the world. Prevalence of DNA marker values in specific populations is used to assign portions of a test taker’s DNA to different ethnicities or regions.
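As a rough illustration of how marker prevalence can point toward one population over another, the Python sketch below scores a handful of markers against a made-up reference panel. It is not any company's method: the panel, region names, and frequencies are invented, and the scoring assumes each genotype occurs in Hardy-Weinberg proportions and that markers are independent, which is a simplification chosen for this example. Production estimates use far larger panels and more sophisticated statistics applied across many stretches of the genome.

```python
import math

# Hypothetical reference panel: for each region, the frequency of one named
# allele at each marker. All names and numbers are invented for illustration.
REFERENCE_PANEL = {
    "Region A": {"rs_example_1": ("C", 0.9), "rs_example_2": ("T", 0.2)},
    "Region B": {"rs_example_1": ("C", 0.1), "rs_example_2": ("T", 0.8)},
}


def genotype_log_likelihood(genotype: str, allele: str, freq: float) -> float:
    """Log-probability of a genotype given an allele frequency (0 < freq < 1)."""
    copies = genotype.count(allele)  # 0, 1, or 2 copies of the named allele
    if copies == 2:
        probability = freq * freq
    elif copies == 1:
        probability = 2 * freq * (1 - freq)
    else:
        probability = (1 - freq) ** 2
    return math.log(probability)


def score_regions(genotypes: dict[str, str], panel=REFERENCE_PANEL) -> dict[str, float]:
    """Sum log-likelihoods over the markers shared with each region's panel."""
    scores = {}
    for region, markers in panel.items():
        total = 0.0
        for rsid, (allele, freq) in markers.items():
            if rsid in genotypes:
                total += genotype_log_likelihood(genotypes[rsid], allele, freq)
        scores[region] = total
    return scores


def best_region(genotypes: dict[str, str], panel=REFERENCE_PANEL) -> str:
    """Region whose allele frequencies make the observed genotypes most likely."""
    scores = score_regions(genotypes, panel)
    return max(scores, key=scores.get)


print(best_region({"rs_example_1": "CC", "rs_example_2": "AA"}))
```

Repeating this kind of comparison separately for different stretches of DNA is one way to picture how portions of a genome could be assigned to different regions.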

Autosomal DNA matches are identified by comparing the markers of a test subject against the markers of other tested customers in the database. When two individuals share long sequences of consecutive markers on at least one DNA copy, they share a “segment” of DNA from a recent common ancestor. Based on the size, location, and the number of segments two individuals share, it is possible to estimate how closely two individuals are related to each other.
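The idea of sharing consecutive markers "on at least one DNA copy" can be sketched in Python as a search for runs of markers where two people have at least one base value in common. This is a toy version under stated assumptions: the genotype lists are already in genome order, the threshold is a simple marker count, and no allowance is made for testing errors. Real matching algorithms are more elaborate and typically measure segments in genetic distance rather than marker counts.

```python
def shares_allele(genotype_1: str, genotype_2: str) -> bool:
    """True when two genotypes have at least one base value in common."""
    return bool(set(genotype_1) & set(genotype_2))


def matching_segments(person_a: list[str], person_b: list[str],
                      min_markers: int) -> list[tuple[int, int]]:
    """Return (start, end) index pairs for runs of consecutive shared markers.

    Only runs containing at least `min_markers` markers are reported.
    """
    segments = []
    start = None
    compared = min(len(person_a), len(person_b))
    for i in range(compared):
        if shares_allele(person_a[i], person_b[i]):
            if start is None:
                start = i  # a new run begins here
        else:
            if start is not None and i - start >= min_markers:
                segments.append((start, i - 1))
            start = None
    # Close a run that extends to the final marker.
    if start is not None and compared - start >= min_markers:
        segments.append((start, compared - 1))
    return segments


# Invented genotypes for six consecutive markers.
person_a = ["AA", "CG", "TT", "AG", "CC", "GG"]
person_b = ["AG", "CC", "TT", "GG", "TT", "AA"]
print(matching_segments(person_a, person_b, min_markers=3))
```

In this invented example the two people share a run covering the first four markers and then diverge, so a three-marker threshold reports a single segment.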


Once raw data has been incorporated into a company’s system and compared against other customers and reference datasets, the test taker receives a notification that DNA test results have completed processing. From spit to screen, the DNA sample has been collected, isolated, amplified, tested, and processed to provide the researcher with useful information for a genealogical investigation.

Websites cited in this article were viewed on 8 June 2020.

1. “What Happens To My DNA Sample At The Lab?” 23andMe.

3. “Infinium™ Global Screening Array-24 v3.0 BeadChip,” Illumina.



Paul – Legacy Tree Genealogists Researcher

From a young age, Paul Woodbury fell in love with genealogy research. To pursue his passion for this field, he studied genetics and family history at Brigham Young University. To aid in his desire to share his knowledge with others, he has also received a master’s degree in instructional design and educational technology from the University of Utah. Paul currently works as a DNA team lead at Legacy Tree Genealogists, where he has helped to solve hundreds of genetic genealogy cases. In addition to genetic genealogy, Paul specializes in French, Spanish, and Scandinavian research and regularly presents on topics for these areas.


Source: Legacy Tree

Posted On: May 13, 2021 at 03:07PM