DNA Sequencing and Fragment Analysis

The Polymerase Chain Reaction, or PCR, is a basic method used in molecular biology to produce copies of a small target region of DNA in a sample. The basics of PCR were discussed previously here. The copies of DNA produced by PCR provide researchers with sufficient material for other research applications, including automated Sanger sequencing. Although most PCR methods share a basic methodology, each reaction is different and requires optimization, the process of adjusting variables until a single desired product is produced. There are several factors to consider when optimizing PCR, such as the total copies of target DNA and the concentrations of primers, MgCl2 and deoxynucleotides (dNTPs). Some of these variables depend on the total volume of the PCR reaction because the final concentration of the components should remain constant whether the reaction is 25 ul, 50 ul or 100 ul. In this article we will focus on two variables, the number of copies of the target DNA and primer concentration.

The Template: Target DNA

Generating copies of a target DNA region by PCR is not as sensitive to the quality of the template DNA as Sanger sequencing is. However, it is still advisable to use a relatively pure DNA sample free from salts and other contaminants. Clean template DNA is more likely to generate a clean PCR product. The final dilution of the target DNA is best made in water rather than buffer because buffers can interfere with difficult PCR amplifications.

The most important aspect of the target DNA to consider is the total number of copies in the reaction available for amplification. The target DNA provides the initial template for the first set of products and continues to provide template for the remaining cycles. As PCR products are generated, they also serve as template for amplification. This is what allows PCR to generate millions of copies of a target region. Therefore, it is important that sufficient copies of the original target DNA are present in the reaction. However, too many copies of the original target can lead to generation of false products early in PCR that also act as template DNA. Genome size determines how much DNA supplies a given number of copies. A bacterial genome may consist of only 2 million bases, whereas the human genome has 3 billion bases. Therefore, bacterial genomic DNA will have far more copies of the target in a 50 ng sample than human DNA. For bacterial DNA, 10^5 copies require only about 200 to 300 picograms of DNA. For human DNA, 10^5 copies require over 300 nanograms of DNA, roughly a thousand-fold difference.
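The mass figures quoted above follow from genome size. The sketch below shows the arithmetic, assuming an average of about 650 g/mol per base pair of double-stranded DNA. That value is a common approximation and is not taken from this article, and the function names are hypothetical.

```python
AVOGADRO = 6.022e23        # molecules per mole
BP_MASS_G_PER_MOL = 650.0  # assumed average mass of one base pair (approximation)

def mass_ng_for_copies(copies, genome_size_bp):
    """Nanograms of genomic DNA that contain the given number of genome copies."""
    grams = copies * genome_size_bp * BP_MASS_G_PER_MOL / AVOGADRO
    return grams * 1e9

def copies_in_mass(mass_ng, genome_size_bp):
    """Number of genome copies present in the given mass (in nanograms) of DNA."""
    grams = mass_ng * 1e-9
    return grams * AVOGADRO / (genome_size_bp * BP_MASS_G_PER_MOL)

# 10^5 copies of a 2 million-base bacterial genome and of the 3 billion-base human genome
bacterial_ng = mass_ng_for_copies(1e5, 2e6)
human_ng = mass_ng_for_copies(1e5, 3e9)
print(f"bacterial: {bacterial_ng * 1000:.0f} pg, human: {human_ng:.0f} ng")
print(f"copies in 50 ng of bacterial DNA: {copies_in_mass(50, 2e6):.2e}")
```

Under these assumptions the bacterial sample comes out at a few hundred picograms and the human sample at a little over 300 nanograms, the same order as the figures in the text.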

PCR conditions generally recommend 10^4 to 10^5 copies of the target DNA in the reaction, independent of the total volume. There is some flexibility in the copy number of the target sequence. However, more copies of the target DNA will reduce the specificity of the reaction and are likely to produce a greater number of false products. The total number of cycles should be reduced when higher concentrations of target DNA are in the reaction.

Concentration of the Primers

Primers determine what region of the DNA will be amplified by PCR. The forward and reverse primers must match the bases at the beginning and end of the target region exactly. Excessive primer concentration is one of the most common causes of false products in PCR. Too much primer reduces specificity and allows primers to anneal in regions of the template outside the target region. The results of excessive primer are often seen in unclean Sanger sequencing results because false products can be sequenced along with the desired target. The amount of forward and reverse primer should be limited to reduce potential false priming. Excessive concentrations of forward and reverse primers can also cause formation of primer dimer, in which the primers anneal to each other and are amplified independent of the target DNA.

Primer concentration is one variable that depends on the total volume of the PCR reaction, so that sufficient copies of the primer find the target annealing sites. A final concentration of 0.5 micromolar (uM) to 1 uM is generally sufficient to amplify most target regions, although a lower concentration may also work in some applications. Our lab typically uses a final concentration of 0.8 uM for most PCR reactions. The final judgment on primer concentration is made after the products are electrophoresed on an agarose gel, which shows the number of products amplified.

We use a relatively simple calculation to find the molar concentration of a primer stock supplied at 1 microgram (ug) per microliter (ul), and then dilute the stock to a working concentration of 10 uM. The calculation requires the molecular weight (MW) of the primer, which should be provided along with the primer.

1 ug/ul × (1 umol / MW ug) × 10^6 ul/L = concentration in umol/L, which equals uM

A primer stock with a concentration of 200 uM is diluted by adding 1 ul of the primer to 19 ul of water for a concentration of 10 uM. This is our working concentration for PCR. For a final concentration of 0.8 uM, 2 ul each of the forward and reverse primers are added to a 25 ul reaction, whereas 8 ul of each would be added to a 100 ul reaction.
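The primer arithmetic above can be sketched in a few lines. The function names are hypothetical, and the molecular weight of 5000 is an assumed example value chosen only because it gives the 200 uM stock mentioned in the text.

```python
def stock_concentration_uM(mass_conc_ug_per_ul, molecular_weight):
    """Convert a primer stock from ug/ul to uM (MW in g/mol equals ug/umol)."""
    umol_per_ul = mass_conc_ug_per_ul / molecular_weight
    return umol_per_ul * 1e6  # 10^6 ul per liter

def water_to_add_ul(stock_uM, target_uM, stock_ul):
    """Volume of water to add to stock_ul of primer to reach target_uM."""
    total_ul = stock_ul * stock_uM / target_uM
    return total_ul - stock_ul

def primer_volume_ul(working_uM, final_uM, reaction_ul):
    """Volume of working-strength primer needed for the final concentration."""
    return final_uM * reaction_ul / working_uM

stock = stock_concentration_uM(1.0, 5000)  # assumed MW of 5000
print(f"stock: {stock:.0f} uM")
print(f"water for 1 ul of stock: {water_to_add_ul(stock, 10, 1):.0f} ul")
print(f"25 ul reaction: {primer_volume_ul(10, 0.8, 25):.1f} ul of each primer")
print(f"100 ul reaction: {primer_volume_ul(10, 0.8, 100):.1f} ul of each primer")
```

Because the final concentration is held constant, the primer volume scales linearly with the reaction volume.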

Primer concentration is one of the more important variables to consider when optimizing a PCR reaction. Concentrations greater than 1 uM can often lead to primers annealing along non-target regions and the generation of false products. Insufficient concentrations of either primer can result in little or no amplification.

Please go here if you would like to download a

reprint for this article in pdf format

The Polymerase Chain Reaction (PCR) is one of the more utilized protocols in the genetic sciences. The development of PCR has allowed researchers to amplify a specific region of genomic DNA. It has made it possible to amplify millions of copies of DNA. PCR applications have expanded to include sequencing, fragment analysis, real time PCR, chip arrays and other techniques. Although the design of many of the new technologies differs from basic PCR, they all use the same principle to amplify DNA.

The PCR procedure itself has changed little over the years. However, better polymerase enzymes for catalyzing the process have been developed, and the thermocycler used for heating and cooling has also been improved. PCR reactions consist of a premix, in a buffered medium, of deoxynucleotides (dNTPs) to supply the necessary bases and Taq polymerase to catalyze the reaction. Then a template and markers (forward and reverse primers) are added. The template is the DNA to be amplified. The markers determine what region is amplified. This soup is then placed in a thermocycler to be heated and cooled, which causes the DNA to be amplified.

One of the more difficult issues with PCR is ensuring that only a single region of DNA is amplified, so that copies of that region are all that is produced. This requires adjustments in the PCR mix and in the thermocycling conditions to optimize the reaction. Secondary products often result when PCR conditions are not fully optimized. Optimization is particularly important in fragment analysis applications, where multiple groups of primers are added to the same mix in order to amplify different regions together in one reaction.

Variables in PCR Optimization

A typical PCR cycle is shown in Figure 1. The template and markers are added to a buffered solution containing Taq polymerase and dNTPs before the mixture is placed in the thermocycler. It is also important to note that the buffer includes magnesium chloride (MgCl2) as a necessary co-factor. A common cycle for PCR includes the denaturing step, the annealing step, and then the extension step. The mix is first heated to approximately 95 °C to denature the double stranded template. This opens up the DNA for the markers and Taq polymerase. After heating, the mixture is cooled to allow the markers to anneal to the complementary region. Finally, the mix is heated slightly for extension. During extension, the polymerase moves along the template DNA incorporating dNTPs to produce a complementary copy of the template. At the end of one cycle, a copy of the desired region is produced as a small PCR fragment. The newly produced fragment and the original template both function as template DNA for the next cycle of PCR amplification. After 25 cycles, millions of copies of a given region are produced and can be used for further study.
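The doubling described above can be written as a one-line calculation. This is an idealized sketch: the `efficiency` parameter and the function name are assumptions for illustration, and real reactions fall short of perfect doubling and eventually plateau.

```python
def copies_after_cycles(starting_copies, cycles, efficiency=1.0):
    """Idealized PCR yield: each cycle multiplies the copies by (1 + efficiency)."""
    return starting_copies * (1.0 + efficiency) ** cycles

# One starting copy with perfect doubling for 25 cycles
print(f"{copies_after_cycles(1, 25):,.0f}")
# The same reaction with an assumed 90 % per-cycle efficiency
print(f"{copies_after_cycles(1, 25, efficiency=0.9):,.0f}")
```

With perfect doubling, a single template yields 2^25 copies after 25 cycles, about 33.5 million, which is where the "millions of copies" figure comes from.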

[Figure 1: a typical PCR cycle]

Once PCR amplification is complete, the PCR fragments can be tested for purity using agarose gel electrophoresis. Including a standard ladder with known fragment sizes provides an indication of the size of the amplified product as shown in Figure 2. Once PCR conditions are well optimized, the PCR product should appear as a clean single band. The presence of smaller secondary bands sometimes results from mis-priming when the PCR is not completely optimized. It is possible to gel purify a product by cutting the desired band from the gel and isolating the DNA using a commercial kit. However, even a single band can mask the presence of a secondary band. It is important to have a single clean product, particularly for additional testing such as automated DNA sequencing.

[Figure 2: agarose gel of a PCR product alongside a standard ladder]

PCR can be Multiplexed

Multiplexing a PCR reaction is particularly useful in automated fragment analysis when different sets of markers carry a different fluorophore (fluorescent label). Multiplexing can provide savings in time and cost when a large number of samples are to be analyzed. Figure 3 shows the results for a fragment analysis application multiplexing 10 different markers. This is very common in forensic science where DNA fingerprinting typically multiplexes up to 16 different regions.

[Figure 3: fragment analysis results multiplexing 10 markers]

The next few articles will focus on the variables involved in basic PCR and how PCR can be optimized.


The genome consists of the entire DNA content in a cell. The human genome consists of approximately 3 billion bases. Many regions of DNA are simple base repeats that do not represent any gene. Some genes are inactive as defined by the epigenome, which plays a major role in cell differentiation. Each type of cell contains the same copy of the genome. However, genes that are active in one type of cell, such as a skin cell, are not necessarily active in another type of cell, like a liver cell. The difference lies in the methylation of certain genome regions, which causes them to bind to proteins called histones. Genes that lie within bound regions of the genome are silenced, meaning they are inactive.

The Epigenome's Relation to Cancer

The epigenome can also affect whether certain individuals will develop cancer. People with the genetic potential to develop cancer may not necessarily become sick because the changes may be located in silenced regions of the genome. However, certain external factors can affect the epigenome, thus activating silenced genes. Tobacco, radiation, ultraviolet sunlight and other chemical or radioactive agents can disrupt normal methylation patterns in a group of cells (figure 1). The aging process also plays a role. Throughout life, cells die and are continually replaced. Over time, the number of cell divisions eventually leads to a loss of methylation in many cells. This is one reason why skin that is always exposed to sunlight tends to look older than normal skin. Fortunately, damage to the epigenome is reversible.

[Figure 1: external factors that disrupt normal methylation patterns]

Cancer Treatments in Epigenetic Studies

Researchers at a number of medical institutions are investigating epigenetic treatments for cancer. Most cancer treatments, like chemotherapy, work by killing cancer cells. However, these treatments may also kill healthy cells. Epigenetic treatments provide a more targeted approach to treating cancer by repairing epigenetic damage. A research team at Johns Hopkins is working with azacitidine and decitabine. Both medications were found to be toxic to healthy cells when used at high doses. However, low doses of the medications have little effect on healthy cells while repairing epigenetic damage. In vitro studies have shown that specific combinations of both treatments have reduced cultured tumor cells.

Epigenetics as a field has provided a new direction for cancer treatment. It has established that an individual has more control over the potential development of cancer by avoiding factors that cause epigenetic damage.


Researchers have completed sequencing the entire human genome. The genome consists of more than 3 billion bases, and the project was completed ahead of schedule. Technological advancements from slab gel to capillary sequencing, combined with data management, allowed scientists to process tremendous amounts of information. What once required years to complete now takes weeks, as the development of next-generation sequencing has increased sequencing capacity.

Despite the ability to sequence whole genomes, researchers also compare individual genes that may consist of fragments of 5,000 bases (5 kb). One example would be the comparison of a gene isolated from a wild type versus a mutant isolate. The project could be designed to determine the function of a gene or how a mutation affects that function in an affected individual. Sanger sequencing remains a principal method for comparing fragments much smaller than an entire chromosome or genome.

Isolating the Gene from Genomic DNA

Genomic DNA does not provide a good source of template DNA for Sanger sequencing applications. The Sanger dye-terminator method is a linear amplification requiring sufficient copy numbers of the original template. The Polymerase Chain Reaction (PCR) is used to amplify copies of a particular region of genomic DNA and isolate these smaller fragments from genomic DNA. Primers flanking the region of interest determine what region is amplified. PCR amplification extends from the primers in forward and reverse directions. The region of DNA between the primers is multiplied exponentially into smaller and more manageable fragments of DNA.

Researchers can sequence directly from the PCR fragment or choose to insert the fragment into a bacterial plasmid for cloning. Plasmid DNA provides certain advantages over direct sequencing from a PCR fragment. Plasmids typically include universal priming sites flanking the inserted DNA that allow complete sequencing of the insert. Direct sequencing from a PCR fragment requires that the original PCR primers be used. As a result, approximately 50 bases of the fragment next to the primer are not sequenced, whereas they would be with plasmid DNA.

Sequence Results and Primer Walking

Both PCR fragments and plasmid DNA are double stranded. Researchers can sequence using both strands of DNA. Sequence from the forward primer extends in the direction of the reverse primer. Reverse sequence likewise extends toward the forward primer. Eventually the forward and reverse sequences meet in the middle to complete sequencing of the 5 kb insert.

Sanger sequencing capillary technology generates 800 to 1,000 bases of sequence data per read. A gene consisting of 5 kb would not be covered by one set of sequence data. The newly generated sequence results provide the known sequence for designing additional primers (primer walking) for another set of sequences as shown in figure 1.
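A rough estimate of how many rounds of primer walking a fragment needs can be sketched as follows. The model is an assumption for illustration: each new primer is placed so that the next read overlaps the previous one by a fixed number of bases (100 here, a hypothetical value), and forward and reverse reads each cover half of the insert.

```python
def walking_rounds(insert_bp, read_len=800, overlap=100, both_strands=True):
    """Estimate rounds of primer walking needed to cover an insert (simplified model)."""
    if overlap >= read_len:
        raise ValueError("overlap must be smaller than the read length")
    directions = 2 if both_strands else 1
    # Bases each direction must cover, rounded up
    per_direction = -(-insert_bp // directions)
    if per_direction <= read_len:
        return 1
    step = read_len - overlap  # new bases gained by each additional round
    remaining = per_direction - read_len
    return 1 + -(-remaining // step)

print(walking_rounds(5000))                      # 5 kb insert read from both ends
print(walking_rounds(5000, both_strands=False))  # same insert read from one end only
```

Under these assumptions a 5 kb insert read from both ends needs about four rounds, while reading from one end only needs about seven. Actual numbers depend on read quality and on where usable primer sites fall.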

[Figure 1: primer walking]

Gene Assembly: Assembling Sequence Results

The primer walking process continues until generated sequence data covers the entire DNA insert or fragment. Once sequencing is complete, results are assembled into a contiguous (contig) sequence using one of several available software programs as shown in figure 2. Assembly programs offer the advantage of using electropherogram data showing peak quality. This helps reduce potential errors in final results. The final assembled product is summed into a single contiguous sequence for further comparison.

[Figure 2: assembly of sequence results into a contig]

One comparison researchers often perform is between a wild type gene and one with a potential mutation. Mutations could consist of single base changes, insertions or deletions (indels). Because translation of a gene is based on a three base code, indels could be particularly problematic by causing a shift (frameshift) in the bases coding for amino acids. A frameshift could completely eliminate the function of the protein product.
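The effect of an indel on the three base reading frame can be illustrated with a short sketch. The sequence is made up for the example and the function name is hypothetical.

```python
def codons(seq):
    """Split a coding sequence into three-base codons, dropping any incomplete tail."""
    seq = seq.upper()
    return [seq[i:i + 3] for i in range(0, len(seq) - 2, 3)]

wild_type = "ATGGCCAAGTTTGGA"                   # made-up example sequence
one_base_del = wild_type[:4] + wild_type[5:]    # single base deleted: frameshift
three_base_del = wild_type[:3] + wild_type[6:]  # one whole codon deleted: frame kept

print(codons(wild_type))
print(codons(one_base_del))
print(codons(three_base_del))
```

Deleting one base changes every codon downstream of the deletion, while deleting three bases removes a single codon and leaves the rest of the reading frame intact.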

Of course, this is only one example where gene assembly is used for research. Gene assembly could be used for many applications, including a better understanding of the functions of certain proteins. Mutation could alter the protein to a degree where the protein is incapable of performing a necessary function.


Genetic research has evolved over the past ten years. The development of next generation sequencing has provided researchers with a tool capable of sequencing an entire microbial genome. Epigenetics is a new field in which science investigates external factors that affect cellular histones, the protein complexes that control gene expression. It has led to new developments in cancer research and treatment. Despite these new technologies, research still has limitations. Currently, less than one percent of bacterial species can be isolated into pure cultures in a laboratory setting. The answer to this problem is another new field in genetic studies called Metagenomics. In Metagenomics, the total microbial content of an environmental sample is isolated together to analyze the communal genome.

What is the Source of Metagenomic Samples?

Samples used in Metagenomic studies are taken directly from the environment. The environment could be soil, water, a hot spring, or even the inside of an animal's mouth. Each sample could harbor numerous species of microbes including bacteria, fungi and virus particles. We will primarily focus on bacterial cells.

Once the sample has been acquired, bacteria are isolated together. Different species are not separated into pure cultures. Because the environmental sample is unique in terms of mineral content, moisture, pH and other factors, the species of bacteria are related by their ability to grow in this environment. It is believed that the environment in some way shapes genetic development and expression similarly in different species. Therefore, each bacterial type shares basic genetic patterns.

How Are Environmental Species Analyzed?

Once the bacteria have been separated from the environmental sample, the DNA is isolated using common extraction techniques. Once the DNA sample has been isolated and purified, the sample is analyzed using fragment analysis or DNA sequencing techniques. Next generation sequencing has been especially useful in determining the sequence of a communal genome. The figure below provides the basic steps in a Metagenomic study.

What is the Goal of Metagenomics?

The entire sequence of a communal genome could be compared to bacteria taken from other environmental samples. A comparison of each communal genome could show how environmental factors have shaped the community. It could aid in determining how pollutants and other chemicals have altered basic gene sequences when compared to a relatively clean environmental sample. Another study could isolate bacteria from different seawater depths to compare genetic changes in the community as a result of pressure and light differences.

Fragment analysis is also a useful technique for examination of Metagenomic samples. It is a targeted approach investigating certain genetic functions. The genetic presence of a particular metabolic pathway such as the ability to metabolize a mineral is a good example.

Metagenomics as a field has provided a new way to look at an abundance of genetic material without the process of isolating separate species into pure cultures. It has helped to further understand environmental influences on genetic development. However, it does not yet show the complete effect that separate species living together may have on each other. An example of this is when one species produces a product utilized by another, thus shaping the genetic expression of both. Even so, the field has led to new discoveries that could eventually impact the medical field.


During the 1970s, Frederick Sanger developed a new technique allowing the base sequence of DNA to be determined. The design of his method is still very popular today. Sanger employed dideoxynucleotides (ddNTPs) in addition to deoxynucleotides (dNTPs) in a polymerase reaction that copies the DNA template. Instead of copying the full section of DNA, incorporation of a ddNTP causes synthesis to terminate at a random position, giving products of many different lengths. The products were also radioactively labeled for detection. The end result of the Sanger chemistry is a series of new DNA products with sizes increasing by a single base.

The reaction products were separated on a polyacrylamide gel, with smaller products migrating faster than larger products. The end result appeared like a ladder when detected.

The basics of the chemistry remain the same today. What has changed in Sanger sequencing is the method of separating the products and detection.

Original Base Labels were Radioactive

Before the development of fluorescent labels, Sanger sequencing relied on a radioactive tag for detection. Because the tag could not distinguish between the different bases, each sample was run in four separate tubes. Each tube represented one of the four bases making up DNA. The products were loaded separately into four lanes and allowed to migrate. Once complete, the gel was exposed to x-ray film to view the result as shown in figure 1.
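The way a four-lane gel encodes a sequence can be sketched in code. This is a toy model, not a simulation of the chemistry: it assumes every possible terminated product is present, and the function names and example sequence are made up.

```python
def four_lanes(new_strand):
    """Lengths of the terminated products in each lane, one lane per terminating base."""
    lanes = {"A": [], "C": [], "G": [], "T": []}
    for length, base in enumerate(new_strand.upper(), start=1):
        lanes[base].append(length)
    return lanes

def read_gel(lanes):
    """Read the sequence from the shortest (fastest) band to the longest."""
    bands = [(length, base) for base, lengths in lanes.items() for length in lengths]
    return "".join(base for _, base in sorted(bands))

lanes = four_lanes("GATTACA")  # made-up example strand
print(lanes)
print(read_gel(lanes))
```

Each band's lane identifies the base and its position on the gel gives the length, so reading the ladder from the bottom up recovers the sequence one base at a time.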

Fluorescent Labeled ddNTPs Replaced Radioactive Labels

Automation replaced the manual sequencing method with development of fluorescent labeled ddNTPs. Slab gels were still poured. However, all four bases were combined into a single reaction and loaded together. Samples would electrophorese and separate in the gel. Once the amplified products reached a region on the bottom of the plates, a laser would excite the fluorescent labels and color would be recorded by camera. Resulting images were collected by computer and analyzed as shown in figure 2.

Development of the automated chemistry allowed sequencing to be performed much faster. First, only one lane on the gel was required for each sample. Second, sequence was recorded as electrophoresis was performed. The smaller, more quickly migrating bands could be run completely through the gel while larger bands were still being recorded. Therefore, more bases could be determined.

A potential source of problems was the plates. Because sequence results were recorded through the plates, the plates needed to be free of debris and clear of any scratches. Although plates also needed to be clean for the radioactive method, the requirement was more stringent for automated sequencing.

A significant amount of sequencing was performed using automated slab gel sequencing. But researchers still needed to pour plates and electrophoresis was performed for 12 hours or longer.

Capillary Sanger Sequencing

Capillary development occurred during the 1990s beginning with a single capillary machine, the ABI 310. The new technology eliminated any need for pouring gels. Instead, semi-liquid polymer was injected into the capillary before each run as shown in figure 3. Individual runs reduced the length of time required for electrophoresis to less than 3 hours.

The automated process changed very little from slab gel machines to capillary. But capillary sequencing was faster and more sensitive. It required much less DNA in the sequencing reaction. Better chemistries were developed in conjunction with automated sequencing. Capillary sequencers today characteristically process 4 to 96 samples in a single run. The runs generally require 2 to 2 ½ hours of electrophoresis. Automated injection of samples allows hands-off operation through multiple sets of samples.

Frederick Sanger developed an important method for sequencing DNA. It allowed early completion of the human genome project, a genome with 3 billion bases. Although next-generation sequencing has expanded sequencing capabilities, Sanger sequencing is used for small sections of DNA often used in medical research.


Ethanol with sodium acetate is commonly used to precipitate DNA from solution. It is preferred over simple drying as a way to concentrate DNA, because drying concentrates salts and contaminants along with the DNA. Ethanol precipitation allows contaminants to be removed with the excess liquid while the DNA forms a solid pellet at the bottom of the tube. Isopropanol can be used in place of ethanol and requires less volume. However, we recommend ethanol, as it seems to clean the DNA more efficiently.

Sodium acetate is a salt. The salt is a necessary ingredient because it acts as a carrier for the DNA. Without sodium acetate, the majority of the DNA would remain in solution. It would be lost along with the other contaminants.

The following method works well for final purification of PCR fragments and plasmid preparations with a volume of 20 to 25 ul. Glycogen replaces sodium acetate as the DNA carrier.

Things You Need

95 % Ethanol

70 % Ethanol

Glycogen (20 mg/ml)

Microcentrifuge tubes (1.5 ml)

Microcentrifuge

Hot plate with tube block

Protocol

Step 1: Add 1.5 ul glycogen to DNA in solution. It is important to add the glycogen before adding ethanol because ethanol will cause glycogen to precipitate.

Step 2: Add 80 ul 95 % ethanol.

Step 3: Vortex to mix components.

Step 4: Allow DNA and glycogen to precipitate for a minimum of 15 minutes.

Step 5: Centrifuge for 15 minutes to pellet DNA.

Step 6: Remove excess ethanol by decanting or using a pipette.

Step 7: Add 200 ul 70 % ethanol.

Step 8: Mix gently by inverting the tube several times.

Step 9: Centrifuge for 15 minutes.

Step 10: Remove excess ethanol by decanting or using a pipette.

Step 11: Dry in heat block to remove remaining ethanol.

Step 12: Re-suspend in 10 ul purified water.

Step 13: Determine concentration.
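As a quick check on the volumes in Steps 1 and 2, the final ethanol percentage of the precipitation mix can be estimated. This sketch simply adds the volumes together and ignores the small volume change that occurs when ethanol and water mix. The function name is hypothetical.

```python
def final_ethanol_percent(sample_ul, glycogen_ul, ethanol_ul, ethanol_percent):
    """Approximate final ethanol percentage after mixing (volumes treated as additive)."""
    total_ul = sample_ul + glycogen_ul + ethanol_ul
    return ethanol_ul * ethanol_percent / total_ul

# 20 ul and 25 ul samples with 1.5 ul glycogen and 80 ul of 95 % ethanol
for sample_ul in (20, 25):
    percent = final_ethanol_percent(sample_ul, 1.5, 80, 95)
    print(f"{sample_ul} ul sample: about {percent:.0f} % ethanol")
```

For the 20 to 25 ul samples this protocol is written for, the mix ends up at a little over 70 % ethanol.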

Variations to the Protocol

Temperature and time are two variables that affect the precipitation of DNA. Precipitation is often performed at 4 °C to improve yields. Precipitating for longer periods could also improve yields. However, lower temperatures and longer precipitation times could also drop certain contaminants out of solution. Some caution should be used when changing both variables.

Once the DNA pellet is dry, it can be re-suspended in water or a buffer of choice. Water has the advantage of not adding back salts that were removed during the precipitation. The disadvantage of water is that DNA is more easily subject to degradation. Water is a reasonable diluent for short-term applications or Sanger sequencing. Some sequencing facilities recommend water for the final aliquot in preparation of samples. TRIS and TRIS with EDTA are also used for long-term storage of DNA samples. Buffers containing EDTA are not recommended for samples submitted for Sanger sequencing and PCR because EDTA binds the magnesium supplied by MgCl2, a necessary component for amplification.

Please submit any additional suggestions or questions related to the glycogen protocol for ethanol precipitation of DNA.

