DNA Sequencing and Fragment Analysis

Archive for February, 2012

Nicked Plasmid DNA Prevents Automated Sanger Sequencing

Simple-to-use purification kits from a number of commercial providers have simplified the preparation of DNA samples for automated DNA sequencing. The isolated DNA is generally clean with good yield. The template can then be quantified by spectrophotometry in preparation for submission to a sequencing service provider. In addition, a spectrophotometric scan (220 nm to 310 nm) will indicate whether the plasmid DNA contains salts that could interfere with a sequencing reaction. Even when the template appears clean with good concentration, some samples fail. What could cause this? One possibility is that the DNA is nicked.

What is Nicked DNA?

Plasmid DNA is characteristically a double-stranded supercoiled molecule. A restriction enzyme is often used to cut both strands, linearizing the DNA molecule. A nick is an isolated break in only one of the two strands. The molecule remains circular, but the break lets the supercoils relax (figure 1).

How Does DNA Become Nicked?

DNA can be enzymatically nicked for certain applications. However, nicked DNA is undesirable for automated sequencing. Unintended nicks most likely result from physical shearing during purification. Causes of damage include excessive vortexing or pipetting, which physically breaks the DNA. Over-drying can also damage supercoiled DNA. Most commercial kits warn against over-drying a DNA preparation.

How to Detect Nicked DNA

The best way to determine whether DNA has become nicked is agarose gel electrophoresis; nicked DNA cannot be identified by spectrophotometry. A typical plasmid preparation is shown in figure 2. Plasmid preparations almost always contain some nicked DNA. However, the supercoiled band should be the darkest band on the gel. Lane 3 shows the best quality DNA for automated DNA sequencing. Samples loaded in lane 1 and lane 2 would most likely fail.

Why Does Nicked DNA Fail to Sequence?

The lock-and-key model of enzyme action offers a simple explanation for why nicked DNA does not sequence. The enzyme, Taq polymerase, needs to sit down on the DNA to catalyze the addition of bases during extension. A nick loosens the strands of the supercoiled DNA, and the enzyme no longer fits the DNA molecule.

How to Prevent DNA from Becoming Nicked

As previously stated, plasmid preparation kits generally isolate both nicked and supercoiled DNA. It is possible to reduce the amount of damage during the purification procedure. Start by reading the kit directions thoroughly. They identify specific steps where shaking of DNA in solution should be kept to a minimum, and they often recommend mixing reagents and DNA gently. Excessive vortexing and pipetting should be avoided, and the final DNA isolate should not be over-dried.

DNA is relatively stable and can remain usable for years when stored under the proper conditions. But DNA has some fragile characteristics as well. Special care in preparation can reduce the damage that inhibits successful automated sequencing.


DNA Fragments Resolve Better on Correct Percent Agarose Gel

An interesting article was posted March 25, 2011 on BitesizeBio.com titled 5 Ways to Destroy Your Agarose Gel. Every researcher may have made some of these common mistakes at one time. The five ways provided are…

  1. Use water instead of buffer for the gel or running buffer.
  2. Forget to add ethidium bromide.
  3. Use the wrong percentage (or type) of agarose.
  4. Switch the leads from the power source.
  5. Drop the gel on the way to the imager.

The focus of this article is the importance of using the correct percentage gel. In many genetic analysis applications a 1% agarose gel is commonly used to test plasmid preparations and PCR fragments. However, a 1% gel may not sufficiently resolve smaller DNA products.

Percent Agarose Determines Pore Size

Agarose gel electrophoresis can be thought of as a form of chromatography. The gel provides the stationary phase and electrical current provides the mobile phase. Negatively charged molecules such as DNA migrate toward the positively charged anode when an electrical current is applied across the gel. The gel provides the resistance against DNA migration. Smaller fragments move more rapidly than larger fragments.

Resistance depends on the pore size of the agarose gel: smaller pores provide more resistance, and increasing the percent agarose decreases the pore size. When the pores are too large, small DNA fragments migrate together and do not separate (figure 1). This figure illustrates why large DNA fragments should not be run on the same agarose gel as small fragments.

Correct Percent of Agarose Depends on the Size Products Tested

The correct percent agarose gel is dependent on the size of the fragment that will be tested. Plasmid DNA preparations that are 5 kb to 7 kb resolve well on a 1% gel. Large PCR fragments that are similar in size to plasmid DNA can also resolve on a 1% gel. However, small PCR fragments need a smaller pore size for better resolution and therefore a higher percent gel. General guidelines for mixing the correct percent gel are provided in table 1.

For small PCR fragments less than 500 bases in size, it is best to use a 2% gel. This will increase the run time. However, it will also improve resolution of fragments that are similar in size and may not resolve on lower percentage gels.



Sanger Sequencing Amplification Compared to Basic PCR

Sanger sequencing, the process used for automated sequencing, amplifies a DNA template in a thermal-cycling reaction similar to the Polymerase Chain Reaction (PCR). Despite the similarities between the processes, a sequencing amplification differs from basic PCR.

Sanger sequencing utilizes linear amplification

PCR produces millions of copies of a DNA region from a single copy of template DNA. Each copy produced in one cycle becomes a new template for the next cycle. PCR uses forward and reverse primers. The forward primer anneals to a complementary site on one strand of DNA and extends toward the reverse primer. In turn, the reverse primer extends toward the forward primer. The result is a copy of the region of DNA to be amplified. The new copy contains priming sites, so it can be used as a template for future amplifications (figure 1). One copy of the original template produces two copies; two copies produce four in the next cycle; and so on. A twenty-five cycle PCR can theoretically produce 2^25 (about 33.5 million) copies from a single template.

Sanger sequencing uses one primer instead of two. The amplification process copies one strand but not the reverse strand. The copy runs in the same direction as the primer and, with no second primer to anneal to it, cannot be used as a template for later cycles. All amplification is directly from the original template DNA in the reaction. Therefore, amplification is linear, not exponential. This is why a Sanger sequencing reaction must start with enough copies of the original template DNA for the products to be detected by automated sequencing equipment (figure 2).
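The difference between the two modes of amplification can be sketched in a few lines of Python. This is an idealized illustration only: it assumes perfect efficiency in every cycle, and the function names are made up for this example.

```python
def pcr_copies(start_copies: int, cycles: int) -> int:
    """Idealized PCR: every copy present is doubled in each cycle."""
    return start_copies * 2 ** cycles


def sequencing_products(template_copies: int, cycles: int) -> int:
    """Idealized sequencing reaction: each original template
    yields one new product per cycle, and products are never reused."""
    return template_copies * cycles


print(pcr_copies(1, 25))           # 33554432
print(sequencing_products(1, 25))  # 25
```

Starting from one template, twenty-five cycles give tens of millions of copies in PCR but only twenty-five products in the sequencing reaction, which is why the amount of starting template matters so much more for sequencing.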

Dideoxynucleotide bases are included in Sanger sequencing

The components of basic PCR include buffer, the enzyme Taq polymerase, deoxynucleotides (dNTPs), template, and forward and reverse primers. Sanger sequencing includes an additional component called dideoxynucleotides (ddNTPs). The ddNTPs are terminating bases that carry a fluorescent tag for automated sequencing equipment. For this reason, Sanger sequencing is also called dye-terminator sequencing. During amplification, a ddNTP is incorporated at random into some of the growing strands and terminates their extension. The remaining strands incorporate a dNTP and continue extending. The end product is a size ladder of products that increase in length by a single base (figure 2). Each terminating base is tagged with a fluorescent dye that provides a unique color for each of the A, G, C, and T bases in DNA.

The DNA ladder is separated by electrophoresis

Once cycling is complete, the products of the Sanger reaction are loaded on an automated slab gel or capillary analyzer. Products separate by size, with smaller products moving faster through the medium. As the products near the end of the medium, the fluorescent tags are excited by light and recorded to a computer with a digital camera (figure 3). The computer records the color of each band and assigns the corresponding base to complete dye-terminator sequencing.
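How a size ladder turns into a sequence can be illustrated with a small Python sketch. It is a simplification that assumes one clean product of every length; the strand "GATTACA" and the function names are hypothetical and chosen only for this example.

```python
def terminated_fragments(new_strand: str) -> list[tuple[int, str]]:
    """Return (length, terminal base) for every possible terminated product."""
    return [(i + 1, base) for i, base in enumerate(new_strand)]


def read_ladder(fragments: list[tuple[int, str]]) -> str:
    """Sort products by size, shortest first, and read each terminal base."""
    return "".join(base for _, base in sorted(fragments))


strand = "GATTACA"  # hypothetical extension product, written 5' to 3'
ladder = terminated_fragments(strand)

# The products are mixed together in the tube; reversing the list here
# shows that sorting by size alone recovers the base order.
print(read_ladder(ladder[::-1]))  # GATTACA
```

Each product is identified only by its length and the color of its last base, which is all the instrument needs to rebuild the sequence.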


Please go here if you would like to download a

reprint for this article in pdf format


Role of Restriction Enzymes in Mapping DNA

Restriction mapping was one of the earlier methods designed to characterize a fragment of DNA. The fragment was cut into smaller fragments using a restriction endonuclease. This is an enzyme capable of recognizing a specific base sequence. Once the region is identified, the enzyme cleaves (cuts) the DNA. It is an effective method used to mark a specific sequence along a region of DNA.

What Are Restriction Endonucleases?

Restriction endonucleases are a group of enzymes capable of cutting DNA into smaller pieces. Each enzyme recognizes a specific sequence that is generally 4 to 8 bases in length. EcoRI is a popular enzyme that cuts a DNA fragment wherever GAATTC is found. It should be noted that this sequence is a palindrome: it reads the same on the forward strand and on the complementary strand when each is read 5' to 3'.

Most restriction endonucleases used today originated from bacteria. They are one mechanism microorganisms use as a defense against foreign DNA such as that of bacteriophage. Foreign DNA cleaved into smaller fragments loses functionality and becomes harmless to the infected bacterial cell. Each restriction enzyme is named for its bacterial species of origin. EcoRI is an enzyme isolated from Escherichia coli.

Restriction Digest

A fragment of DNA in solution is treated with a specified restriction endonuclease in a process called restriction digest. One example is treatment of a 5,000 base pair (5 kb) fragment with EcoRI. The enzyme will cleave (cut) the DNA fragment every time GAATTC is found in the sequence. For example, the digest generates five smaller fragments with sizes 250 bp, 500 bp, 750 bp, 1,500 bp and 2,000 bp. The sum of the fragments equals 5 kb. But how does a researcher know the smaller fragments have been generated when DNA size is not visible in solution? Fragment sizes are visualized using gel electrophoresis.
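A digest of a linear fragment can be mimicked in Python by searching for the recognition site and cutting at each occurrence. This is a minimal sketch: it tracks only fragment lengths on one strand, the toy sequence is hypothetical, and the function name and parameters are invented for this example. The cut position inside GAATTC (after the G) is the known EcoRI cut site.

```python
def digest(sequence: str, site: str = "GAATTC", cut_offset: int = 1) -> list[int]:
    """Return fragment lengths after cutting a linear sequence at every site.

    cut_offset is how many bases into the site the strand is cut
    (1 for the GAATTC site, which is cut between G and AATTC).
    """
    edges = [0]
    start = sequence.find(site)
    while start != -1:
        edges.append(start + cut_offset)
        start = sequence.find(site, start + 1)
    edges.append(len(sequence))
    return [right - left for left, right in zip(edges, edges[1:])]


# Hypothetical 23-base toy sequence containing two GAATTC sites
toy = "AAAA" + "GAATTC" + "TTTTT" + "GAATTC" + "CC"
fragments = digest(toy)
print(fragments)                   # [5, 11, 7]
print(sum(fragments) == len(toy))  # True
```

Two sites in a linear molecule give three fragments, and the fragment lengths always add up to the length of the starting DNA, just as the five fragments in the example above add up to 5 kb.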

Agarose Gel Electrophoresis

DNA fragments of different sizes can be separated on an agarose gel in a process called electrophoresis. The solution containing the digested DNA is loaded on a buffered agarose gel. DNA fragments migrate toward the positively charged anode when electric current is applied (figure 1).

Smaller fragments move through the gel more quickly than larger fragments, so the fragments become separated. Once separation is complete, the DNA is stained with a dye such as ethidium bromide. Fragments of different sizes appear as bands when exposed to ultraviolet light. The size of each fragment is estimated by comparison with a standard ladder of known DNA fragments run on the same gel.

Partial Digest Aids Genetic Mapping

Researchers use a technique called partial digest to determine the order of fragments resulting from a full enzyme digest. A partial digest cleaves a DNA fragment at some, but not all, of the enzyme's cut sites. A partial digest can be performed by reducing the digestion time or the amount of enzyme added to the solution. One example would be the generation of 2,750 bp, 2,500 bp, 1,750 bp and 1,250 bp fragments (figure 2).

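Because each fragment from the partial digest is the sum of adjacent smaller fragments from the full digest, it is possible to determine the order of fragments along the original DNA fragment (figure 3). For example, the 1,750 bp fragment can only be the 1,500 bp and 250 bp fragments joined together, so those two must be neighbors.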
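The reasoning can be sketched as a brute-force search in Python: try every order of the full-digest fragments and keep those in which every partial-digest band equals a run of adjacent fragments. This is an illustrative sketch using the example sizes above, not the method behind the figures, and the function names are invented here.

```python
from itertools import permutations


def contiguous_sums(order):
    """All sums of two or more adjacent fragments, excluding the full length."""
    n = len(order)
    sums = set()
    for i in range(n):
        for j in range(i + 2, n + 1):
            if j - i < n:
                sums.add(sum(order[i:j]))
    return sums


def consistent_orders(full, partial):
    """Orders whose adjacent-fragment sums explain every partial-digest band.

    An order and its mirror image describe the same map, so only one
    of each pair is kept.
    """
    hits = []
    for order in permutations(full):
        if order <= order[::-1] and set(partial) <= contiguous_sums(order):
            hits.append(order)
    return hits


full = [250, 500, 750, 1500, 2000]
partial = [2750, 2500, 1750, 1250]
for order in consistent_orders(full, partial):
    print(order)
```

With only these four bands the search narrows the 60 possible orders to three candidates rather than one; under this simple model, each additional partial-digest band would narrow the list further.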

Researchers perform restriction mapping along an unknown region of DNA using a combination of restriction endonucleases. It provides a relatively simple method to mark regions along the DNA that can be used in future studies. Unknown fragments cut by these enzymes can also be inserted into a plasmid, a small circular DNA molecule maintained in bacterial cells, which provides known flanking sequences (primer sites) that can be used to determine the entire sequence.


Poor Primer Selection

What would cause a sequencing primer to anneal at the correct site, but extend in the wrong direction? The end result was the complement of the previous sequence from which the primer was chosen.


The researcher was baffled by the result. It was a primer-walking project to complete the sequence of a 2 kb insert. The first sequence was generated using the universal M13(-21) primer and produced a clean read of more than 600 bases. The next primer was selected 450 bases downstream and was expected to continue in the same direction. Instead, the result was a complementary sequence covering the same area as previously sequenced (see figure).


The primer selected consisted of 20 bases. Seventeen of the bases were directly complementary to each other, forming a primer that was essentially reversible. The primer simply annealed to the same region of the reverse strand of DNA, predominantly matching the 17-base complement. The resulting sequence was very clean, showing no hint of a problem. The cause was simply poor primer selection. Nowadays primer orders are generally submitted on-line and include automated evaluation of secondary structures in the primer, so this problem should rarely occur. This was an interesting phenomenon to see.
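A quick check for this kind of primer is to compare it with its own reverse complement. The Python sketch below is a simplified illustration: it only compares the two sequences position by position without sliding them against each other, it is not the evaluation used by any particular ordering site, and the example primers are hypothetical.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence."""
    return seq.upper().translate(COMPLEMENT)[::-1]


def self_complement_matches(primer: str) -> int:
    """Count positions where the primer equals its own reverse complement."""
    rc = reverse_complement(primer)
    return sum(a == b for a, b in zip(primer.upper(), rc))


print(reverse_complement("GATTACA"))      # TGTAATC
print(self_complement_matches("GAATTC"))  # 6 (fully self-complementary)
print(self_complement_matches("AAAAAA"))  # 0
```

A primer whose match count approaches its full length can anneal to either strand at the same site, which is the behavior described above.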


What are Phred scores?

Academic and commercial sequencing facilities alike have used a common quality rating to determine the accuracy of sequencing data. The generally used term is Phred score. It was developed by Phil Green and Brent Ewing during the 1990s. Often the scores are listed as a Q value where Q20 score is considered an acceptably accurate base call. But what exactly does a Q20 value mean?

Q is a value derived from the formula Q = -10 log10(p), where p is the estimated probability that the base call is incorrect. The algorithm uses the peak quality of the individual base as well as that of the bases before and after it.

The Q value expresses the certainty that a base has been called correctly on a logarithmic scale, typically reported from 10 to 60 as shown…

Q10 = 90% certainty (1/10 chance of an incorrect base call)
Q20 = 99% certainty (1/100 chance of an incorrect base call)
Q30 = 99.9% certainty (1/1,000 chance of an incorrect base call)
Q40 = 99.99% certainty (1/10,000 chance of an incorrect base call)
Q50 = 99.999% certainty (1/100,000 chance of an incorrect base call)
Q60 = 99.9999% certainty (1/1,000,000 chance of an incorrect base call)

Q20 is the acceptable score for most sequencing data. It indicates a 99% certainty that the base has been called correctly. This is considered high quality data and the standard value commonly used by sequencing facilities.
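The relationship between Q values and error probability can be checked with a short Python sketch. It implements only the formula above; the function names are chosen for this example and it is not the Phred program itself.

```python
import math


def phred_to_error_prob(q: float) -> float:
    """Probability that a base call with quality q is incorrect."""
    return 10 ** (-q / 10)


def error_prob_to_phred(p: float) -> float:
    """Quality value for an error probability p (0 < p <= 1)."""
    return -10 * math.log10(p)


for q in (10, 20, 30, 40, 50, 60):
    p = phred_to_error_prob(q)
    print(f"Q{q}: error probability {p:g}, certainty {100 * (1 - p):.4f}%")
```

Every increase of 10 in Q reduces the chance of an incorrect base call by a factor of ten.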
