DNA Sequencing and Fragment Analysis

Researchers have completed sequencing the entire human genome. The genome consists of more than 3 billion bases, and the project was completed ahead of schedule. Technological advancements from slab gel to capillary sequencing, combined with data management, allowed scientists to process tremendous amounts of information. What once required years to complete now takes weeks, as the development of next-generation sequencing increased sequencing capacity.

Despite the ability to sequence whole genomes, researchers also compare individual genes, which could consist of 5,000-base (5 kb) fragments. One example would be comparison of a gene isolated from a wild type versus a mutant isolate. The project could be designed to determine the function of a gene or how a mutation affects that function in an affected individual. Sanger sequencing remains a principal method for comparing fragments much smaller than an entire chromosome or genome.

Isolating the Gene from Genomic DNA

Genomic DNA does not provide a good source of template DNA for Sanger sequencing applications. The Sanger dye-terminator method is a linear amplification that requires a sufficient number of copies of the original template. The polymerase chain reaction (PCR) is used to amplify a particular region of genomic DNA and isolate it as smaller fragments. Primers flanking the region of interest determine which region is amplified. PCR amplification extends from the primers in the forward and reverse directions. The region of DNA between the primers is multiplied exponentially into smaller and more manageable fragments of DNA.
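
The difference between exponential PCR amplification and linear dye-terminator amplification, and the way flanking primers define the amplified region, can be illustrated with a short Python sketch. This is an idealized model, not a lab protocol: the function names, the `efficiency` parameter, and the exact-match primer search are assumptions made for illustration only.

```python
def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA string (A, C, G, T only)."""
    return seq.translate(str.maketrans("ACGT", "TGCA"))[::-1]


def amplicon(genomic: str, fwd_primer: str, rev_primer: str):
    """Return the region bounded by the two primers, or None if a site is missing.

    Assumption: primers match the template exactly. The reverse primer anneals
    to the opposite strand, so its site on the given strand is its reverse
    complement.
    """
    start = genomic.find(fwd_primer)
    if start == -1:
        return None
    rev_site = reverse_complement(rev_primer)
    end = genomic.find(rev_site, start + len(fwd_primer))
    if end == -1:
        return None
    return genomic[start:end + len(rev_site)]


def pcr_copies(templates: int, cycles: int, efficiency: float = 1.0) -> float:
    """Idealized PCR: every cycle multiplies the copy number by (1 + efficiency)."""
    return templates * (1.0 + efficiency) ** cycles


def linear_copies(templates: int, cycles: int) -> int:
    """Idealized linear amplification: one new product per template per cycle."""
    return templates * cycles


if __name__ == "__main__":
    print(pcr_copies(1, 30))     # exponential growth from a single template
    print(linear_copies(1, 30))  # linear growth from a single template
```

Under these ideal assumptions, 30 PCR cycles yield 2^30 (about one billion) copies per starting template, while 30 cycles of linear amplification yield only 30 products per template. This is why the sequencing reaction needs many template copies to begin with.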

Researchers could sequence directly from the PCR fragment or choose to insert the fragment into a bacterial plasmid for cloning. Plasmid DNA provides certain advantages over direct sequencing from a PCR fragment. Plasmids typically include universal priming sites flanking the inserted DNA that allow complete sequencing of the insert. Direct sequencing from a PCR fragment requires that the original PCR primers be used. As a result, approximately 50 bases of the fragment would not be sequenced, whereas they would be with plasmid DNA.

Sequence Results and Primer Walking

Both PCR fragments and plasmid DNA are double stranded. Researchers could sequence using both strands of DNA. Sequence from the forward primer extends in the direction of the reverse primer. Reverse sequence likewise extends toward the forward primer. Eventually the forward and reverse sequences meet in the middle to complete sequencing of the 5 kb insert.

Sanger capillary sequencing technology generates 800 to 1,000 bases of sequence data per reaction. A gene consisting of 5 kb would not be covered by one set of sequence data. The newly generated sequence results provide the known sequence for designing additional primers (primer walking) for another set of sequences, as shown in figure 1.
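
As a rough planning aid, the number of walking steps can be estimated from the insert length and the read length. The sketch below assumes each new primer is placed so that its read overlaps the previous read by a fixed number of bases. The 100-base overlap and the function names are illustrative assumptions, not values from this article.

```python
import math


def walking_read_starts(insert_len: int, read_len: int, overlap: int = 100) -> list:
    """Start positions of reads that tile an insert from one end.

    Assumption: every read yields exactly `read_len` usable bases and each new
    read begins `overlap` bases before the end of the previous one.
    """
    if read_len <= overlap:
        raise ValueError("read_len must be larger than overlap")
    starts = [0]
    while starts[-1] + read_len < insert_len:
        starts.append(starts[-1] + read_len - overlap)
    return starts


def walking_rounds(insert_len: int, read_len: int, overlap: int = 100) -> int:
    """Rounds needed when one forward and one reverse read are run per round."""
    return math.ceil(len(walking_read_starts(insert_len, read_len, overlap)) / 2)


if __name__ == "__main__":
    for read_len in (800, 1000):
        starts = walking_read_starts(5000, read_len)
        print(read_len, len(starts), walking_rounds(5000, read_len))
```

With these assumptions, a 5 kb insert needs seven reads of 800 bases or six reads of 1,000 bases to tile it once. Walking from both ends at the same time reduces this to about four or three rounds of primer design, respectively.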

[Figure 1: GeneAssembly1]

Gene Assembly: Assembling Sequence Results

The primer walking process continues until the generated sequence data covers the entire DNA insert or fragment. Once sequencing is complete, results are assembled into a contiguous (contig) sequence using one of several available software programs, as shown in figure 2. Assembly programs offer the advantage of using electropherogram data showing peak quality. This helps reduce potential errors in the final results. The final assembled product is merged into a single contiguous sequence for further comparison.

[Figure 2: GeneAssembly2]
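
The core idea of contig assembly, joining reads where the end of one matches the start of the next, can be sketched in a few lines of Python. This is a deliberately simplified illustration and not how any particular assembly program works: it assumes the reads are error-free, already on the same strand, and supplied in left-to-right order, and it ignores the electropherogram quality data that real assemblers use. The function names and the `min_overlap` parameter are hypothetical.

```python
def overlap_len(left: str, right: str, min_overlap: int = 3) -> int:
    """Length of the longest suffix of `left` that equals a prefix of `right`."""
    for n in range(min(len(left), len(right)), min_overlap - 1, -1):
        if left.endswith(right[:n]):
            return n
    return 0


def merge_reads(reads: list, min_overlap: int = 3) -> str:
    """Merge ordered, same-strand reads into one contig by exact overlap."""
    contig = reads[0]
    for read in reads[1:]:
        n = overlap_len(contig, read, min_overlap)
        if n == 0:
            raise ValueError("no overlap found; more sequence data is needed")
        contig += read[n:]  # append only the bases that extend the contig
    return contig


if __name__ == "__main__":
    reads = ["ATGGCGTACGT", "TACGTTAGCCA", "AGCCATTGA"]
    print(merge_reads(reads))
```

A missing overlap raises an error here, which corresponds to a gap that another round of primer walking would need to close.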

One comparison researchers often perform is between the results for a wild-type gene and one with a potential mutation. Mutations could consist of single-base changes, insertions, or deletions (indels). Because translation of a gene into protein is based on a three-base code, indels could be particularly problematic by causing a shift (frameshift) in the bases coding for amino acids. A frameshift could completely eliminate the function of the protein product.
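
The effect of a frameshift can be seen by translating a short sequence before and after a deletion. The sketch below builds the standard genetic code table and uses a made-up 18-base sequence purely for illustration. The sequence, the function name, and the choice to stop at the first stop codon are assumptions.

```python
from itertools import product

# Standard genetic code, with codons ordered by T, C, A, G at each position.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def translate(dna: str) -> str:
    """Translate a coding sequence from its first base until a stop codon.

    Trailing bases that do not form a complete codon are ignored.
    """
    protein = []
    for i in range(0, len(dna) - 2, 3):
        amino_acid = CODON_TABLE[dna[i:i + 3]]
        if amino_acid == "*":  # stop codon
            break
        protein.append(amino_acid)
    return "".join(protein)


if __name__ == "__main__":
    wild_type = "ATGGCTGAAGGTTTCTGA"               # hypothetical example sequence
    one_base_del = wild_type[:4] + wild_type[5:]    # deletes one base: frameshift
    three_base_del = wild_type[:3] + wild_type[6:]  # deletes one whole codon
    for name, seq in [("wild type", wild_type),
                      ("1-base deletion", one_base_del),
                      ("3-base deletion", three_base_del)]:
        print(name, translate(seq))
```

In this example the wild type translates to MAEGF. Deleting a single base changes every amino acid after the deletion (MVKVS) and removes the original stop codon from the reading frame, while deleting a complete codon removes only one amino acid (MEGF) and leaves the rest of the protein intact.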

Of course, this is only one example of how gene assembly is used in research. Gene assembly could be used for many applications, including gaining a better understanding of the functions of certain proteins. A mutation could alter the protein to a degree where the protein is incapable of performing a necessary function.
