DNA Sequencing Technologies – Sanger Sequencing

DNA sequencing

DNA (short for deoxyribonucleic acid) is a long, string-like molecule made up of subunits called nucleotides (you may also see these called bases). DNA acts as a ‘code’: in the form of genes, it carries the biological information an organism needs to develop and operate. A gene is a length of DNA that encodes a specific protein.

The information in DNA is encoded by the sequence of four chemical nucleotides: adenine (A), thymine (T), cytosine (C), and guanine (G). In a DNA molecule, nucleotides are arranged as two long strands forming a spiral known as a ‘double helix’. Each nucleotide forms a pair with the nucleotide on the opposite strand. Nucleotides pair in a specific way: A always pairs with T, and C always pairs with G.
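
Because the pairing rules are fixed, the sequence of one strand fully determines the other. The minimal Python sketch below expresses the rules as a lookup table; the function names are illustrative and not taken from any library. `reverse_complement` also reverses the result because the two strands of the helix run in opposite directions (they are antiparallel), a detail not covered in the text above.

```python
# Pairing rules from the text: A pairs with T, and C pairs with G.
PAIR = {"A": "T", "T": "A", "C": "G", "G": "C"}

def complement(strand: str) -> str:
    """Return the pairing partner of each nucleotide in `strand`, in the same order."""
    return "".join(PAIR[base] for base in strand.upper())

def reverse_complement(strand: str) -> str:
    """The opposite strand runs in the other direction, so it is conventionally written reversed."""
    return complement(strand)[::-1]

top = "ATGCGTAC"
print(complement(top))          # TACGCATG
print(reverse_complement(top))  # GTACGCAT
```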

DNA sequencing is the process of determining the sequence of nucleotides in a piece of DNA. Establishing the DNA sequence of an organism is key to understanding the function of genes.

Bacterial genome sequencing

The sequence of all the DNA found in a given organism is termed its ‘genome sequence’, and genomic data is data relating to the content, structure and function of an organism’s genome. The study of genomic data has transformed our understanding of how bacteria evolve and function, and of how clinically relevant characteristics such as antimicrobial resistance arise and spread.

The first bacterial genome sequence was determined in 1995 by Robert Fleischmann and colleagues, for the bacterium Haemophilus influenzae, using the Sanger sequencing method. They found that the H. influenzae genome is just over 1.8 million bases long and contains around 1,700 genes.

Sanger sequencing

Sanger sequencing was developed by Frederick Sanger and colleagues in 1977 and commercialised in the 1980s.

To sequence a whole genome with the Sanger method, the DNA is broken into many smaller pieces of around 500 – 1000 bases each. Each piece is sequenced, and the overlapping regions between pieces are then aligned to assemble the entire DNA sequence. Fragments of DNA up to ~900 bases long can be sequenced as a single ‘read’ with 99.9% accuracy [1].
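
The assembly step can be illustrated with a toy greedy overlap assembler in Python. This is a deliberately simplified sketch, not the method used by real genome assemblers, which must also cope with sequencing errors, repeated regions and reads from both strands. The function names, the minimum-overlap threshold `min_len` and the short 8-base reads are illustrative assumptions rather than real ~900-base Sanger reads.

```python
def overlap(a: str, b: str, min_len: int) -> int:
    """Length of the longest suffix of `a` that equals a prefix of `b` (0 if below min_len)."""
    for length in range(min(len(a), len(b)), min_len - 1, -1):
        if a.endswith(b[:length]):
            return length
    return 0

def greedy_assemble(reads: list[str], min_len: int = 3) -> list[str]:
    """Repeatedly merge the pair of pieces with the largest overlap until none overlap."""
    contigs = list(reads)
    while len(contigs) > 1:
        best = (0, -1, -1)  # (overlap length, index of left piece, index of right piece)
        for i, a in enumerate(contigs):
            for j, b in enumerate(contigs):
                if i != j:
                    olen = overlap(a, b, min_len)
                    if olen > best[0]:
                        best = (olen, i, j)
        olen, i, j = best
        if olen == 0:
            break  # nothing left to join: return the separate assembled pieces
        merged = contigs[i] + contigs[j][olen:]  # keep the shared region only once
        contigs = [c for k, c in enumerate(contigs) if k not in (i, j)]
        contigs.append(merged)
    return contigs

reads = ["ATGCGTAC", "GTACCTTA", "CTTAGGCA"]
print(greedy_assemble(reads))  # ['ATGCGTACCTTAGGCA']
```

The greedy strategy (always join the best-overlapping pair first) is chosen here only because it is short and easy to follow.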

Sanger sequencing was the most widely used method of DNA sequencing until the 2010s, when Next-Generation sequencing methods took over large-scale genome analyses because they are faster, have much higher throughput and cost less per base. Sanger sequencing is still frequently used to sequence small, individual pieces of DNA, and to validate Next-Generation sequencing results where high accuracy is required.

How does Sanger sequencing work?

Sanger sequencing uses a chain-termination reaction involving fluorescently labelled nucleotides. The following ‘ingredients’ are required:

  • The ‘template DNA’ – the double-stranded DNA molecule to be sequenced. To sequence a DNA molecule longer than ~900 bases, it is first cut into smaller fragments, each of which is sequenced separately.
  • A DNA primer – a short piece of single-stranded DNA that binds to the template DNA and acts as the starting point for DNA polymerase.
  • DNA polymerase – the enzyme that carries out the DNA synthesis reaction.
  • The four nucleotides (A, T, C and G), supplied as deoxynucleoside triphosphates, or dNTPs.
  • Small amounts of chain-terminating nucleotides, known as dideoxynucleoside triphosphates or ddNTPs. Each of the four ddNTPs is labelled with a fluorescent dye of a different colour.
Sanger sequencing 'ingredients'
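
Why only small amounts of ddNTPs? A rough Monte Carlo sketch in Python, under the simplifying assumption that at every position the polymerase incorporates a ddNTP with the same fixed probability `p_terminate` (real polymerases incorporate ddNTPs less readily than dNTPs, and unevenly across the four bases). Under that assumption strand lengths follow a geometric distribution with a mean of about 1/p, so the terminators must be scarce for long strands to be made at all. The probabilities, the 2,000-base cap and the function names are illustrative, not a real reaction recipe.

```python
import random

def fragment_length(p_terminate: float, max_len: int, rng: random.Random) -> int:
    """Extend one strand base by base until a ddNTP is incorporated.

    Assumes a fixed chance `p_terminate` of a ddNTP at every position;
    stops at `max_len` if no ddNTP is ever added.
    """
    for position in range(1, max_len + 1):
        if rng.random() < p_terminate:
            return position  # strand ends with a ddNTP here
    return max_len

def mean_fragment_length(p_terminate: float, max_len: int = 2000,
                         trials: int = 10_000, seed: int = 1) -> float:
    """Average length over many simulated strands (seeded for repeatability)."""
    rng = random.Random(seed)
    total = sum(fragment_length(p_terminate, max_len, rng) for _ in range(trials))
    return total / trials

for p in (0.1, 0.01):
    print(f"p_terminate = {p}: mean length about {mean_fragment_length(p):.0f} bases")
```

In this model, making ddNTPs ten times rarer makes the typical strand roughly ten times longer.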

The method of Sanger sequencing is as follows:

  1. The above ingredients are combined in a tube.

  2. The mixture is heated to separate the two strands of the template DNA, then cooled so that the DNA primer binds to the template DNA.

  3. DNA polymerase begins to synthesise a new DNA strand, starting from the primer.
    • DNA polymerase adds nucleotides to the growing chain, choosing each one to pair with the next base on the template strand (A with T, and C with G). Usually it adds a dNTP and the chain keeps growing; occasionally it adds a ddNTP instead. ddNTPs are ‘chain-terminating’: they lack the chemical group (a 3′ hydroxyl) that the next nucleotide must attach to, so once one has been added, no further nucleotides can be added to that strand.

  4. Steps 2 and 3 are repeated for many cycles, producing DNA strands of many different lengths.

Because so many strands are made, by the time the reaction is complete it is virtually certain that, for every position in the target DNA, at least one strand will have terminated with a ddNTP at that position.

The tube now contains DNA strands of different lengths, each ending in a fluorescently labelled ddNTP.
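
A toy simulation in Python shows why this holds. It assumes a fixed probability `p_terminate` of a ddNTP being added at each position; the template string, strand count and function names are hypothetical. If one strand stops at position k with probability q, the chance that none of N strands stops there is (1 − q)^N, which becomes vanishingly small once N reaches the thousands. For simplicity, `template` lists the template bases beyond the primer in the order the polymerase copies them, sidestepping strand direction.

```python
import random

PAIR = {"A": "T", "T": "A", "C": "G", "G": "C"}

def synthesise(template: str, p_terminate: float, rng: random.Random) -> str:
    """Copy `template` base by base, pairing A with T and C with G.

    Returns the new strand if a ddNTP stopped it, or '' if it ran off
    the end of the template (such strands carry no dye).
    """
    new_strand = []
    for base in template:
        new_strand.append(PAIR[base])
        if rng.random() < p_terminate:  # a ddNTP was added: no further extension
            return "".join(new_strand)
    return ""

def chain_termination(template: str, n_strands: int, p_terminate: float,
                      seed: int = 0) -> dict[int, str]:
    """Simulate many strands; map each observed length to the base at its end."""
    rng = random.Random(seed)
    fragments: dict[int, str] = {}
    for _ in range(n_strands):
        strand = synthesise(template, p_terminate, rng)
        if strand:
            fragments[len(strand)] = strand[-1]
    return fragments

template = "TACGCATGGATC"  # hypothetical template bases beyond the primer
fragments = chain_termination(template, n_strands=5000, p_terminate=0.1)
missing = [pos for pos in range(1, len(template) + 1) if pos not in fragments]
print("positions never terminated:", missing)
```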

  5. The resulting DNA strands are run through a long, thin tube (a capillary) filled with a gel matrix, in a process known as capillary gel electrophoresis.

    • Driven by an electric field, short fragments move through the pores of the gel more quickly than long fragments.

    • The shortest fragment (one nucleotide beyond the primer) reaches the end of the capillary first, then the next shortest (two nucleotides beyond the primer), and so on up to the full sequence length.

  6. As each fragment crosses the ‘finish line’ at the end of the tube, it is illuminated by a laser, which excites the fluorescent dye attached to the ddNTP at the end of the strand and makes it glow in that dye’s colour.

  7. A detector records the order of the fluorescent colours. Because each colour corresponds to one base, this order reveals the DNA sequence (of the newly synthesised strand, which is complementary to the template).

Sanger sequencing - strand separation and primer binding
Sanger sequencing - chain termination reactions
Sanger sequencing - capillary gel electrophoresis
Sanger sequencing - sequence detection
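
The electrophoresis and detection steps amount to sorting fragments by length and translating colours back into bases. A minimal Python sketch follows. The colour assigned to each base is a hypothetical choice for illustration (dye sets and display colours differ between sequencing chemistries), and the fragment data are made up. As in the text, the called sequence is that of the new strand; it is complemented base by base to recover the template bases in the order they were copied.

```python
# Hypothetical dye colours for illustration only; real dye sets and display
# colours depend on the sequencing chemistry and instrument software.
DYE_COLOUR = {"A": "green", "C": "blue", "G": "black", "T": "red"}
COLOUR_TO_BASE = {colour: base for base, colour in DYE_COLOUR.items()}
PAIR = {"A": "T", "T": "A", "C": "G", "G": "C"}

def run_capillary(fragments: dict[int, str]) -> list[str]:
    """Shortest fragments reach the detector first, so sort by length and
    record the dye colour of each fragment's terminal ddNTP."""
    return [DYE_COLOUR[fragments[length]] for length in sorted(fragments)]

def call_bases(colours: list[str]) -> str:
    """Translate the colour trace back into bases of the newly synthesised strand."""
    return "".join(COLOUR_TO_BASE[colour] for colour in colours)

def to_template(new_strand: str) -> str:
    """Each called base pairs with the template base it was copied from."""
    return "".join(PAIR[base] for base in new_strand)

# Hypothetical fragments: length -> terminal ddNTP, listed in no particular order.
fragments = {3: "G", 1: "A", 4: "C", 2: "T"}
colours = run_capillary(fragments)
print(colours)                           # ['green', 'red', 'black', 'blue']
print(call_bases(colours))               # ATGC
print(to_template(call_bases(colours)))  # TACG
```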

1. CD Genomics, 2020
