Next Generation Sequencing

Comparison of Sanger Sequencing and Next Generation Sequencing. [1]

The principle of Next Generation Sequencing (NGS) is similar to that of Sanger sequencing, which relies on capillary electrophoresis.
The genomic strand is fragmented, and the bases in each fragment are identified by the signals emitted as the fragments are ligated against a template strand. The Sanger method requires separate steps for sequencing, separation (by electrophoresis) and detection, which makes sample preparation difficult to automate and limits throughput, scalability and resolution. NGS uses array-based sequencing, which combines the techniques developed for Sanger sequencing to process millions of reactions in parallel, giving very high speed and throughput at a reduced cost. Genome sequencing projects that took many years with Sanger methods can now be completed in hours with NGS, although with shorter read lengths (the number of bases sequenced in a single read) and lower accuracy.

Workflow for next-generation sequence experiments. [2]

Cancer genome sequencing analysis is a multi-step process.

1. Library Preparation & Sequencing

Adapters are attached to the DNA fragments extracted from the sample, and each fragment is amplified; sequencing the amplified fragments yields the read sequences.
The read sequences, along with their Phred-like quality scores, are stored in a FASTQ file, the de facto standard format for representing biological sequence information together with per-base quality.
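As a rough illustration of the FASTQ layout, the sketch below reads four-line records (header, sequence, separator, quality string) and decodes the quality characters into Phred scores. It assumes the common Phred+33 ASCII encoding and unwrapped (single-line) sequences; the function names `read_fastq` and `error_probability` and the example record are made up for this illustration.

```python
from io import StringIO
from typing import Iterator, List, TextIO, Tuple


def read_fastq(handle: TextIO) -> Iterator[Tuple[str, str, List[int]]]:
    """Yield (read_id, sequence, phred_scores) from four-line FASTQ records."""
    while True:
        header = handle.readline().rstrip("\n")
        if not header:
            return  # end of input
        sequence = handle.readline().rstrip("\n")
        handle.readline()  # '+' separator line, ignored here
        quality = handle.readline().rstrip("\n")
        if not header.startswith("@") or len(sequence) != len(quality):
            raise ValueError(f"malformed FASTQ record: {header!r}")
        # Assumption: Phred+33 encoding, so Q = ASCII code - 33
        scores = [ord(ch) - 33 for ch in quality]
        yield header[1:], sequence, scores


def error_probability(q: int) -> float:
    """Convert a Phred score Q into an error probability P = 10^(-Q/10)."""
    return 10 ** (-q / 10)


# Hypothetical single-record example
EXAMPLE = "@read1\nGATTACA\n+\nIIIII5!\n"

for read_id, seq, scores in read_fastq(StringIO(EXAMPLE)):
    print(read_id, seq, scores)
```

A Phred score of 20 corresponds to an estimated 1 in 100 chance that the base call is wrong, and a score of 40 to 1 in 10,000. In practice an established parser would normally be used rather than hand-written code.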

2. Sequence Alignment

Sequencing reads are aligned to the reference genome using a fast aligner such as BWA (Burrows-Wheeler Aligner) and are further processed with local realignment and quality score recalibration. The resulting BAM files can be analyzed with various software tools and algorithms to screen for the corresponding genomic aberrations.
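BAM is the compressed binary form of the tab-separated SAM text format, so the content of one alignment record can be sketched by parsing a single SAM line. The example below is a minimal sketch, not a replacement for a real BAM library: it reads only the first six mandatory fields, and the helper names and the example record are hypothetical.

```python
import re
from typing import Dict, List, Tuple

# CIGAR operations defined by the SAM format
CIGAR_OP = re.compile(r"(\d+)([MIDNSHP=X])")


def parse_sam_line(line: str) -> Dict[str, object]:
    """Extract a few mandatory fields from one SAM alignment line."""
    fields = line.rstrip("\n").split("\t")
    flag = int(fields[1])
    cigar: List[Tuple[int, str]] = [
        (int(n), op) for n, op in CIGAR_OP.findall(fields[5])
    ]
    return {
        "read_id": fields[0],
        "is_unmapped": bool(flag & 0x4),   # flag bit 0x4: read unmapped
        "is_reverse": bool(flag & 0x10),   # flag bit 0x10: reverse strand
        "chrom": fields[2],
        "pos": int(fields[3]),             # 1-based leftmost position
        "mapq": int(fields[4]),            # mapping quality
        "cigar": cigar,
    }


def reference_span(cigar: List[Tuple[int, str]]) -> int:
    """Number of reference bases covered (ops that consume the reference)."""
    return sum(n for n, op in cigar if op in "MDN=X")


# Hypothetical record: an 8-base read with a 1-base insertion
EXAMPLE = "read1\t16\tchr1\t100\t60\t5M1I2M\t*\t0\t0\tGATTACAG\tIIIIIIII"

record = parse_sam_line(EXAMPLE)
print(record["chrom"], record["pos"], record["cigar"])
print("reference bases covered:", reference_span(record["cigar"]))
```

Here the CIGAR string `5M1I2M` describes five aligned bases, one inserted base and two more aligned bases, so the read covers seven reference positions starting at position 100.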

3. Variant Calling

Variant calling is the process of accurately identifying the differences or variations between the sample and the reference genome sequence. The typical input is a set of aligned reads in BAM.
The output is typically written in the variant call format (VCF), a file format for storing DNA variation data such as single nucleotide variants (SNVs; also called single nucleotide polymorphisms or SNPs), insertions/deletions (indels), copy number alterations, and large structural alterations (insertions, inversions, and translocations).
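To make the VCF layout concrete, the sketch below reads the fixed leading columns (CHROM, POS, ID, REF, ALT) of each data line and assigns a coarse variant type by comparing the REF and ALT alleles. The classification rules are a simplification chosen for illustration, and the function names and example records are hypothetical; real pipelines rely on dedicated VCF libraries and on annotations in the INFO column.

```python
from typing import Iterator, Tuple


def classify_variant(ref: str, alt: str) -> str:
    """Coarse variant type from REF/ALT alleles (simplified rules)."""
    # Symbolic alleles such as <DEL> or breakend notation with brackets
    if alt.startswith("<") or "[" in alt or "]" in alt:
        return "structural"
    if len(ref) == 1 and len(alt) == 1:
        return "SNV"
    if len(ref) < len(alt):
        return "insertion"
    if len(ref) > len(alt):
        return "deletion"
    return "MNV"  # same length, more than one base substituted


def read_vcf(text: str) -> Iterator[Tuple[str, int, str, str, str]]:
    """Yield (chrom, pos, ref, alt, type) for every ALT allele."""
    for line in text.splitlines():
        if not line or line.startswith("#"):
            continue  # skip meta-information and the column header
        chrom, pos, _id, ref, alts = line.split("\t")[:5]
        for alt in alts.split(","):  # multi-allelic sites list several ALTs
            yield chrom, int(pos), ref, alt, classify_variant(ref, alt)


# Hypothetical records for illustration only
EXAMPLE = "\n".join([
    "##fileformat=VCFv4.2",
    "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO",
    "chr1\t12345\t.\tA\tG\t50\tPASS\t.",
    "chr1\t22222\t.\tAT\tA\t40\tPASS\t.",
    "chr2\t33333\t.\tC\tCTT\t45\tPASS\t.",
    "chr3\t44444\t.\tN\t<DEL>\t.\tPASS\tSVTYPE=DEL;END=45000",
])

for variant in read_vcf(EXAMPLE):
    print(variant)
```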


References

[1] Next-generation DNA sequencing. Nature Biotechnology 26, 1135-1145 (2008).
[2] Bioinformatics and Functional Genomics (3rd Ed.).