DNA Sequencing Overview

Mino
August 16, 2024
knowledge_camp

An Introduction to DNA Sequencing

DNA Sequencing

DNA sequencing is the process of determining the nucleic acid sequence of DNA: the order of its four nucleotide bases, adenine, guanine, cytosine, and thymine. This foundational technology has greatly accelerated research and discovery in the biological and medical sciences. Read about the dark DNA phenomenon and its role in medical advances on PubMed.

Importance of DNA Sequencing

DNA sequencing is crucial across many scientific fields, including medical diagnosis, biotechnology, and forensic biology. Comparing healthy and mutated DNA sequences helps diagnose diseases such as cancer (see Cancer Research), characterize antibody repertoires (see Antibody Research), and guide patient treatment (see Patient Care Strategies).

The rapid evolution of sequencing technology has led to significant achievements, including the sequencing of entire genomes of various species, which enhances our understanding of biological diversity and evolutionary histories.

Applications of DNA Sequencing

Molecular Biology

In molecular biology, sequencing is a tool for studying genomic structures and identifying genomic changes linked to different diseases, thereby aiding in the discovery of new drug targets. Explore more here.

Evolutionary Biology

DNA sequencing provides insights into evolutionary processes by revealing relationships among species. Notably, in 2021 researchers sequenced DNA from mammoths more than a million years old, at the time the oldest DNA ever sequenced (CNN report; PubMed).

Metagenomics

Metagenomics studies genetic material recovered directly from environmental samples to identify the diversity of microbial life, which is crucial for ecological and microbiological research (see Introduction to Metagenomics).

Virology

Sequencing is vital in virology for identifying viruses and studying their evolution and epidemiology, especially when managing outbreaks and developing vaccines (see Viral Genome Sequencing).

Medicine

DNA sequencing facilitates genetic testing, enabling personalized medicine by tailoring medical treatment based on individual genetic profiles. It plays a critical role in diagnosing rare genetic disorders and in reproductive counseling.

DNA sequencing is also crucial for identifying pathogens, improving the management of antibiotic resistance, and ensuring precise treatments.

Forensic investigation

DNA sequencing is used alongside DNA profiling methods in forensic science for identification and paternity testing, significantly advancing the ability to match DNA to individuals involved in legal and criminal cases.

The four canonical bases

DNA is built from four canonical bases: adenine (A), guanine (G), cytosine (C), and thymine (T). Chemically modified forms of these bases, such as 5-methylcytosine (5mC), are important in gene regulation and are studied through DNA sequencing.

Viruses and other organisms may use modified bases, which can affect the detection and analysis in sequencing studies.

History

Discovery of DNA structure and function

DNA was first isolated by Friedrich Miescher in 1869, but its role in heredity was not established until the mid-20th century. In 1944, experiments by Oswald Avery, Colin MacLeod, and Maclyn McCarty showed that purified DNA could transform one bacterial strain into another, and in 1953 James Watson and Francis Crick proposed the double-helix model of DNA (see Discovery of DNA).

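Frederick Sanger advanced the field in 1955 by completing the amino acid sequence of insulin, the first conclusive evidence that proteins have a defined molecular sequence. This success encouraged researchers, including Watson and Crick, who were trying to understand how DNA directs protein synthesis. Sanger later developed the chain-termination method of DNA sequencing (1977), which set the stage for modern genetic research.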

RNA sequencing

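RNA sequencing was one of the earliest forms of nucleotide sequencing. Its major landmarks were the first complete gene sequence (1972) and the complete genome of bacteriophage MS2 (1976), both published by Walter Fiers and his colleagues at the University of Ghent. This work marked significant progress in understanding RNA's role in protein synthesis and gene regulation.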

Early DNA sequencing methods

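The first method for determining DNA sequences, a location-specific primer-extension strategy, was established by Ray Wu at Cornell University in 1970. Sanger later adopted primer extension to develop faster sequencing methods, while Allan Maxam and Walter Gilbert developed a separate chemical-degradation method; both approaches were published in 1977 and drove the rapid advancement of genetic research and biotechnology (see Early DNA Sequencing).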

High-throughput sequencing (HTS) methods

From the late 1990s, high-throughput sequencing technologies transformed genetic research, enabling rapid sequencing of entire genomes and massively parallel processing of genetic data. These advancements have led to significant reductions in the cost and time required for genetic research, impacting various fields including medicine, anthropology, and personal genomics.

Basic methods

Maxam-Gilbert sequencing

Developed by Allan Maxam and Walter Gilbert in 1977, this method, also known as chemical sequencing, modifies DNA chemically and then cleaves it at specific bases. DNA radioactively labeled at one end is divided into four reactions, each of which breaks the strand at a subset of bases (G, A+G, C+T, or C). The resulting labeled fragments are separated by size with gel electrophoresis and visualized by autoradiography, and the sequence is read from the pattern of bands across the four lanes. Its technical complexity, reliance on hazardous chemicals, and difficulty scaling up led to its decline once the chain-termination method improved.
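The band-reading logic can be sketched in Python. This is a simplified model, assuming ideal lanes and indexing each band by the position of the cleaved base rather than by exact fragment length; `simulate_lanes` and `read_maxam_gilbert` are hypothetical helper names, not part of any tool.

```python
def simulate_lanes(seq):
    """Idealized bands for a known sequence (positions are 1-based)."""
    g = {i for i, b in enumerate(seq, 1) if b == "G"}
    ag = {i for i, b in enumerate(seq, 1) if b in "AG"}
    ct = {i for i, b in enumerate(seq, 1) if b in "CT"}
    c = {i for i, b in enumerate(seq, 1) if b == "C"}
    return g, ag, ct, c


def read_maxam_gilbert(g, ag, ct, c):
    """Infer bases from the G, A+G, C+T and C lanes.

    Simplification: a band at position n means the base at position n
    was cleaved in that reaction.
    """
    length = max(g | ag | ct | c)
    bases = []
    # Read the gel from the shortest fragment to the longest.
    for pos in range(1, length + 1):
        if pos in g:
            bases.append("G")  # band in the G lane (also present in A+G)
        elif pos in ag:
            bases.append("A")  # band only in the A+G lane
        elif pos in c:
            bases.append("C")  # band in the C lane (also present in C+T)
        elif pos in ct:
            bases.append("T")  # band only in the C+T lane
        else:
            bases.append("N")  # no band: position unresolved
    return "".join(bases)


print(read_maxam_gilbert(*simulate_lanes("GATTACACG")))  # GATTACACG
```

The pairing of lanes is what makes the readout unambiguous: G and C are identified by their own lanes, while A and T are inferred from bands that appear only in the combined A+G or C+T lanes.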

Chain-termination methods

The chain-termination method, developed by Frederick Sanger and his team in 1977, quickly became the standard because it was easier and more reliable. DNA polymerase copies a template strand in the presence of normal nucleotides plus a small amount of dideoxynucleotides (ddNTPs), which lack the 3'-hydroxyl group needed to attach the next nucleotide. Each time a ddNTP is incorporated, synthesis stops, producing fragments that end at every position; sorting the fragments by length and identifying the terminating base reveals the sequence. This method was foundational in sequencing the human genome, and advances such as fluorescent labeling and automation have greatly reduced its cost and complexity.
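A toy simulation of chain termination, assuming ideal reactions in which every position is terminated in the matching ddNTP reaction and fragments are resolved perfectly by length. The template is written 3'→5' so the new, complementary strand grows left to right; `chain_termination_fragments` and `read_sanger` are hypothetical names.

```python
COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G"}


def chain_termination_fragments(template):
    """Return {ddNTP base: [fragment lengths]} for an idealized run.

    In the reaction containing ddX, a fragment of length n appears
    whenever the n-th incorporated base is X, because synthesis stops there.
    """
    fragments = {base: [] for base in "ACGT"}
    for n, t in enumerate(template, start=1):
        incorporated = COMPLEMENT[t]  # the new strand pairs with the template
        fragments[incorporated].append(n)
    return fragments


def read_sanger(fragments):
    """Sort all terminated fragments by length and report each end base."""
    by_length = sorted((n, base) for base, lengths in fragments.items() for n in lengths)
    return "".join(base for _, base in by_length)


frags = chain_termination_fragments("TACGGT")
print(frags)               # {'A': [1, 6], 'C': [4, 5], 'G': [3], 'T': [2]}
print(read_sanger(frags))  # ATGCCA
```

Note that the read is the complement of the template, which is why the synthesized strand, not the template, is what the fragment ladder spells out.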

Sequencing by synthesis (SBS)

Sequencing by synthesis determines a sequence by monitoring the nucleotides that a DNA polymerase incorporates into a growing strand. First described in 1993, the method involves several key steps: DNA amplification, attachment to a solid support, synthesis by the polymerase, and detection of each nucleotide as it is incorporated. Refinements to SBS have made it a cornerstone of the high-throughput technologies used in modern genomics.
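The cycle-by-cycle readout can be modeled as one base call per cycle. This is a simplified sketch that assumes an idealized four-channel intensity per cycle (one value per labeled nucleotide) and calls the brightest channel; real base callers also correct for effects such as phasing and channel cross-talk, which are omitted. `simulate_cycles`, `call_bases` and the confidence ratio are hypothetical.

```python
COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G"}


def simulate_cycles(template, signal=1.0, background=0.05):
    """Idealized per-cycle intensities: the incorporated base (the template
    complement) lights up; the other three channels show background."""
    cycles = []
    for t in template:
        incorporated = COMPLEMENT[t]
        cycles.append({b: (signal if b == incorporated else background) for b in "ACGT"})
    return cycles


def call_bases(cycles):
    """Call the brightest channel in each cycle and report the fraction of
    total signal in that channel as a crude confidence value."""
    calls, confidence = [], []
    for intensities in cycles:
        base = max(intensities, key=intensities.get)
        calls.append(base)
        confidence.append(intensities[base] / sum(intensities.values()))
    return "".join(calls), confidence


read, conf = call_bases(simulate_cycles("TACG"))
print(read)  # ATGC
```

Because one base is read per cycle, read length is bounded by the number of cycles, and accuracy depends on how cleanly each cycle's signal separates the four channels.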

Large-scale sequencing and de novo sequencing

Large-scale sequencing involves breaking genomic DNA into smaller, manageable pieces, cloning them into a bacterial library, and sequencing the individual clones. It is used to sequence very long DNA molecules, such as whole chromosomes, by assembling the full sequence from regions where the clones overlap.

Methods and Techniques:

  • Fragmentation: DNA is either cut with restriction enzymes or sheared mechanically.
  • Cloning: The fragmented DNA is cloned into vectors and amplified in bacteria like Escherichia coli.
  • Sequencing: Individual bacterial clones are sequenced, and the DNA sequences are assembled using overlaps.
  • Size Selection: Implementing a size selection step during cloning can enhance the efficiency and accuracy of genome assembly.
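Assembling clones from their overlaps can be illustrated with a greedy merge: repeatedly join the pair of fragments with the longest suffix-prefix overlap. Greedy merging is one simple strategy chosen here for illustration, not the method of any particular project, and the sketch assumes error-free fragments in the same orientation; `greedy_assemble` and `min_len` are hypothetical.

```python
def overlap(a, b, min_len):
    """Length of the longest suffix of `a` equal to a prefix of `b` (0 if < min_len)."""
    for k in range(min(len(a), len(b)), min_len - 1, -1):
        if a.endswith(b[:k]):
            return k
    return 0


def greedy_assemble(fragments, min_len=3):
    """Merge fragments by best overlap until no overlap of min_len remains."""
    frags = list(dict.fromkeys(fragments))  # drop exact duplicates, keep order
    # Drop fragments fully contained in another fragment.
    frags = [f for f in frags if not any(f != g and f in g for g in frags)]
    while len(frags) > 1:
        best = (0, None, None)
        for i, a in enumerate(frags):
            for j, b in enumerate(frags):
                if i != j:
                    k = overlap(a, b, min_len)
                    if k > best[0]:
                        best = (k, i, j)
        k, i, j = best
        if k == 0:
            break  # no sufficient overlap left: contigs stay separate
        merged = frags[i] + frags[j][k:]
        frags = [f for idx, f in enumerate(frags) if idx not in (i, j)] + [merged]
    return frags


print(greedy_assemble(["ATTAGACC", "GACCTGCC", "TGCCGGAA"]))  # ['ATTAGACCTGCCGGAA']
```

Greedy merging is easy to follow but can join the wrong pieces when the same overlap occurs in several places, which is one reason repetitive genomes are hard to assemble.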

De novo sequencing

De novo sequencing refers to sequencing a novel genome without reference to previously known DNA sequences, starting "from the beginning." It is essential for discovering new genetic information where no reference genomes are available.

Challenges:

  • Complex Assembly: The assembly of de novo sequences can be complex and error-prone, especially in genomes with repetitive sequences.
  • Gaps in Sequence: These are often filled using techniques like primer walking to complete the sequence assembly.
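Why repeats complicate assembly can be shown by listing k-mers that occur more than once: if reads or overlaps are no longer than a repeated stretch, the repeat cannot be placed uniquely. A minimal sketch, where `k` stands in for the usable overlap length (an assumption for illustration) and `repeated_kmers` is a hypothetical name.

```python
from collections import Counter


def repeated_kmers(seq, k):
    """Return k-mers that occur more than once in `seq`, with their start positions."""
    counts = Counter(seq[i:i + k] for i in range(len(seq) - k + 1))
    return {
        kmer: [i for i in range(len(seq) - k + 1) if seq.startswith(kmer, i)]
        for kmer, n in counts.items()
        if n > 1
    }


genome = "CATGACGTTTGACGTAC"
print(repeated_kmers(genome, 6))  # {'TGACGT': [2, 9]}
print(repeated_kmers(genome, 7))  # {}
```

In this toy sequence the 6-base repeat TGACGT occurs twice, so 6-base overlaps are ambiguous, while overlaps of seven or more bases are unique.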

Emulsion PCR

Emulsion PCR is an amplification technique used in high-throughput (including de novo) sequencing. It isolates individual DNA molecules, together with primer-coated beads, in aqueous droplets suspended in oil, so each fragment is amplified into a clonal population without interference from other fragments. This technique supports several sequencing platforms, including those developed by 454 Life Sciences and the SOLiD system.
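Droplet loading is commonly modeled with a Poisson distribution: if λ is the mean number of template molecules per droplet, the fraction of droplets holding k molecules is P(k) = λ^k e^(-λ) / k!. The sketch assumes that model (it is not stated in the text above) to show why templates are diluted: a low λ keeps most occupied droplets clonal at the cost of many empty droplets. `droplet_loading` is a hypothetical name.

```python
import math


def poisson(k, lam):
    """Probability that a droplet receives exactly k template molecules."""
    return lam ** k * math.exp(-lam) / math.factorial(k)


def droplet_loading(lam):
    """Fractions of empty, single-template and multi-template droplets."""
    empty = poisson(0, lam)
    single = poisson(1, lam)
    occupied = 1 - empty
    return {
        "empty": empty,
        "single": single,
        "multiple": occupied - single,
        # Among droplets with any template, the fraction that are clonal.
        "clonal_fraction_of_occupied": single / occupied,
    }


for lam in (0.1, 0.5, 1.0):
    stats = droplet_loading(lam)
    print(f"lambda={lam}: empty={stats['empty']:.3f}, single={stats['single']:.3f}, "
          f"clonal among occupied={stats['clonal_fraction_of_occupied']:.3f}")
```

Under this model, about 95% of occupied droplets are clonal at λ = 0.1, but only about 58% at λ = 1.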

Shotgun sequencing

Shotgun sequencing is designed to handle DNA sequences longer than 1000 base pairs, up to complete chromosomes. This method involves random fragmentation of DNA, which is then sequenced in pieces. The resulting sequences are assembled based on overlapping regions, allowing for the reconstruction of the original DNA sequence.
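How much random sampling is needed can be estimated with the Lander-Waterman model, which is not named in the text above: with N reads of length L drawn uniformly from a genome of size G, the mean coverage is c = NL/G and the expected fraction of bases left uncovered is about e^(-c). The simulation below checks that approximation on a hypothetical genome; `simulate_coverage`, the sizes and the seed are arbitrary choices.

```python
import math
import random


def simulate_coverage(genome_size, read_len, n_reads, seed=0):
    """Place reads uniformly at random; return the fraction of bases covered."""
    rng = random.Random(seed)
    covered = bytearray(genome_size)
    for _ in range(n_reads):
        start = rng.randrange(genome_size - read_len + 1)
        covered[start:start + read_len] = b"\x01" * read_len
    return sum(covered) / genome_size


G, L = 100_000, 100
for c in (1, 3, 5, 8):
    n = c * G // L               # reads needed for mean coverage c
    observed = simulate_coverage(G, L, n)
    expected = 1 - math.exp(-c)  # Lander-Waterman approximation
    print(f"coverage {c}x: covered {observed:.4f}, expected {expected:.4f}")
```

The diminishing returns are the practical point: going from 1x to 3x coverage closes most gaps, but the last fraction of a percent needs much deeper sampling or targeted gap filling.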

High-throughput sequencing methods

High-throughput sequencing (HTS) technologies allow for the parallel sequencing of millions of DNA fragments. These technologies have transformed genetic analysis, enabling rapid sequencing of entire genomes.

Applications:

  • Exome and Genome Sequencing: For comprehensive analysis of genetic variations.
  • Transcriptome Profiling (RNA-Seq): To study gene expression.
  • DNA-Protein Interactions (ChIP-Sequencing): For identifying binding sites.
  • Epigenome Characterization: To study DNA modifications.

Advancements:

  • HTS technologies greatly reduce the cost and increase the speed of DNA sequencing, making it feasible to sequence large genomes quickly.
  • The cost of genome sequencing has fallen rapidly and continues to drop as technologies improve.

Comparison of High-Throughput Sequencing Methods

Method | Read length | Accuracy | Reads per run | Time per run | Cost per 1 billion bases (USD) | Advantages | Disadvantages
--- | --- | --- | --- | --- | --- | --- | ---
Single-molecule real-time (Pacific Biosciences) | up to 100,000 bp | 87% (raw read) | 4 million | up to 20 hours | $43.3 | Long reads, detects base modifications | High cost, moderate throughput
Ion semiconductor (Ion Torrent) | up to 600 bp | 99.6% | 80 million | 2 hours | $950 | Fast, cost-effective | Prone to errors in homopolymer regions
Pyrosequencing (454) | 700 bp | 99.9% | 1 million | 24 hours | $10,000 | Long reads, fast | Costly per run, errors in homopolymer regions
Illumina (sequencing by synthesis) | up to 600 bp | 99.9% | up to 3 billion | 1 to 11 days | $150 | High throughput, scalable | Requires high DNA concentration
Nanopore sequencing | Variable | 92-97% (single read) | User-defined | Real-time | $100 | Longest reads, real-time data | Lower accuracy, lower throughput
Chain termination (Sanger sequencing) | up to 900 bp | 99.9% | N/A | up to 3 hours | $2,400,000 | Highly accurate | Impractical for large projects

High-throughput sequencing technologies continue to evolve, offering diverse options for genetic analysis, from basic research to clinical diagnostics.
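The cost column can be turned into a rough per-genome figure. This sketch assumes a human genome of about 3.1 billion bases sequenced to 30-fold mean coverage (both are assumptions, not values from the table) and uses the per-billion-base costs as listed; it ignores library preparation, instruments and labor. `sequencing_cost` is a hypothetical helper.

```python
# Cost per 1 billion bases as listed in the comparison table above (USD).
COST_PER_GB = {
    "Single-molecule real-time (PacBio)": 43.3,
    "Ion semiconductor (Ion Torrent)": 950,
    "Pyrosequencing (454)": 10_000,
    "Illumina (sequencing by synthesis)": 150,
    "Nanopore": 100,
    "Chain termination (Sanger)": 2_400_000,
}

GENOME_GB = 3.1  # assumption: approximate human genome size, billions of bases
COVERAGE = 30    # assumption: target mean coverage


def sequencing_cost(cost_per_gb, genome_gb=GENOME_GB, coverage=COVERAGE):
    """Cost of generating genome_gb * coverage billion bases at cost_per_gb."""
    return cost_per_gb * genome_gb * coverage


for method, cost in sorted(COST_PER_GB.items(), key=lambda kv: kv[1]):
    print(f"{method:40s} ${sequencing_cost(cost):>15,.0f}")
```

Under these assumptions a genome needs about 93 billion bases, so the table's $150 per billion bases works out to roughly $14,000, while the Sanger figure exceeds $200 million, which is why the table calls it impractical for large projects.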

Summary of Long-Read and Short-Read Sequencing Methods

Long-Read Sequencing Methods

Single Molecule Real Time (SMRT) Sequencing: Developed by Pacific Biosciences, SMRT sequencing uses zero-mode waveguides (ZMWs) to monitor DNA synthesis by single polymerase molecules. It uses unmodified polymerase and fluorescently labeled nucleotides, and it can also detect nucleotide modifications such as cytosine methylation. Reads average about 5 kilobases, and individual reads can reach 20,000 nucleotides or more. The Sequel System, the successor to the PacBio RS II, significantly increases the number of ZMWs, to 1 million.

Nanopore DNA Sequencing: In nanopore sequencing, a DNA molecule passes through a nanopore and disrupts the ionic current flowing through it; because the disruption depends on the bases inside the pore, the measured current can be decoded into a sequence. The method reads unmodified DNA strands in real time, and both biological and solid-state nanopores have been developed to improve accuracy and throughput (here, here, and here).

Short-Read Sequencing Methods

Massively Parallel Signature Sequencing (MPSS): A bead-based method developed by Lynx Therapeutics, MPSS was an early high-throughput sequencing technology but became obsolete with the advent of simpler sequencing-by-synthesis technologies like those developed by Illumina (here).

Polony Sequencing: Developed in George M. Church's lab, this method combines emulsion PCR and ligation-based sequencing chemistry, achieving high accuracy and cost-efficiency. It was foundational for the technologies that evolved into the SOLiD sequencing system.

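454 Pyrosequencing: Developed by 454 Life Sciences, this method amplifies DNA on beads by emulsion PCR and then performs pyrosequencing in picoliter-volume wells, each holding a single bead. Light generated by luciferase signals each nucleotide incorporation. Read length and cost per base are intermediate between Sanger sequencing and short-read platforms such as Solexa and SOLiD (here and here).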
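In pyrosequencing, one nucleotide type is flowed at a time in a fixed cyclic order, and the light signal of a flow is roughly proportional to the number of identical bases incorporated. Decoding a flowgram therefore means rounding each signal to a whole homopolymer length. The flow order TACG, the signal values and the helper names below are assumptions for illustration.

```python
FLOW_ORDER = "TACG"  # assumed cyclic nucleotide flow order


def decode_flowgram(signals, flow_order=FLOW_ORDER):
    """Convert per-flow signals into bases.

    Each signal is assumed proportional to the number of identical bases
    incorporated in that flow, so it is rounded to the nearest integer
    (Python's round() sends exact .5 ties to the even integer).
    """
    bases = []
    for i, signal in enumerate(signals):
        nucleotide = flow_order[i % len(flow_order)]
        bases.append(nucleotide * max(round(signal), 0))
    return "".join(bases)


def ideal_flowgram(seq, flow_order=FLOW_ORDER):
    """Noise-free signals that `seq` would produce under the flow order."""
    if set(seq) - set(flow_order):
        raise ValueError("sequence contains bases absent from the flow order")
    signals, pos, i = [], 0, 0
    while pos < len(seq):
        nucleotide = flow_order[i % len(flow_order)]
        run = 0
        while pos < len(seq) and seq[pos] == nucleotide:
            run += 1
            pos += 1
        signals.append(float(run))
        i += 1
    return signals


print(ideal_flowgram("TTAGGGGGC"))                           # [2.0, 1.0, 0.0, 5.0, 0.0, 0.0, 1.0]
print(decode_flowgram([2.1, 0.9, 0.1, 5.6, 0.0, 0.2, 1.1]))  # TTAGGGGGGC
```

A small amount of noise on the five-base G run (5.6 instead of 5.0) produces a sixth G. Signals for long homopolymers differ only by a small relative amount, which is the source of the homopolymer errors listed in the comparison table for both 454 and Ion Torrent.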

Illumina (Solexa) Sequencing: Illumina's technology, which originated at Solexa, uses reversible dye-terminators. DNA is first amplified on a surface to form local clonal colonies known as "DNA clusters." In each cycle, a single fluorescently labeled, reversibly terminated nucleotide is added to each cluster and imaged; the dye and terminator are then removed so the next cycle can proceed. More.

Combinatorial Probe Anchor Synthesis (cPAS): An evolution of combinatorial probe anchor ligation (cPAL), developed by Complete Genomics and BGI, cPAS allows high-throughput sequencing with full-length read capability and is used in platforms such as the MGISEQ-2000RS (here, here).

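SOLiD Sequencing: SOLiD performs sequencing by ligation with a two-base encoding scheme, in which each fluorescent signal ("color") reports a pair of adjacent bases, so every base is interrogated twice. This gives high accuracy and throughput, but the color reads must be decoded into bases, which complicates analysis (here, here).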
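Two-base encoding can be sketched with the common representation A=0, C=1, G=2, T=3, under which the color for a pair of adjacent bases equals the bitwise XOR of their codes. Decoding needs one known first base, passed in here as a parameter. `encode_colors` and `decode_colors` are hypothetical names, and the example shows why decoding is delicate.

```python
CODE = {"A": 0, "C": 1, "G": 2, "T": 3}
BASE = "ACGT"


def encode_colors(seq):
    """Color for each adjacent base pair: XOR of the 2-bit base codes."""
    return [CODE[a] ^ CODE[b] for a, b in zip(seq, seq[1:])]


def decode_colors(first_base, colors):
    """Rebuild bases from a known first base and a list of colors."""
    bases = [first_base]
    for color in colors:
        bases.append(BASE[CODE[bases[-1]] ^ color])
    return "".join(bases)


colors = encode_colors("ATGGCA")
print(colors)                      # [3, 1, 0, 3, 1]
print(decode_colors("A", colors))  # ATGGCA

# A single miscalled color changes every base after it.
bad = colors.copy()
bad[1] = 2
print(decode_colors("A", bad))     # ATCCGT
```

The flip side is that a true single-base difference changes two adjacent colors, so analysis done in color space can help separate measurement errors from real variants.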

Ion Torrent Semiconductor Sequencing: This method detects hydrogen ions released during DNA polymerization, avoiding the use of optical detection. It's suitable for applications requiring fast

Summary of DNA Sequencing Methods in Development

Nanopore and Microscopy-Based Sequencing

Developing methods include advanced nanopore technologies and microscopy-based techniques that use atomic force or transmission electron microscopy to detect nucleotides labeled with heavier elements in long DNA fragments. These methods aim to increase throughput, reduce costs, and cut reagent use.

Tunnelling Currents DNA Sequencing

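This technique measures electrical tunnelling currents across single-stranded DNA as it moves through a channel. Each base affects the tunnelling current differently, depending on its electronic structure, which allows bases to be distinguished. Tunnelling-current methods could potentially sequence much faster than ionic-current methods. More.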

Sequencing by Hybridization

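A non-enzymatic approach that uses DNA microarrays. The DNA to be sequenced is fluorescently labeled and hybridized to an array of known short sequences; a strong signal at a spot indicates that the spot's sequence is present in the sample. Because many probes are read in parallel, the approach can cover large amounts of sequence efficiently. More.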
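If the array reports which k-mers (probe sequences) are present, the target can in principle be rebuilt from that k-mer spectrum. A classic way, assumed here rather than taken from the text, is to treat each k-mer as an edge from its (k-1)-mer prefix to its (k-1)-mer suffix and find an Eulerian path with Hierholzer's algorithm. The sketch assumes an error-free spectrum in which each k-mer's multiplicity is known; `spectrum` and `reconstruct_from_spectrum` are hypothetical names.

```python
from collections import defaultdict


def spectrum(seq, k):
    """All k-mers of seq, with repeats (an idealized array readout)."""
    return [seq[i:i + k] for i in range(len(seq) - k + 1)]


def reconstruct_from_spectrum(kmers):
    """Rebuild one sequence whose k-mer multiset equals `kmers`."""
    graph = defaultdict(list)
    out_deg, in_deg = defaultdict(int), defaultdict(int)
    for kmer in kmers:
        prefix, suffix = kmer[:-1], kmer[1:]
        graph[prefix].append(suffix)
        out_deg[prefix] += 1
        in_deg[suffix] += 1

    # The path starts where out-degree exceeds in-degree by one, if any.
    nodes = set(out_deg) | set(in_deg)
    start = next((n for n in nodes if out_deg[n] - in_deg[n] == 1), kmers[0][:-1])

    # Hierholzer's algorithm, iterative form.
    stack, path = [start], []
    while stack:
        node = stack[-1]
        if graph[node]:
            stack.append(graph[node].pop())
        else:
            path.append(stack.pop())
    path.reverse()

    if len(path) != len(kmers) + 1:
        raise ValueError("k-mers do not form a single connected path")
    return path[0] + "".join(node[-1] for node in path[1:])


print(reconstruct_from_spectrum(spectrum("GATTACA", 3)))  # GATTACA
```

When (k-1)-mers repeat, as in the spectrum of ATGGCGTGCA with k = 3, several Eulerian paths exist and the function returns one of them, which need not be the original sequence. That ambiguity is one practical limit of sequencing by hybridization.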

Sequencing with Mass Spectrometry

Mass spectrometry, particularly MALDI-TOF MS, is explored as an alternative to gel electrophoresis, providing high-resolution analysis of DNA fragments and potential applications in forensic science.

Microfluidic Sanger Sequencing

Microfluidic technology integrates DNA amplification and separation on a single chip, reducing reagent usage and costs while increasing throughput.

Microscopy-Based Techniques

These techniques directly visualize DNA sequences by using electron microscopy to identify individually labeled bases within large DNA molecules.

RNAP Sequencing

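In this method, RNA polymerase is attached to one bead and the DNA template to another, with both beads held in optical traps. As the polymerase transcribes, the distance between the beads changes and is recorded at single-nucleotide resolution; repeating the experiment with a reduced concentration of each of the four nucleotides reveals where the polymerase pauses, from which the sequence is deduced.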

In Vitro Virus High-Throughput Sequencing

A novel method combining 454 pyrosequencing with an in vitro virus mRNA display technique to analyze protein interactions and sequence mRNA linked to proteins of interest.

Overview of DNA Sequencing Market and Sample Preparation

Market Dominance

In 2022, Illumina controlled approximately 80% of the DNA sequencing market, with the remainder shared among a few key players.

Sample Preparation for DNA Sequencing

Successful DNA sequencing depends heavily on the quality of sample preparation:

  • DNA Extraction: Aims to yield long, non-degraded DNA strands.
  • RNA Extraction: Should result in RNA that is converted to cDNA using reverse transcriptase for subsequent sequencing [149].

Post-extraction, samples often need additional preparation specific to the sequencing method, such as library preparation for next-generation sequencing. Quality and quantity assessments are crucial to identify degraded or low-purity samples to ensure high-quality data.

Automation in Sample Preparation

The need for high-throughput sample preparation has led to the development and use of various liquid handling instruments:

Company | Liquid handler / automation system | Lower price (USD) | Upper price (USD)
--- | --- | --- | ---
OpenTrons | OpenTrons OT-2 | $6,500 | $20,000
Gilson | Gilson Pipetmax | $20,000 | $40,000
Neotec | Neotec EzMate | $25,000 | $45,000
Formulatrix | Formulatrix Mantis | $40,000 | $60,000
Hudson Robotics | Hudson Robotics SOLO | $40,000 | $50,000
Hamilton | Hamilton Microlab NIMBUS | $40,000 | $80,000
TTP Labtech | TTP Labtech Mosquito HV Genomics | $45,000 | $80,000
Beckman Coulter | Biomek 4000 | $50,000 | $65,000
Hamilton | Hamilton Genomic STARlet | $50,000 | $100,000
Eppendorf | Eppendorf epMotion 5075t | $95,000 | $110,000
Beckman Coulter | Beckman Coulter Biomek i5 | $100,000 | $150,000
Hamilton | Hamilton NGS STAR | $100,000 | $200,000
PerkinElmer | PerkinElmer Sciclone G3 NGS and NGSx Workstation | $150,000 | $220,000

DNA Sequencing Development Initiatives and Ethical Concerns

Development Initiatives

X Prize for Genome Sequencing: In October 2006, the X Prize Foundation launched the Archon X Prize to encourage advancements in genome sequencing technologies. The challenge was to sequence 100 human genomes within 10 days, achieving an accuracy of no more than one error per 100,000 bases, covering at least 98% of the genome, and at a cost of no more than $10,000 per genome.

NHGRI Funding: The National Human Genome Research Institute (NHGRI) awards grants for innovation in genomics. Funded projects have included work on microfluidic, polony, and base-heavy sequencing methodologies.

Computational Challenges

Sequencing technologies generate raw data that must be processed computationally before it can be assembled into complete genomes. Key challenges include estimating the error rate of individual base calls and handling repetitive sequences, which complicate genome assembly. Phred (base calling with per-base quality scores) and Phrap (sequence assembly) were early, widely used programs for these tasks [154].
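Phred-style quality scores express the estimated probability P that a base call is wrong as Q = -10 log10 P, so Q20 means a 1-in-100 chance of error and Q30 means 1 in 1,000. FASTQ files commonly store Q as an ASCII character with an offset of 33 (the Sanger and Illumina 1.8+ convention). The helper names below are illustrative, not taken from Phred itself.

```python
import math


def phred_to_error(q):
    """Error probability for a Phred quality score."""
    return 10 ** (-q / 10)


def error_to_phred(p):
    """Phred quality score for an error probability."""
    return -10 * math.log10(p)


def decode_quality(qual_string, offset=33):
    """Turn a FASTQ quality string into integer Phred scores."""
    return [ord(ch) - offset for ch in qual_string]


def expected_errors(qual_string, offset=33):
    """Expected number of wrong base calls in a read."""
    return sum(phred_to_error(q) for q in decode_quality(qual_string, offset))


print(decode_quality("II5+#"))  # [40, 40, 20, 10, 2]
print(f"{expected_errors('II5+#'):.3f} expected errors in this 5-base read")
```

Summing per-base error probabilities gives a simple, interpretable read-level quality measure, and the same per-base scores are what quality-trimming algorithms operate on.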

Read Trimming

To improve the quality of data used in genomic analyses, several read trimming algorithms have been developed:

Name of algorithm | Type of algorithm
--- | ---
Cutadapt | Running sum
ConDeTri | Window based
ERNE-FILTER | Running sum
FASTX quality trimmer | Window based
PRINSEQ | Window based
Trimmomatic | Window based
SolexaQA | Window based
SolexaQA-BWA | Running sum
Sickle | Window based
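The two algorithm types in the table can be sketched as follows. The running-sum variant follows the idea used by BWA's quality trimming (and adopted by Cutadapt): scanning from the 3' end, accumulate (threshold - quality) and cut where that sum peaks, stopping once it turns negative. The window-based variant scans from the 5' end and cuts at the first window whose mean quality falls below the threshold, similar in spirit to Trimmomatic's sliding-window step. Both are simplified illustrations, not the exact behavior of any listed tool; the thresholds and window size are arbitrary.

```python
def trim_running_sum(quals, threshold=20):
    """Number of bases to keep after running-sum 3'-end trimming."""
    keep, running, best = len(quals), 0, 0
    for i in range(len(quals) - 1, -1, -1):
        running += threshold - quals[i]
        if running < 0:
            break  # good bases now outweigh the bad ones seen so far
        if running > best:
            best, keep = running, i  # cutting before base i removes the most "badness"
    return keep


def trim_sliding_window(quals, window=4, threshold=20):
    """Number of bases to keep: cut at the start of the first window
    (scanning from the 5' end) whose mean quality is below threshold."""
    if not quals:
        return 0
    last_start = max(len(quals) - window, 0)
    for start in range(last_start + 1):
        win = quals[start:start + window]
        if sum(win) / len(win) < threshold:
            return start
    return len(quals)


quals = [30, 32, 31, 35, 28, 12, 25, 10, 8, 5]
print(trim_running_sum(quals))     # 5
print(trim_sliding_window(quals))  # 4
```

On this example read the running-sum method keeps one more base: it tolerates the isolated low score at position 5 because the high-quality bases before it still outweigh it, whereas the window method cuts as soon as one window's average dips below the threshold.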

Ethical Issues in Genomics

Bioethics and DNA Ownership: The use of DNA sequencing raises significant ethical questions about the ownership of an individual's DNA and the data derived from it. Key legal cases like Moore v. Regents of the University of California highlight these issues, emphasizing the need for informed consent in the use of biological samples (https://api.semanticscholar.org/CorpusID:15357657).

Privacy and Discrimination Concerns: The potential misuse of genomic data by insurers or employers is a major concern. Laws like the Genetic Information Nondiscrimination Act (GINA) in the U.S. aim to prevent discrimination based on genetic data, but challenges remain, particularly regarding the security and privacy of genomic data (here [166]).

DNA Collection and Use: In many places, DNA considered "abandoned" can legally be collected and sequenced without consent, raising ethical questions about privacy and personal rights.

Screening and Anxiety: There is concern that genetic screening could cause anxiety in people found to have an increased risk of certain diseases. However, studies suggest that such screening does not necessarily heighten anxiety (here, here, and here). Ethical discussions also extend to new sequencing technologies, which can deepen these privacy and ethical challenges.

Ethical Considerations in Genetic Testing

Increased Use of Genetic Screening: As genetic screening becomes more common, both for newborns and, through direct-to-consumer services such as 23andMe, for adults, ethical questions about consent, data handling, and the psychological impact of genetic information become more pressing. Screening can reveal sensitive information not only about the individual but also about their relatives, intensifying the debate over privacy and consent in genetic testing (here and here).

Impact of Next-Generation Sequencing: Newer technologies such as nanopore sequencing add further ethical considerations because they can sequence entire genomes rapidly and inexpensively. Broader use and data collection could exacerbate existing problems around consent and data privacy.