An Introduction to DNA Sequencing
DNA sequencing is the process of determining the nucleic acid sequence of DNA, that is, the order of its nucleotides: adenine, guanine, cytosine, and thymine. This foundational technology in the biological and medical sciences has driven major advances in research and discovery.
DNA sequencing is crucial across many scientific fields, including medical diagnosis, biotechnology, and forensic biology. It enables researchers to compare genetic sequences, which aids in diagnosing diseases such as cancer, characterizing antibody repertoires, and guiding patient treatment strategies.
The rapid evolution of sequencing technology has led to significant achievements, including the sequencing of entire genomes of various species, which enhances our understanding of biological diversity and evolutionary histories.
In molecular biology, sequencing is a tool for studying genomic structures and identifying genomic changes linked to different diseases, thereby aiding in the discovery of new drug targets.
DNA sequencing also provides insights into evolutionary processes by revealing relationships among species. Notably, DNA from a million-year-old mammoth has been sequenced, setting a record at the time for the oldest DNA ever sequenced.
Metagenomics, the study of genetic material recovered directly from environmental samples, uses sequencing to identify the diversity of microbial life, which is crucial for ecological and microbiological research.
Sequencing is vital in virology for identifying viruses and studying their evolution and epidemiology, especially for managing outbreaks and developing vaccines.
DNA sequencing facilitates genetic testing, enabling personalized medicine by tailoring medical treatment based on individual genetic profiles. It plays a critical role in diagnosing rare genetic disorders and in reproductive counseling.
DNA sequencing is also crucial for identifying pathogens, improving the management of antibiotic resistance, and ensuring precise treatments.
Main article: Forensic DNA analysis
DNA sequencing is vital in forensic science for DNA profiling and paternity testing, significantly advancing the ability to match DNA to individuals involved in legal and criminal cases.
Main article: Nucleotide
DNA is built primarily from four bases: adenine (A), guanine (G), cytosine (C), and thymine (T). Modified forms of these bases, such as 5-methylcytosine (5mC), play important roles in gene regulation and can be studied through DNA sequencing.
Viruses and other organisms may also use modified bases, which can complicate their detection and analysis in sequencing studies.
DNA was first isolated by Friedrich Miescher in 1869, but its role in heredity was not established until the mid-20th century, through the experiments of Avery, MacLeod, and McCarty and the double-helix model proposed by Watson and Crick in 1953.
Frederick Sanger made foundational contributions to sequencing, first by determining the amino acid sequence of insulin and later by developing methods for sequencing DNA. His work, beginning in the mid-20th century, set the stage for modern genetic research.
Walter Fiers and his team were among the pioneers of RNA sequencing in the early 1970s. Their work was crucial for understanding RNA's role in protein synthesis and gene regulation and marked significant progress in molecular biology.
The first DNA sequencing methods were developed in the early 1970s, pioneered by Ray Wu and later refined by Sanger and Gilbert. They were crucial for the rapid advancement of genetic research and biotechnology.
From the late 1990s, high-throughput sequencing technologies transformed genetic research, enabling rapid sequencing of entire genomes and massively parallel processing of genetic data. These advancements have led to significant reductions in the cost and time required for genetic research, impacting various fields including medicine, anthropology, and personal genomics.
Main article: Maxam-Gilbert sequencing
Developed by Allan Maxam and Walter Gilbert in 1977, this method chemically modifies DNA and then cleaves it at specific bases. Known as chemical sequencing, it relies on radioactive labeling and hazardous chemicals and is technically demanding, which limited its use once simpler methods such as Sanger sequencing became available. The base-specific breaks produce a series of labeled fragments that are separated by gel electrophoresis and visualized by autoradiography; the DNA sequence is then read from the pattern of fragments.
Main article: Sanger sequencing
The chain-termination method, developed by Frederick Sanger and his team in 1977, quickly became the standard because of its relative ease and reliability. It uses dideoxynucleotides (ddNTPs), which lack the 3'-hydroxyl group needed to extend a DNA chain, so synthesis stops wherever one is incorporated. The resulting fragments, each ending at a known base, are separated by length to read the sequence. This method was foundational in sequencing the human genome and has been greatly improved by fluorescent labeling and automation, which reduced the cost and complexity of DNA sequencing.
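The idealized simulation below mirrors that logic: each of four reactions (one per ddNTP) yields fragments ending at every position where its base occurs in the new strand, and sorting all fragments by length spells out that strand. It assumes complete, error-free termination and ignores electrophoresis physics; all function names are hypothetical.

```python
"""Simulate Sanger chain-termination sequencing (idealized sketch).

Assumptions: every possible termination event occurs, there are no errors,
and the template is given 3'->5' so its complement is the new strand 5'->3'.
"""

COMPLEMENT = {"A": "T", "T": "A", "C": "G", "G": "C"}


def synthesized_strand(template: str) -> str:
    """Strand built by the polymerase: the base-by-base complement of the template."""
    return "".join(COMPLEMENT[base] for base in template)


def terminated_fragments(new_strand: str) -> dict[str, list[int]]:
    """For each ddNTP reaction, list the lengths of chain-terminated fragments.

    A fragment of length n ends where a dideoxynucleotide (no 3'-OH) was
    incorporated at position n, so the chain could not be extended further.
    """
    reactions = {base: [] for base in "ACGT"}
    for position, base in enumerate(new_strand, start=1):
        reactions[base].append(position)
    return reactions


def read_gel(reactions: dict[str, list[int]]) -> str:
    """Order all fragments by length and report each fragment's terminal base."""
    bands = sorted(
        (length, base) for base, lengths in reactions.items() for length in lengths
    )
    return "".join(base for _, base in bands)


strand = synthesized_strand("TACGGT")
print(strand, terminated_fragments(strand), read_gel(terminated_fragments(strand)))
```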
Sequencing by synthesis involves monitoring the incorporation of nucleotides into a DNA strand by a DNA polymerase to determine the sequence. Initially described in 1993, this method involves several key steps: DNA amplification, attachment to a solid support, synthesis by polymerase, and real-time detection of nucleotide incorporation. Improvements in SBS have allowed it to be a cornerstone of high-throughput sequencing technologies used in modern genomics.
Large-scale sequencing of long DNA molecules, up to entire chromosomes, involves breaking genomic DNA into smaller, manageable pieces, cloning them into a bacterial library, and sequencing the individual clones. Overlapping regions between clones are then used to assemble the complete sequence.
Methods and Techniques:
De novo sequencing refers to sequencing a novel genome without reference to previously known DNA sequences, starting "from the beginning." It is essential for discovering new genetic information where no reference genomes are available.
Emulsion PCR, a technique widely used to prepare DNA for sequencing, isolates individual DNA molecules in the aqueous microdroplets of a water-in-oil emulsion, so that each fragment is amplified without interference from others. It supports several sequencing platforms, including those developed by 454 Life Sciences and the SOLiD system.
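Isolating single molecules depends on diluting the template so that most droplets receive at most one molecule. Assuming molecules partition into droplets independently at random (a standard modeling assumption not stated above), the count per droplet follows a Poisson distribution with mean `lam`, and the hypothetical helper below computes the fractions of empty, clonal, and mixed droplets.

```python
"""Poisson model of template loading in emulsion PCR (illustrative sketch).

Assumption: template molecules partition into droplets independently at
random, so the count per droplet is Poisson with mean lam (average
templates per droplet).
"""
import math


def droplet_fractions(lam: float) -> dict[str, float]:
    """Fractions of droplets with 0, exactly 1, and 2 or more template molecules."""
    p0 = math.exp(-lam)        # P(k = 0): empty droplet
    p1 = lam * math.exp(-lam)  # P(k = 1): clonal amplification
    return {"empty": p0, "clonal": p1, "mixed": 1.0 - p0 - p1}


for lam in (0.1, 0.3, 1.0):
    f = droplet_fractions(lam)
    # Among droplets that received any template, what share is clonal?
    clonal_share = f["clonal"] / (1.0 - f["empty"])
    print(f"lambda={lam}: empty={f['empty']:.3f}, clonal={f['clonal']:.3f}, "
          f"mixed={f['mixed']:.3f}, clonal share of occupied={clonal_share:.3f}")
```

Lowering `lam` raises the share of occupied droplets that are clonal but leaves more droplets empty, which is the trade-off behind choosing the dilution.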
Main article: Shotgun sequencing
Shotgun sequencing is designed to handle DNA sequences longer than 1000 base pairs, up to complete chromosomes. This method involves random fragmentation of DNA, which is then sequenced in pieces. The resulting sequences are assembled based on overlapping regions, allowing for the reconstruction of the original DNA sequence.
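The overlap-based reconstruction can be sketched with a greedy algorithm that repeatedly merges the two reads sharing the longest suffix-prefix overlap. This toy version assumes error-free reads from a single strand and no long repeats; greedy merging is only one of several assembly strategies, and the function names are hypothetical.

```python
"""Greedy overlap assembly of shotgun reads (toy sketch).

Assumptions: error-free reads from one strand and no repeats longer than the
minimum overlap. Real assemblers also handle errors, repeats, and both strands.
"""


def overlap(a: str, b: str, min_length: int) -> int:
    """Length of the longest suffix of a that matches a prefix of b."""
    for length in range(min(len(a), len(b)), min_length - 1, -1):
        if a.endswith(b[:length]):
            return length
    return 0


def greedy_assemble(reads: list[str], min_length: int = 3) -> list[str]:
    """Repeatedly merge the pair of contigs with the largest overlap."""
    # Drop reads contained in another read; they add no new sequence.
    contigs = [r for r in reads if not any(r != o and r in o for o in reads)]
    contigs = list(dict.fromkeys(contigs))  # remove exact duplicates
    while True:
        best = (0, -1, -1)  # (overlap length, left index, right index)
        for i, a in enumerate(contigs):
            for j, b in enumerate(contigs):
                if i != j:
                    olen = overlap(a, b, min_length)
                    if olen > best[0]:
                        best = (olen, i, j)
        olen, i, j = best
        if olen == 0:
            return contigs  # no overlaps left to merge
        merged = contigs[i] + contigs[j][olen:]
        contigs = [c for k, c in enumerate(contigs) if k not in (i, j)] + [merged]


print(greedy_assemble(["ATTAGACCTG", "CCTGCCGGAA", "AGACCTGCCG", "GCCGGAATAC"]))
```

For these four reads the sketch returns the single contig ATTAGACCTGCCGGAATAC. Repeats longer than the reads can make greedy merging join the wrong fragments, which is one reason repetitive sequences complicate assembly.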
High-throughput sequencing (HTS) technologies allow for the parallel sequencing of millions of DNA fragments. These technologies have transformed genetic analysis, enabling rapid sequencing of entire genomes.
Method | Read length | Accuracy | Reads per run | Time per run | Cost per 1 billion bases | Advantages | Disadvantages |
---|---|---|---|---|---|---|---|
Single-molecule real-time (Pacific Biosciences) | up to 100,000 bp | 87% (raw read) | 4 million | 20 hours | $43.3 | Long reads, detects base modifications | High instrument cost, moderate throughput |
Ion semiconductor (Ion Torrent) | up to 600 bp | 99.6% | 80 million | 2 hours | $950 | Fast, cost-effective | Prone to errors in homopolymer regions |
Pyrosequencing (454) | 700 bp | 99.9% | 1 million | 24 hours | $10,000 | Long reads, fast | Costly per run, errors in homopolymer regions |
Illumina (Sequencing by synthesis) | up to 600 bp | 99.9% | up to 3 billion | up to 11 days | $150 | High throughput, scalable | Requires high DNA concentration |
Nanopore Sequencing | Variable | 92-97% | User-defined | Real-time | $100 | Longest reads, real-time data | Lower accuracy, lower throughput |
Chain termination (Sanger sequencing) | up to 900 bp | 99.9% | N/A | 3 hours | $2,400,000 | Highly accurate | Impractical for large projects |
High-throughput sequencing technologies continue to evolve, offering diverse options for genetic analysis, from basic research to clinical diagnostics.
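The per-gigabase prices in the comparison table support quick back-of-the-envelope comparisons. The sketch below multiplies genome size by coverage by price; the roughly 3.1-billion-base genome and 30x coverage are illustrative assumptions not taken from the table, and the estimate ignores instrument, library preparation, and labor costs.

```python
"""Back-of-the-envelope sequencing cost from the table's per-gigabase prices.

Assumptions (illustrative, not from the table): a ~3.1-billion-base genome
sequenced to 30x average coverage, with table prices treated as exact.
"""

# US$ per 1 billion bases, copied from the comparison table.
COST_PER_GIGABASE = {
    "Pacific Biosciences SMRT": 43.3,
    "Ion Torrent": 950,
    "454 pyrosequencing": 10_000,
    "Illumina": 150,
    "Nanopore": 100,
    "Sanger": 2_400_000,
}


def sequencing_cost(genome_gigabases: float, coverage: float, price_per_gb: float) -> float:
    """Bases to sequence = genome size x coverage; cost scales linearly with bases."""
    return genome_gigabases * coverage * price_per_gb


for platform, price in COST_PER_GIGABASE.items():
    cost = sequencing_cost(3.1, 30, price)
    print(f"{platform:>26}: ${cost:,.0f}")
```

At the table's Illumina price, for example, 3.1 Gb at 30x coverage is 93 billion bases, or about $13,950.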
Single Molecule Real Time (SMRT) Sequencing: Developed by Pacific Biosciences, SMRT sequencing utilizes zero-mode wave-guides (ZMWs) to monitor DNA synthesis. It features the use of unmodified polymerase and fluorescently labeled nucleotides, enabling the detection of nucleotide modifications such as cytosine methylation. This technique provides long reads, averaging 5 kilobases, with some reaching up to 20,000 nucleotides. The Sequel System, an advancement over the PacBio RS II, significantly increases the number of ZMWs to 1 million.
Nanopore DNA Sequencing: In nanopore sequencing, DNA molecules pass through a nanopore and disrupt the flow of ions through it; the resulting changes in current are measured to identify the sequence. The method reads unmodified DNA strands in real time. It has been developed with both biological and solid-state nanopores to improve accuracy and throughput.
Massively Parallel Signature Sequencing (MPSS): A bead-based method developed by Lynx Therapeutics, MPSS was an early high-throughput sequencing technology but became obsolete with the advent of simpler sequencing-by-synthesis technologies such as Illumina's.
Polony Sequencing: Developed in George M. Church's lab, this method combines emulsion PCR and ligation-based sequencing chemistry, achieving high accuracy and cost-efficiency. It was foundational for the technologies that evolved into the SOLiD sequencing system.
454 Pyrosequencing: This method, developed by 454 Life Sciences, uses emulsion PCR and pyrosequencing in wells with picoliter volumes, giving read lengths and per-base costs intermediate between Sanger sequencing and short-read platforms such as Solexa (Illumina) and SOLiD.
Illumina (Solexa) Sequencing: Illumina's technology, which originated from Solexa, uses reversible dye-terminators for sequencing. DNA is amplified on a surface to form local clonal colonies, known as "DNA clusters," which are then sequenced one base per cycle.
Combinatorial Probe Anchor Synthesis (cPAS): An evolution of combinatorial probe-anchor ligation (cPAL), developed by Complete Genomics and BGI, cPAS enables high-throughput sequencing with full-length read capability and is used in platforms such as the MGISEQ-2000RS.
SOLiD Sequencing: SOLiD uses sequencing by ligation with a two-base encoding scheme, which offers high accuracy and throughput, although its color-space reads are more complex to decode.
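Pyrosequencing's key property, and the source of the homopolymer errors listed in the comparison table, is that the light signal for each nucleotide flow scales with the number of bases incorporated. The sketch below assumes a fixed cyclic flow order of TACG and an ideal, noise-free signal; the function names are hypothetical.

```python
"""Idealized pyrosequencing flowgram: simulate and decode (sketch).

Assumptions: nucleotides are flowed in the fixed cyclic order "TACG"; the
light signal for a flow equals the number of bases incorporated (a run of
n identical bases gives signal ~n); no reagent carry-over or noise.
"""

FLOW_ORDER = "TACG"


def simulate_flowgram(sequence: str, n_cycles: int) -> list[float]:
    """Signal per flow while synthesizing `sequence` (the new strand)."""
    signals = []
    pos = 0
    for _ in range(n_cycles):
        for nucleotide in FLOW_ORDER:
            run = 0
            # The polymerase extends through a whole homopolymer run in one flow.
            while pos < len(sequence) and sequence[pos] == nucleotide:
                run += 1
                pos += 1
            signals.append(float(run))
    return signals


def decode_flowgram(signals: list[float]) -> str:
    """Round each signal to an integer run length and emit that many bases."""
    bases = []
    for i, signal in enumerate(signals):
        nucleotide = FLOW_ORDER[i % len(FLOW_ORDER)]
        bases.append(nucleotide * round(signal))
    return "".join(bases)


flows = simulate_flowgram("TTAGGC", 2)
print(flows, decode_flowgram(flows))
```

A measured signal of 3.6 for a true three-base G run rounds to 4, so `decode_flowgram([2.0, 1.0, 0.0, 3.6])` returns TTAGGGG; uncertainty in long homopolymers is exactly this kind of rounding problem.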
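The cycle-by-cycle readout can be illustrated with a toy base caller. Assuming each cycle yields one fluorescence intensity per channel (A, C, G, T) for a cluster, the sketch calls the brightest channel and reports a simple purity score. Real Illumina base callers also correct for channel cross-talk and phasing, and `call_bases` is a hypothetical name.

```python
"""Toy base calling for cyclic reversible-terminator sequencing (sketch).

Assumptions: each cycle yields one intensity per dye channel (A, C, G, T)
for a cluster, and the brightest channel is called. Cross-talk and phasing
corrections used by real base callers are ignored.
"""

CHANNELS = "ACGT"


def call_bases(intensities: list[tuple[float, float, float, float]]) -> tuple[str, list[float]]:
    """Return the called bases and a per-cycle purity score.

    Purity = brightest / (brightest + second brightest). Values near 0.5 mean
    two channels were almost equally bright, so the call is ambiguous.
    """
    bases, purities = [], []
    for cycle in intensities:
        ranked = sorted(range(4), key=lambda k: cycle[k], reverse=True)
        best, second = cycle[ranked[0]], cycle[ranked[1]]
        bases.append(CHANNELS[ranked[0]])
        purities.append(best / (best + second) if best + second > 0 else 0.0)
    return "".join(bases), purities


reads, scores = call_bases([(900, 50, 40, 30), (20, 30, 800, 60), (100, 700, 650, 10)])
print(reads, [round(s, 3) for s in scores])
```

The third cycle shows a low-confidence call: C wins, but G is nearly as bright.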
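Two-base encoding is easier to follow in code. The sketch below uses the standard SOLiD color code, written compactly as the XOR of 2-bit base codes (A=0, C=1, G=2, T=3); the helper names are hypothetical. Decoding requires knowing the first base, and it shows why color-space reads are harder to decode: one wrong color changes every downstream base.

```python
"""SOLiD two-base (color-space) encoding and decoding (sketch).

Color code, expressed as XOR of 2-bit base codes (A=0, C=1, G=2, T=3):
AA/CC/GG/TT -> 0, AC/CA/GT/TG -> 1, AG/GA/CT/TC -> 2, AT/TA/CG/GC -> 3.
"""

CODE = {"A": 0, "C": 1, "G": 2, "T": 3}
BASES = "ACGT"


def encode(sequence: str) -> list[int]:
    """Each color describes one pair of adjacent bases."""
    return [CODE[a] ^ CODE[b] for a, b in zip(sequence, sequence[1:])]


def decode(first_base: str, colors: list[int]) -> str:
    """Translate colors back to bases, starting from a known first base."""
    bases = [first_base]
    for color in colors:
        bases.append(BASES[CODE[bases[-1]] ^ color])
    return "".join(bases)


print(encode("ATGGCA"), decode("A", encode("ATGGCA")))
```

Decoding `[3, 2, 0, 3, 1]` instead of the correct `[3, 1, 0, 3, 1]` from first base A gives ATCCGT rather than ATGGCA. A genuine single-base difference, by contrast, changes two adjacent colors, a pattern that helps distinguish real variants from measurement errors.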
Ion Torrent Semiconductor Sequencing: This method detects hydrogen ions released during DNA polymerization, avoiding the use of optical detection. It's suitable for applications requiring fast
Methods in development include advanced nanopore technologies and microscopy-based techniques that use atomic force or transmission electron microscopy to detect nucleotides labeled with heavier elements in long DNA fragments. These methods aim to increase throughput, reduce costs, and reduce reagent consumption.
Tunnelling-current sequencing uses electrical tunnelling currents to read DNA sequences as strands pass through a channel. It could potentially sequence much faster than ionic-current methods.
Sequencing by hybridization is a non-enzymatic approach that uses DNA microarrays: labeled DNA hybridizes to known probe sequences on the array, revealing which sequences are present and allowing large-scale, efficient coverage.
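The reconstruction step of sequencing by hybridization can be framed as a graph problem: if the array reveals which k-mers (substrings of length k) the target contains, each k-mer becomes an edge in a de Bruijn graph, and an Eulerian path through that graph spells the sequence. This simplified sketch assumes an error-free spectrum with known multiplicities (real arrays report presence rather than counts), and `reconstruct_from_spectrum` is a hypothetical name.

```python
"""Reconstruct a sequence from its k-mer spectrum (sketch).

Assumptions: every k-mer of the target is known, with multiplicity and no
errors, so the de Bruijn graph has an Eulerian path. If repeats allow
several paths, only one of them is returned.
"""
from collections import defaultdict


def reconstruct_from_spectrum(kmers: list[str]) -> str:
    # Each k-mer is an edge from its (k-1)-prefix to its (k-1)-suffix.
    graph = defaultdict(list)
    out_deg, in_deg = defaultdict(int), defaultdict(int)
    for kmer in kmers:
        prefix, suffix = kmer[:-1], kmer[1:]
        graph[prefix].append(suffix)
        out_deg[prefix] += 1
        in_deg[suffix] += 1

    # An Eulerian path starts where out-degree exceeds in-degree by one
    # (or anywhere, if the path closes into a cycle).
    nodes = set(out_deg) | set(in_deg)
    start = next((n for n in nodes if out_deg[n] - in_deg[n] == 1), kmers[0][:-1])

    # Hierholzer's algorithm, iterative form.
    stack, path = [start], []
    while stack:
        node = stack[-1]
        if graph[node]:
            stack.append(graph[node].pop())
        else:
            path.append(stack.pop())
    path.reverse()

    # Spell the sequence: first node, then the last character of each later node.
    return path[0] + "".join(node[-1] for node in path[1:])


print(reconstruct_from_spectrum(["GCG", "ATG", "CGT", "TGG", "GGC"]))
```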
Mass spectrometry, particularly MALDI-TOF MS, is explored as an alternative to gel electrophoresis, providing high-resolution analysis of DNA fragments and potential applications in forensic science.
Microfluidic technology integrates DNA amplification and separation on a single chip, reducing reagent usage and costs while increasing throughput.
Electron microscopy-based techniques directly visualize DNA sequences, identifying individually labeled bases within large DNA molecules.
RNA polymerase (RNAP) sequencing uses RNA polymerase attached to beads and measures changes in the distance between the beads during transcription to sequence DNA.
Another approach combines 454 pyrosequencing with in vitro virus mRNA display to analyze protein interactions, sequencing the mRNAs linked to proteins of interest.
In 2022, Illumina controlled approximately 80% of the DNA sequencing market, with the remainder shared among a few key players.
Successful DNA sequencing depends heavily on the quality of sample preparation, starting with the extraction of DNA from the sample.
Post-extraction, samples often need additional preparation specific to the sequencing method, such as library preparation for next-generation sequencing. Quality and quantity assessments are crucial to identify degraded or low-purity samples to ensure high-quality data.
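One common quality and quantity assessment is UV spectrophotometry. The sketch below applies two widely quoted rules of thumb: an A260 of 1.0 corresponds to roughly 50 ng/µL of double-stranded DNA, and pure DNA has an A260/A280 ratio near 1.8. The acceptance window and the function name `assess_dna` are assumptions, since labs and sequencing kits set their own thresholds.

```python
"""Spectrophotometric DNA quantity and purity check (illustrative sketch).

Rules of thumb: 1 absorbance unit at 260 nm ~ 50 ng/uL of double-stranded
DNA; pure DNA has A260/A280 of about 1.8. The acceptance window is an
assumed default, not a standard.
"""


def assess_dna(a260: float, a280: float, dilution_factor: float = 1.0,
               ratio_window: tuple[float, float] = (1.7, 2.0)) -> dict:
    concentration = a260 * 50.0 * dilution_factor  # ng/uL of dsDNA
    ratio = a260 / a280  # protein contamination pulls this ratio down
    low, high = ratio_window
    return {
        "concentration_ng_per_ul": concentration,
        "a260_a280": ratio,
        "purity_ok": low <= ratio <= high,
    }


print(assess_dna(0.5, 0.27, dilution_factor=10))
print(assess_dna(0.5, 0.40))
```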
The need for high-throughput sample preparation has led to the development and use of various liquid handling instruments:
Company | Liquid handlers / Automation | Lower price ($) | Upper price ($) |
---|---|---|---|
OpenTrons | OT-2 | $6,500 | $20,000 |
Gilson | Pipetmax | $20,000 | $40,000 |
Neotec | EzMate | $25,000 | $45,000 |
Formulatrix | Mantis | $40,000 | $60,000 |
Hudson Robotics | SOLO | $40,000 | $50,000 |
Hamilton | Microlab NIMBUS | $40,000 | $80,000 |
TTP Labtech | Mosquito HV Genomics | $45,000 | $80,000 |
Beckman Coulter | Biomek 4000 | $50,000 | $65,000 |
Hamilton | Genomic STARlet | $50,000 | $100,000 |
Eppendorf | epMotion 5075t | $95,000 | $110,000 |
Beckman Coulter | Biomek i5 | $100,000 | $150,000 |
Hamilton | NGS STAR | $100,000 | $200,000 |
PerkinElmer | Sciclone G3 NGS and NGSx Workstation | $150,000 | $220,000 |
X Prize for Genome Sequencing: In October 2006, the X Prize Foundation launched the Archon X Prize to encourage advancements in genome sequencing technologies. The challenge was to sequence 100 human genomes within 10 days, achieving an accuracy of no more than one error per 100,000 bases, covering at least 98% of the genome, and at a cost of no more than $10,000 per genome.
NHGRI Funding: The National Human Genome Research Institute (NHGRI) awards grants for innovations in genomics. Recent focuses include developments in microfluidic, polony, and base-heavy sequencing methodologies.
Sequencing technologies generate raw data that require complex computational processing to assemble into complete genomes. Challenges include evaluating sequencing errors and handling repetitive sequences, which complicate genome assembly. Programs such as Phred, which calls bases and assigns quality scores, and Phrap, which assembles reads, address these tasks. [154]
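Phred quality scores follow a simple logarithmic formula, Q = -10 log10(P), where P is the estimated probability that a base call is wrong. The helpers below (hypothetical names) convert between the two and decode a FASTQ quality string, assuming the common Phred+33 ASCII offset; some older Illumina files used a different offset.

```python
"""Phred quality scores (sketch).

Q = -10 * log10(P), where P is the estimated probability that a base call
is wrong. FASTQ quality strings are assumed to use the Phred+33 offset.
"""
import math


def phred_from_error(p_error: float) -> float:
    """Quality score for a given error probability."""
    return -10.0 * math.log10(p_error)


def error_from_phred(q: float) -> float:
    """Error probability for a given quality score."""
    return 10.0 ** (-q / 10.0)


def decode_fastq_qualities(quality_string: str, offset: int = 33) -> list[int]:
    """Convert a FASTQ quality string to integer Phred scores."""
    return [ord(ch) - offset for ch in quality_string]


print(decode_fastq_qualities("I?5+"))
print(round(phred_from_error(1e-5)))
```

On this scale Q20 means a 1-in-100 chance of error and Q30 a 1-in-1,000 chance; the Archon X Prize target of one error per 100,000 bases corresponds to Q50.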
To improve the quality of data used in genomic analyses, several read trimming algorithms have been developed:
Name of algorithm | Type of algorithm |
---|---|
Cutadapt | Running sum |
ConDeTri | Window based |
ERNE-FILTER | Running sum |
FASTX quality trimmer | Window based |
PRINSEQ | Window based |
Trimmomatic | Window based |
SolexaQA | Window based |
SolexaQA-BWA | Running sum |
Sickle | Window based |
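The two algorithm types in the table differ in how they decide where to cut. In the sketch below, `running_sum_trim` follows the BWA-style 3' trimming that Cutadapt's documentation describes (accumulate cutoff minus quality from the 3' end and cut where the sum peaks), while `window_trim` is a simplified sliding-window trimmer in the spirit of the window-based tools; the real tools differ in detail, and both function names are hypothetical. Qualities are integer Phred scores.

```python
"""Two simplified quality-trimming strategies (sketch).

running_sum_trim: BWA-style 3' trimming as described in Cutadapt's docs.
window_trim: simplified sliding-window trimming; real window-based tools
differ in details such as window size and cut position.
Both return the number of leading bases to keep.
"""


def running_sum_trim(quals: list[int], cutoff: int) -> int:
    """Length to keep after running-sum 3' quality trimming."""
    total, best_total, keep = 0, 0, len(quals)
    for i in range(len(quals) - 1, -1, -1):
        total += cutoff - quals[i]  # low-quality bases push the sum up
        if total < 0:
            break  # reached good-quality sequence; stop scanning
        if total > best_total:
            best_total, keep = total, i
    return keep


def window_trim(quals: list[int], cutoff: float, window: int = 4) -> int:
    """Keep bases up to the first window whose mean quality falls below cutoff."""
    for start in range(0, len(quals) - window + 1):
        if sum(quals[start:start + window]) / window < cutoff:
            return start
    return len(quals)


read_quals = [30, 30, 30, 30, 30, 10, 25, 5, 5]
print(running_sum_trim(read_quals, 20), window_trim(read_quals, 20))
```

On this example the running-sum method keeps five bases, tolerating the isolated Q25 base inside the poor tail, while the simple window method cuts one base earlier because the first failing window starts at position 4.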
Bioethics and DNA Ownership: The use of DNA sequencing raises significant ethical questions about the ownership of an individual's DNA and the data derived from it. Key legal cases such as Moore v. Regents of the University of California highlight these issues and emphasize the need for informed consent in the use of biological samples (https://api.semanticscholar.org/CorpusID:15357657).
Privacy and Discrimination Concerns: The potential misuse of genomic data by insurers or employers is a major concern. Laws such as the Genetic Information Nondiscrimination Act (GINA) in the U.S. aim to prevent discrimination based on genetic data, but challenges remain, particularly regarding the security and privacy of genomic data [166].
DNA Collection and Use: In many places, DNA considered "abandoned" can legally be collected and sequenced without consent, raising ethical questions about privacy and personal rights.
Screening and Anxiety: Genetic screening could cause anxiety among individuals identified as having an increased risk of certain diseases, although studies suggest that such screening does not necessarily heighten anxiety. Ethical discussions also extend to new sequencing technologies, which can deepen these privacy and ethical challenges.
Increased Use of Genetic Screening: As genetic screening becomes more common, both for newborns and for adults through services such as 23andMe, ethical considerations regarding consent, data handling, and the psychological impact of genetic information are crucial. Screening may reveal sensitive information not only about the individual but also about their relatives, intensifying the debate over privacy and consent in genetic testing.
Impact of Next-Generation Sequencing: Next-generation sequencing technologies such as nanopore sequencing add further ethical considerations because they can sequence entire genomes rapidly and inexpensively. These technologies could lead to broader use and data collection, potentially exacerbating issues around consent and data privacy.