Explained | A CODEC moment to accurately sequence genetic variants

This is the first edition of a new fortnightly column by Dr. Vinod Scaria and others, exploring contemporary concepts and issues in genetics.

The accurate identification of genetic mutations has significant applications in clinical diagnosis and prognosis. These include the early identification of mutations emerging in a small subset of cells, as in cancers, and of emerging drug-resistance mutations, among many others. In fact, the accurate detection of genetic variations is the cornerstone of genomics and its myriad applications.

Such detection is tricky with current sequencing approaches because they have an inherent error rate, which makes it difficult to tell errors apart from real mutations. Additionally, present sequencing approaches typically sequence one of the two strands of DNA at a time, and often in short fragments, making it difficult to accurately decipher the sequence of both strands.

Errors could be inherent to the sample, for example in the form of DNA damage, or could be introduced by the molecular techniques used to process samples or to amplify them before sequencing. Researchers and clinicians currently work around this challenge either by sequencing more, which becomes expensive, or by using specific molecular shortcuts, which are cumbersome to implement. These shortcuts include methods such as introducing specific barcodes for each strand and strand-specific amplification.

However, at the time they were developed, these approaches were not simple and scalable enough for wide applications, especially in clinical settings.

Meet CODEC

Researchers at the Broad Institute of MIT and Harvard have developed a new method that significantly improves the accuracy of DNA sequencing using a molecular trick. They call it CODEC – short for ‘Concatenating Original Duplex for Error Correction’. In brief, this approach involves physically linking the two strands of DNA, named ‘Watson’ and ‘Crick’ after the two people who helped discover the structure of the DNA double helix.

Once the two strands are linked, the DNA is sequenced to differentiate between real mutations and errors introduced by DNA damage or by the amplification process. Since the information from the two strands is physically linked into a single strand, the authors also call their approach ‘single duplex sequencing’.

By comparing the two copies of each molecule, researchers can identify and filter out errors, whether introduced by erroneous sequencing or by DNA damage, that would otherwise have gone unnoticed. This process dramatically improves the accuracy of the sequencing results and reduces the likelihood of false positives as well as false negatives.
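The idea of comparing the two strands can be sketched in a few lines of Python. This is only an illustration of the principle, not the CODEC pipeline itself: the function names, the toy sequences, and the rule of masking any disagreement with 'N' are assumptions made for this example.

```python
from typing import List, Tuple

COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A", "N": "N"}


def reverse_complement(seq: str) -> str:
    """Return the sequence of the opposite strand, read 5' to 3'."""
    return "".join(COMPLEMENT[base] for base in reversed(seq))


def duplex_consensus(watson: str, crick: str) -> str:
    """Keep a base only where both strands agree; mask disagreements as 'N'.

    Hypothetical helper: 'watson' and 'crick' are reads of the two strands
    of the same molecule, each written 5' to 3'.
    """
    crick_as_watson = reverse_complement(crick)
    if len(watson) != len(crick_as_watson):
        raise ValueError("strand reads must cover the same region")
    return "".join(w if w == c else "N" for w, c in zip(watson, crick_as_watson))


def call_variants(reference: str, consensus: str) -> List[Tuple[int, str, str]]:
    """List (position, reference base, observed base), skipping masked positions."""
    return [
        (i, ref, obs)
        for i, (ref, obs) in enumerate(zip(reference, consensus))
        if obs != "N" and obs != ref
    ]


reference = "ACGTACGT"
# A real mutation (G to T at position 2) is present on both strands.
# The Watson read also carries a one-strand error (C to A at position 5).
watson_read = "ACTTAAGT"
crick_read = reverse_complement("ACTTACGT")

consensus = duplex_consensus(watson_read, crick_read)
print(consensus)                            # ACTTANGT
print(call_variants(reference, consensus))  # [(2, 'G', 'T')]
```

In this toy case the change seen on both strands is reported as a variant, while the change seen on only one strand is masked and never reaches the variant list.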

Decoding diseases

The molecular linking process in CODEC uses specific molecular barcodes: known DNA fragments that are attached to specific sequences so that those sequences can later be retrieved by the barcode's identity. CODEC is compatible with the popular Illumina sequencing technology. It is also compatible with sequencing a genome in part (as in exome sequencing) or in full (whole-genome sequencing). The beauty of the method is that it does not require a significant change in existing sequencing protocols.

The researchers behind the project believe that the duplex sequencing method could become the new gold-standard for DNA sequencing in the future. Their approach improves the accuracy of sequencing by up to a factor of a thousand, with a possible reduction in the amount of sequencing by a factor of 100 relative to current methods required to attain a similar accuracy.
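To get a feel for what a thousandfold gain in accuracy means, consider the rough arithmetic below. The baseline error rate of one in a thousand bases and the one million bases examined are hypothetical round numbers chosen for illustration; they are not figures reported by the researchers.

```python
from fractions import Fraction


def expected_false_calls(error_rate: Fraction, bases_examined: int) -> Fraction:
    """Expected number of erroneous base calls: error rate times bases examined."""
    return error_rate * bases_examined


# Hypothetical baseline: 1 error per 1,000 bases (an assumption, not a reported value).
baseline_rate = Fraction(1, 1000)
# A thousandfold improvement in accuracy, the upper bound quoted above.
improved_rate = baseline_rate / 1000

bases = 1_000_000
print(expected_false_calls(baseline_rate, bases))  # 1000
print(expected_false_calls(improved_rate, bases))  # 1
```

Under these assumed numbers, a rare real mutation would have to stand out against about a thousand erroneous calls in the first case, but only about one in the second.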

Accurately detecting genetic mutations has several applications. For example, it can be used to detect cancer early, such as in the blood, using a method called a liquid biopsy, or to monitor the small number of cancer cells that are left behind in the body (a.k.a. minimal residual disease).

Over their lifetimes, as their cells divide, organisms inherently acquire genetic mutations at a steady rate. In some cases, these mutations could affect health and disease, such as when they arise in blood stem cells and lead to the proliferation of clones and, sometimes, cancer. Accurate genomic approaches could help identify such events much earlier than conventional sequencing methods.

Diverse applications

This method could also help detect rare genetic events, such as mutations in sperm that could result in genetic diseases, and could therefore be applied to screening donors. It could also help identify the early emergence of drug resistance in microbes.

Indeed, stepping beyond human health, the potential applications of duplex sequencing span a wide variety of areas, including agriculture, environmental monitoring, and forensics – practically any field in which the estimation of genetic variants using DNA sequencing is critical. It can also be used to differentiate damaged DNA from real genetic mutations where DNA damage is inherent and unavoidable.

Overall, the development of the duplex sequencing method represents a significant step forward in the field of genetic research. It could well emerge as a gold standard, as the method’s creators hope, for accurately studying genetic mutations. By improving the accuracy of DNA sequencing, scientists can better understand the underlying causes of diseases and develop more effective treatments. Additionally, the method has practical applications in many fields.

Sridhar Sivasubbu and Vinod Scaria are scientists at the CSIR Institute of Genomics and Integrative Biology (CSIR-IGIB). The opinions expressed here are personal.

