Next-generation sequencing (NGS) methods are used extensively to profile mutations present in diseased human tissues. These genomic approaches hold great promise for personalized medicine, but sequencing accuracy is essential for correct patient diagnosis and treatment planning. A common source of DNA for genomic profiling is formalin-fixed, paraffin-embedded (FFPE) tissue obtained from patient biopsies. FFPE DNA poses important challenges for NGS library preparation, including low input amounts and poor DNA quality resulting from extensive fixation- and storage-induced DNA damage. This damage also gives rise to sequencing artifacts that raise the background level of mutations, making it difficult to distinguish true, low-frequency, disease-causing variants from noise. We previously showed that a major fraction of somatic mutations described in publicly available datasets are due to such sequencing artifacts (Chen et al., Science 2017). Furthermore, we showed that enzymatic repair of DNA before library preparation improves library quality and reduces background noise.
We developed a second-generation DNA repair enzyme mix (V2) that efficiently repairs the most prevalent damage types found in FFPE DNA and further improves the quality and yield of NGS libraries. We then tested the efficacy of the V2 repair mix in improving sequencing accuracy for FFPE DNA samples obtained from different cancer tissues. We performed target enrichment, deep sequencing, and variant analysis on these samples. For a subset of variants, we further validated our results using a droplet digital PCR (ddPCR) assay. Both methods showed that the V2 repair mix did not alter the overall frequency of the variants identified, indicating that it did not introduce bias, while significantly improving sequencing accuracy by reducing the number of false variant calls. Enzymatic repair is therefore a critical first step in preparing FFPE DNA sequencing libraries: it improves library quality and allows more sensitive and robust detection of low-frequency disease variants.