Journal Article Details
BMC Bioinformatics
cDNA-detector: detection and removal of cDNA contamination in DNA sequencing libraries
Meifang Qi [1], Esther Rheinbay [2], Leif S. Ludwig [3], Nikhil Wagle [4], Utthara Nayar [5]
[1] Center for Cancer Research, Massachusetts General Hospital, 02129, Charlestown, MA, USA;Harvard Medical School, 02115, Boston, MA, USA;Broad Institute of MIT and Harvard, 02142, Cambridge, MA, USA;Center for Cancer Research, Massachusetts General Hospital, 02129, Charlestown, MA, USA;Harvard Medical School, 02115, Boston, MA, USA;Broad Institute of MIT and Harvard, 02142, Cambridge, MA, USA;Department of Pathology, Massachusetts General Hospital, 02114, Boston, MA, USA;Harvard Medical School, 02115, Boston, MA, USA;Broad Institute of MIT and Harvard, 02142, Cambridge, MA, USA;Berlin Institute of Health at Charité – Universitätsmedizin Berlin, 10117, Berlin, Germany;Max‐Delbrück‐Center for Molecular Medicine in the Helmholtz Association (MDC), Berlin Institute for Medical Systems Biology (BIMSB), 10115, Berlin, Germany;Harvard Medical School, 02115, Boston, MA, USA;Broad Institute of MIT and Harvard, 02142, Cambridge, MA, USA;Department of Medical Oncology, Dana-Farber Cancer Institute, 02215, Boston, MA, USA;Harvard Medical School, 02115, Boston, MA, USA;Broad Institute of MIT and Harvard, 02142, Cambridge, MA, USA;Department of Medical Oncology, Dana-Farber Cancer Institute, 02215, Boston, MA, USA;Department of Biochemistry and Molecular Biology, Bloomberg School of Public Health, The Johns Hopkins University, Baltimore, MD, USA;Department of Oncology, Sidney Kimmel Comprehensive Cancer Center, Johns Hopkins University School of Medicine, Baltimore, MD, USA;
Keywords: Contamination; Genomics; Software; Quality control; cDNA
DOI: 10.1186/s12859-021-04529-2
Source: Springer
【 Abstract 】

Background: Exogenous cDNA introduced into an experimental system, either intentionally or accidentally, can appear as added read coverage over the corresponding gene in next-generation sequencing (NGS) libraries derived from that system. If not properly recognized and managed, this cross-contamination with exogenous signal can lead to incorrect interpretation of research results. Yet this problem is not routinely addressed in current sequence processing pipelines.

Results: We present cDNA-detector, a computational tool to identify and remove exogenous cDNA contamination in DNA sequencing experiments. We demonstrate that cDNA-detector can identify cDNAs quickly and accurately from alignment files. A source inference step attempts to separate endogenous cDNAs (retrocopied genes) from potential cloned, exogenous cDNAs. cDNA-detector also provides a mechanism to decontaminate the alignment by removing the detected cDNAs. Simulation studies show that cDNA-detector is highly sensitive and specific, outperforming existing tools. We apply cDNA-detector to several highly cited public databases (TCGA, ENCODE, NCBI SRA) and show that contaminant genes appear in sequencing experiments, where they lead to incorrect coverage peak calls.

Conclusions: cDNA-detector is a user-friendly and accurate tool to detect and remove cDNA contamination in NGS libraries. Its two-step design, detection followed by a separate removal step, reduces the risk of removing true variants because it allows for manual review of candidates. We find that contamination with intentionally and accidentally introduced cDNAs is an underappreciated problem even in widely used consortium datasets, where it can lead to spurious results. Our findings highlight the importance of sensitive detection and removal of contaminant cDNA from NGS libraries before downstream analysis.
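
The abstract describes contaminant cDNA as added read coverage over a gene. Because a cDNA carries exons but no introns, one rough way to picture that signal is an excess of exon coverage relative to intron coverage at the same locus. The sketch below illustrates only that intuition. It is not cDNA-detector's algorithm, and the function names, the pseudocount, and the `min_ratio` threshold are all hypothetical choices made for illustration.

```python
from statistics import mean


def exon_intron_ratio(coverage, exons, pseudocount=1.0):
    """Ratio of mean exon depth to mean intron depth for one gene locus.

    coverage: per-base read depths across the locus (index 0 = locus start).
    exons: (start, end) half-open intervals, relative to the locus start.
    pseudocount: hypothetical smoothing term that avoids division by zero.
    """
    exon_positions = set()
    for start, end in exons:
        exon_positions.update(range(start, end))

    exon_depth = [d for i, d in enumerate(coverage) if i in exon_positions]
    intron_depth = [d for i, d in enumerate(coverage) if i not in exon_positions]
    if not exon_depth or not intron_depth:
        raise ValueError("locus needs at least one exonic and one intronic base")

    return (mean(exon_depth) + pseudocount) / (mean(intron_depth) + pseudocount)


def looks_like_cdna(coverage, exons, min_ratio=5.0):
    """Flag a locus whose exons are much deeper than its introns.

    min_ratio is an arbitrary illustrative threshold, not a published value.
    """
    return exon_intron_ratio(coverage, exons) >= min_ratio


# Toy locus: two 10 bp exons separated by a 10 bp intron.
exons = [(0, 10), (20, 30)]
contaminated = [30] * 10 + [2] * 10 + [30] * 10  # deep exons, shallow intron
clean = [30] * 30                                # uniform genomic coverage

print(looks_like_cdna(contaminated, exons))
print(looks_like_cdna(clean, exons))
```

A coverage ratio alone cannot tell an exogenous clone from an endogenous retrocopied gene, which is the distinction the source inference step described in the abstract is meant to address.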

【 License 】

CC BY   
