Frontiers in Genetics
UMIc: A Preprocessing Method for UMI Deduplication and Reads Correction
Ilias Kappas [1], Nikolaos Pechlivanis [1], Tobias Hutzenlaub [2], Fotis Psomopoulos [3], Anastasia Chatzidimitriou [3], Maria Christina Maniou [3], Anastasis Togkousidis [3], Maria Tsagiopoulou [3], Michaela Kotrová [5]
[1] Department of Genetics, Development and Molecular Biology, School of Biology, Aristotle University of Thessaloniki, Thessaloniki, Greece
[2] Hahn-Schickard, Freiburg, Germany
[3] Institute of Applied Biosciences, Centre for Research and Technology Hellas, Thessaloniki, Greece
[4] Laboratory for MEMS Applications, IMTEK-Department of Microsystems Engineering, University of Freiburg, Freiburg, Germany
[5] Unit for Hematological Diagnostics, Department of Internal Medicine II, University Medical Center Schleswig-Holstein, Kiel, Germany
Keywords: unique molecular identifiers; molecular barcodes; error correction; next-generation sequencing; bioinformatics
DOI: 10.3389/fgene.2021.660366
Source: DOAJ
Abstract

A recent refinement in high-throughput sequencing is the incorporation of unique molecular identifiers (UMIs), random oligonucleotide barcodes, during library preparation. A UMI gives each DNA/RNA input molecule a unique identity that is carried through polymerase chain reaction (PCR) amplification, thus reducing the bias introduced by this step. Here, we propose UMIc, an alignment-free framework that serves as a preprocessing step for FASTQ files, deduplicating and correcting reads by building a consensus sequence for each UMI. Our approach takes into account the frequency and Phred quality of the nucleotides, as well as the distances between UMIs and between the actual read sequences. We tested the tool on different scenarios of UMI-tagged library data, with broad applicability in mind. UMIc is an open-source tool implemented in R and is freely available from https://github.com/BiodataAnalysisGroup/UMIc.
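The core idea of grouping reads by UMI and collapsing each group into one quality-aware consensus read can be illustrated with a minimal Python sketch. This is not UMIc's implementation (UMIc is written in R), and the function names are hypothetical. The following are all assumptions made for illustration: the UMI occupies the first `umi_len` bases of each read, qualities use Phred+33 encoding, a rarer UMI within one mismatch of a more abundant one is treated as a sequencing error, and each position is decided by summing Phred scores per base, so that both nucleotide frequency and quality contribute. The sketch does not model the distance checks between read sequences that the abstract also mentions.

```python
from collections import defaultdict


def phred_scores(qual, offset=33):
    """Decode a FASTQ quality string (Phred+33 assumed) into integer scores."""
    return [ord(c) - offset for c in qual]


def hamming(a, b):
    """Count mismatching positions between two equal-length strings."""
    return sum(x != y for x, y in zip(a, b))


def group_by_umi(reads, umi_len, max_dist=1):
    """Group (seq, qual) reads by their leading UMI (hypothetical layout),
    then merge each rarer UMI into a more abundant one within max_dist."""
    groups = defaultdict(list)
    for seq, qual in reads:
        # Strip the UMI so only the biological sequence enters the consensus.
        groups[seq[:umi_len]].append((seq[umi_len:], qual[umi_len:]))
    merged = {}
    # Most abundant UMIs first, so rare (likely erroneous) UMIs are absorbed.
    for umi in sorted(groups, key=lambda u: (-len(groups[u]), u)):
        target = next((m for m in merged if hamming(m, umi) <= max_dist), None)
        if target is None:
            merged[umi] = list(groups[umi])
        else:
            merged[target].extend(groups[umi])
    return merged


def consensus(members):
    """Quality-weighted vote: at each position keep the base with the highest
    summed Phred score; report the best single score seen for that base.
    Reads of unequal length are truncated to the shortest one."""
    decoded = [(seq, phred_scores(qual)) for seq, qual in members]
    length = min(len(seq) for seq, _ in decoded)
    seq_out, qual_out = [], []
    for i in range(length):
        weight = defaultdict(int)
        best_q = defaultdict(int)
        for seq, scores in decoded:
            weight[seq[i]] += scores[i]
            best_q[seq[i]] = max(best_q[seq[i]], scores[i])
        # sorted() makes ties resolve deterministically (alphabetically first).
        base = max(sorted(weight), key=lambda b: weight[b])
        seq_out.append(base)
        qual_out.append(chr(best_q[base] + 33))
    return "".join(seq_out), "".join(qual_out)


if __name__ == "__main__":
    reads = [
        ("AAAAACGTACGT", "IIIIIIIIIIII"),
        ("AAAAACGTACGA", "IIIIIIIIIII#"),  # low-quality error in last base
        ("AAAAACGTACGT", "IIIIIIIIIIII"),
        ("AAAATCGTACGT", "IIIIIIIIIIII"),  # UMI one mismatch from AAAAA
        ("GGGGGTTTTTTT", "IIIIIIIIIIII"),
    ]
    for umi, members in group_by_umi(reads, umi_len=5).items():
        print(umi, len(members), *consensus(members))
    # AAAAA 4 CGTACGT IIIIIII
    # GGGGG 1 TTTTTTT IIIIIII
```

Summing Phred scores is one simple way to let several confident reads outvote a single low-quality base; a real tool may weight frequency and quality differently or use other distance thresholds.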

License

Unknown
