Journal article details
BMC Genomics
dnAQET: a framework to compute a consolidated metric for benchmarking quality of de novo assemblies
[1] 0000 0001 2243 3366, grid.417587.8, Bioinformatics Branch, Division of Bioinformatics and Biostatistics, National Center for Toxicological Research, U.S. Food and Drug Administration, 72079, Jefferson, AR, USA; 0000 0001 2243 3366, grid.417587.8, Present Address: Molecular Pathology Cytology Branch, Division of Molecular Genetics and Pathology, Office of In Vitro Diagnostics and Radiological Health, the Center for Devices and Radiological Health, U.S. Food and Drug Administration, 20993, Silver Spring, MD, USA
Keywords: de novo genome assembly; Assembly quality assessment; Next generation sequencing; Misassembly
DOI: 10.1186/s12864-019-6070-x
Source: publisher
【 Abstract 】

Background: Accurate de novo genome assembly has become a reality with advances in sequencing technology. With the ever-increasing number of de novo genome assembly tools, assessing the quality of assemblies has become of great importance in genome research. Although many quality metrics have been proposed and software tools for calculating those metrics have been developed, the existing tools do not produce a unified measure that reflects the overall quality of an assembly.

Results: To address this issue, we developed the de novo Assembly Quality Evaluation Tool (dnAQET), which generates a unified metric for benchmarking the quality assessment of assemblies. Our framework first calculates individual quality scores for the scaffolds/contigs of an assembly by aligning them to a reference genome. Next, it computes a quality score for the assembly using its overall reference genome coverage, the quality score distribution of its scaffolds and the redundancy identified in it. Using synthetic assemblies randomly generated from the latest human genome build, various builds of the reference genomes for five organisms and six de novo assemblies for sample NA24385, we tested dnAQET to assess its capability for benchmarking quality evaluation of genome assemblies. For synthetic data, our quality score increased with decreasing number of misassemblies and redundancy and with increasing average contig length and coverage, as expected. For genome builds, the dnAQET quality score calculated for a more recent reference genome was better than the score for an older version. To compare with some of the most frequently used measures, 13 other quality measures were calculated. The quality score from dnAQET was found to be better than all other measures in terms of consistency with the known quality of the reference genomes, indicating that dnAQET is reliable for benchmarking quality assessment of de novo genome assemblies.

Conclusions: dnAQET is a scalable framework designed to evaluate a de novo genome assembly based on the aggregated quality of its scaffolds (or contigs). Our results demonstrated that the dnAQET quality score is reliable for benchmarking quality assessment of genome assemblies. dnAQET can help researchers identify the most suitable assembly tools and select the high-quality assemblies they generate.

【 License 】

CC BY   
