Journal Article Details
BMC Genomics
HiMMe: using genetic patterns as a proxy for genome assembly reliability assessment
Methodology Article
Noushin Ghaffari [1], Charles D. Johnson [1], Aniruddha Datta [2], Jordi Abante [3]
[1] Center for Bioinformatics and Genomic Systems Engineering (CBGSE), 101 Gateway Blvd., College Station, TX, USA;AgriLife Genomics and Bioinformatics, Texas A&M AgriLife Research, 101 Gateway, Suite A, College Station, TX, USA;Center for Bioinformatics and Genomic Systems Engineering (CBGSE), 101 Gateway Blvd., College Station, TX, USA;Dwight Look College of Engineering, Texas A&M University, 400 Bizzell St, College Station, TX, USA;Whitaker Biomedical Engineering Institute, Johns Hopkins University, 3400 N Charles St, Baltimore, MD, USA;
Keywords: Genome assemblies; de novo; Sequence analysis; Hidden Markov models; Markov chains; Stochastic processes; Supervised learning
DOI: 10.1186/s12864-017-3965-2
Received 2017-01-30, accepted 2017-07-27, published 2017
Source: Springer
【 Abstract 】

Background: The information content of genomes plays a crucial role in the existence and proper development of living organisms. Thus, tremendous effort has been dedicated to developing DNA sequencing technologies that provide a better understanding of the underlying mechanisms of cellular processes. Advances in the development of sequencing technology have made it possible to sequence genomes in a relatively fast and inexpensive way. However, as with any measurement technology, there is noise involved and this needs to be addressed to reach conclusions based on the resulting data. In addition, there are multiple intermediate steps and degrees of freedom when constructing genome assemblies that lead to ambiguous and inconsistent results among assemblers.

Methods: Here we introduce HiMMe, an HMM-based tool that relies on genetic patterns to score genome assemblies. Through a Markov chain, the model is able to detect characteristic genetic patterns, while, by introducing emission probabilities, the noise involved in the process is taken into account. Prior knowledge can be used by training the model to fit a given organism or sequencing technology.

Results: Our results show that the method presented is able to recognize patterns even with relatively small k-mer size choices and limited computational resources.

Conclusions: Our methodology provides an individual quality metric per contig in addition to an overall genome assembly score, with a time complexity well below that of an aligner. Ultimately, HiMMe provides meaningful statistical insights that can be leveraged by researchers to better select contigs and genome assemblies for downstream analysis.
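
The idea summarized in the abstract (a Markov chain over k-mers that captures genetic patterns, plus emission probabilities that absorb sequencing noise) can be illustrated with a small sketch. This is not HiMMe's implementation: the function names `train_transitions` and `score_contig`, the uniform prior over initial k-mers, the uniform substitution-error emission model, the pseudocount, and the toy sequences are all assumptions made here for illustration. The sketch trains transition probabilities on a trusted sequence and scores a contig by its per-base log-likelihood, computed with the scaled forward algorithm.

```python
import itertools
import math
from collections import defaultdict

BASES = "ACGT"  # assumption: sequences contain only these four symbols


def train_transitions(reference, k, pseudocount=1.0):
    """Estimate P(next base | previous k bases) from a trusted sequence."""
    counts = defaultdict(lambda: dict.fromkeys(BASES, pseudocount))
    for i in range(len(reference) - k):
        counts[reference[i:i + k]][reference[i + k]] += 1
    trans = {}
    for kmer in map("".join, itertools.product(BASES, repeat=k)):
        row = counts[kmer]  # unseen k-mers fall back to a uniform row
        total = sum(row.values())
        trans[kmer] = {b: c / total for b, c in row.items()}
    return trans


def emission(observed, true, error_rate):
    """Hypothetical noise model: uniform substitution errors."""
    return 1.0 - error_rate if observed == true else error_rate / 3.0


def score_contig(contig, trans, k, error_rate=0.01):
    """Per-base log-likelihood of a contig under the HMM (scaled forward pass).

    Hidden state: the last k true bases. Observation: the base that was read.
    """
    if len(contig) < k:
        raise ValueError("contig is shorter than k")
    states = list(trans)
    # Initialization: uniform prior over k-mers times emissions of the first k bases.
    alpha = {}
    for s in states:
        p = 1.0 / len(states)
        for obs, true in zip(contig[:k], s):
            p *= emission(obs, true, error_rate)
        alpha[s] = p
    norm = sum(alpha.values())
    log_lik = math.log(norm)
    alpha = {s: a / norm for s, a in alpha.items()}
    # Recursion: slide the k-mer window one base at a time.
    for obs in contig[k:]:
        new_alpha = dict.fromkeys(states, 0.0)
        for s, a in alpha.items():
            for b in BASES:
                new_alpha[s[1:] + b] += a * trans[s][b] * emission(obs, b, error_rate)
        norm = sum(new_alpha.values())
        log_lik += math.log(norm)  # rescaling keeps long contigs from underflowing
        alpha = {s: v / norm for s, v in new_alpha.items()}
    return log_lik / len(contig)


# Toy usage: a contig that follows the training pattern should score higher
# than one that does not.
trans = train_transitions("ACGT" * 50, k=3)
good = score_contig("ACGTACGTACGT", trans, k=3)
bad = score_contig("AAAATTTTGGGG", trans, k=3)
print(good > bad)
```

Under these assumptions, an overall assembly score could be formed by aggregating the per-contig values, for example as a length-weighted mean; how HiMMe itself aggregates is not stated in the abstract.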

【 License 】

CC BY   
© The Author(s) 2017
