| BMC Genomics | |
| HiMMe: using genetic patterns as a proxy for genome assembly reliability assessment | |
| Methodology Article | |
| Noushin Ghaffari¹, Charles D. Johnson¹, Aniruddha Datta², Jordi Abante³ | |
| [1] Center for Bioinformatics and Genomic Systems Engineering (CBGSE), 101 Gateway Blvd., College Station, TX, USA; AgriLife Genomics and Bioinformatics, Texas A&M AgriLife Research, 101 Gateway, Suite A, College Station, TX, USA; Center for Bioinformatics and Genomic Systems Engineering (CBGSE), 101 Gateway Blvd., College Station, TX, USA; Dwight Look College of Engineering, Texas A&M University, 400 Bizzell St, College Station, TX, USA; Whitaker Biomedical Engineering Institute, Johns Hopkins University, 3400 N Charles St, Baltimore, MD, USA | |
| Keywords: Genome assemblies; de novo; Sequence analysis; Hidden Markov models; Markov chains; Stochastic processes; Supervised learning | |
| DOI : 10.1186/s12864-017-3965-2 | |
| Received 2017-01-30, accepted 2017-07-27, published 2017 | |
| Source: Springer | |
【 Abstract 】
Background: The information content of genomes plays a crucial role in the existence and proper development of living organisms. Thus, tremendous effort has been dedicated to developing DNA sequencing technologies that provide a better understanding of the underlying mechanisms of cellular processes. Advances in the development of sequencing technology have made it possible to sequence genomes in a relatively fast and inexpensive way. However, as with any measurement technology, there is noise involved and this needs to be addressed to reach conclusions based on the resulting data. In addition, there are multiple intermediate steps and degrees of freedom when constructing genome assemblies that lead to ambiguous and inconsistent results among assemblers.

Methods: Here we introduce HiMMe, an HMM-based tool that relies on genetic patterns to score genome assemblies. Through a Markov chain, the model is able to detect characteristic genetic patterns, while, by introducing emission probabilities, the noise involved in the process is taken into account. Prior knowledge can be used by training the model to fit a given organism or sequencing technology.

Results: Our results show that the method presented is able to recognize patterns even with relatively small k-mer size choices and limited computational resources.

Conclusions: Our methodology provides an individual quality metric per contig in addition to an overall genome assembly score, with a time complexity well below that of an aligner. Ultimately, HiMMe provides meaningful statistical insights that can be leveraged by researchers to better select contigs and genome assemblies for downstream analysis.
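
The general idea described in the abstract, a Markov chain over hidden genetic patterns combined with emission probabilities that absorb sequencing noise, can be illustrated with a small sketch. This is not HiMMe's implementation: it assumes single nucleotides as hidden states (rather than k-mers), a uniform substitution-error emission model, and a uniform initial distribution, and the names `train_transitions`, `emission`, and `score_contig` are hypothetical. It computes a per-base log-likelihood with the scaled forward algorithm, which is one plausible way to turn such a model into a per-contig score.

```python
import math

ALPHABET = "ACGT"


def train_transitions(sequences, pseudocount=1.0):
    """Estimate first-order transition probabilities from training sequences.

    Assumption: hidden states are single bases; a real tool may use k-mers.
    """
    counts = {a: {b: pseudocount for b in ALPHABET} for a in ALPHABET}
    for seq in sequences:
        for prev, nxt in zip(seq, seq[1:]):
            if prev in counts and nxt in counts[prev]:
                counts[prev][nxt] += 1.0
    trans = {}
    for a in ALPHABET:
        total = sum(counts[a].values())
        trans[a] = {b: counts[a][b] / total for b in ALPHABET}
    return trans


def emission(hidden, observed, error_rate):
    """P(observed | hidden) under an assumed uniform substitution-error model."""
    return 1.0 - error_rate if hidden == observed else error_rate / 3.0


def score_contig(contig, trans, error_rate=0.01):
    """Per-base log-likelihood of a contig via the scaled forward algorithm."""
    if not contig:
        raise ValueError("contig must be non-empty")

    # Initialisation: uniform prior over the hidden base at position 0.
    alpha = {s: 0.25 * emission(s, contig[0], error_rate) for s in ALPHABET}
    norm = sum(alpha.values())
    log_lik = math.log(norm)
    alpha = {s: v / norm for s, v in alpha.items()}

    # Recursion: propagate through the Markov chain, then weight by emissions.
    for obs in contig[1:]:
        new_alpha = {}
        for s in ALPHABET:
            prior = sum(alpha[p] * trans[p][s] for p in ALPHABET)
            new_alpha[s] = emission(s, obs, error_rate) * prior
        norm = sum(new_alpha.values())
        log_lik += math.log(norm)  # accumulate log of each scaling factor
        alpha = {s: v / norm for s, v in new_alpha.items()}

    # Normalise by length so contigs of different sizes are comparable.
    return log_lik / len(contig)


if __name__ == "__main__":
    # Toy training data with a strong alternating A/C pattern.
    model = train_transitions(["AC" * 50])
    for contig in ("ACACACAC", "GTGTGTGT"):
        print(contig, score_contig(contig, model))
```

In this toy setting, a contig that follows the trained pattern receives a higher (less negative) score than one that does not. Dividing by contig length is a design choice made here for comparability, not something stated in the abstract.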
【 License 】
CC BY
© The Author(s) 2017