| BMC Bioinformatics | |
| Decontaminating eukaryotic genome assemblies with machine learning | |
| Methodology Article | |
| Duncan A. Murdock [1], Janna L. Fierst [1] | |
| [1] Department of Biological Sciences, University of Alabama, Tuscaloosa, AL 35487, USA | |
| Keywords: DNA sequencing; High-throughput; Genome assembly; Contamination; Sequence filtering | |
| DOI : 10.1186/s12859-017-1941-0 | |
| Received 2017-10-04, accepted 2017-11-14, published 2017 | |
| Source: Springer | |
【 Abstract 】
Background: High-throughput sequencing has made it theoretically possible to obtain high-quality de novo assembled genome sequences, but in practice DNA extracts are often contaminated with sequences from other organisms. Few methods currently exist for rigorously decontaminating eukaryotic assemblies. Those that do exist filter sequences based on nucleotide similarity to contaminants and risk eliminating sequences from the target organism.

Results: We introduce a novel application of an established machine learning method, a decision tree, that can rigorously classify sequences. The major strength of the decision tree is that it can take any measured feature as input and does not require a priori identification of significant descriptors. We use the decision tree to classify de novo assembled sequences and compare the method to published protocols.

Conclusions: A decision tree performs better than existing methods when classifying sequences in eukaryotic de novo assemblies. It is efficient, readily implemented, and accurately identifies target and contaminant sequences. Importantly, a decision tree can be used to classify sequences according to measured descriptors and has many potential uses in distilling biological datasets.
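As an illustration of the general idea only, the following minimal sketch trains a small decision tree that separates target from contaminant contigs using measured per-contig features. It is not the authors' implementation: the two features (GC fraction and read coverage), the toy training values, and the function names `gini`, `best_split`, `build_tree`, and `classify` are all assumptions made for this example.

```python
from collections import Counter


def gini(labels):
    """Gini impurity of a list of class labels (0.0 means pure)."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())


def best_split(rows, labels):
    """Return the (feature_index, threshold) that most reduces weighted
    Gini impurity, or None if no split improves on the unsplit node."""
    best, best_score = None, gini(labels)
    n = len(rows)
    for f in range(len(rows[0])):
        values = sorted(set(r[f] for r in rows))
        # Candidate thresholds are midpoints between adjacent observed values.
        for lo, hi in zip(values, values[1:]):
            t = (lo + hi) / 2.0
            left = [y for r, y in zip(rows, labels) if r[f] <= t]
            right = [y for r, y in zip(rows, labels) if r[f] > t]
            score = (len(left) * gini(left) + len(right) * gini(right)) / n
            if score < best_score - 1e-12:
                best, best_score = (f, t), score
    return best


def build_tree(rows, labels, depth=0, max_depth=3):
    """Recursively build a tree. Internal nodes are tuples
    (feature_index, threshold, left_subtree, right_subtree);
    leaves are the majority class label (a string)."""
    split = best_split(rows, labels) if depth < max_depth else None
    if split is None:
        return Counter(labels).most_common(1)[0][0]
    f, t = split
    left = [(r, y) for r, y in zip(rows, labels) if r[f] <= t]
    right = [(r, y) for r, y in zip(rows, labels) if r[f] > t]
    return (
        f,
        t,
        build_tree([r for r, _ in left], [y for _, y in left], depth + 1, max_depth),
        build_tree([r for r, _ in right], [y for _, y in right], depth + 1, max_depth),
    )


def classify(tree, row):
    """Walk from the root to a leaf and return the leaf's class label."""
    while isinstance(tree, tuple):
        f, t, left, right = tree
        tree = left if row[f] <= t else right
    return tree


# Hypothetical training contigs: (GC fraction, mean read coverage).
train_rows = [
    (0.36, 52.0), (0.38, 47.5), (0.40, 60.1), (0.37, 44.0),  # target
    (0.64, 8.2), (0.66, 12.0), (0.61, 5.5), (0.68, 9.9),     # contaminant
]
train_labels = ["target"] * 4 + ["contaminant"] * 4

tree = build_tree(train_rows, train_labels)
print(classify(tree, (0.39, 50.0)))
print(classify(tree, (0.65, 7.0)))
```

Because the tree chooses its own split feature and threshold from whatever columns it is given, further measured descriptors can be appended to each row without deciding in advance which ones are informative.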
【 License 】
CC BY
© The Author(s) 2017