| BMC Bioinformatics | |
| Scalable metagenomics alignment research tool (SMART): a scalable, rapid, and complete search heuristic for the classification of metagenomic sequences from complex sequence populations | |
| Software | |
| Cecilia S. Lee1  Aaron Y. Lee1  Russell N. Van Gelder2  | |
| [1] Department of Ophthalmology, University of Washington School of Medicine, Box 359608, 325 Ninth Avenue, 98104, Seattle, WA, USA;Department of Ophthalmology, University of Washington School of Medicine, Box 359608, 325 Ninth Avenue, 98104, Seattle, WA, USA;Departments of Biological Structure and Pathology, University of Washington School of Medicine, Seattle, WA, USA; | |
| 关键词: Simulated Dataset; Computational Node; Human Microbiome Project; MapReduce Framework; Cloud Computing Infrastructure; | |
| DOI : 10.1186/s12859-016-1159-6 | |
| received in 2015-12-30, accepted in 2016-07-21, 发布年份 2016 | |
| 来源: Springer | |
PDF
|
|
【 摘 要 】
BackgroundNext generation sequencing technology has enabled characterization of metagenomics through massively parallel genomic DNA sequencing. The complexity and diversity of environmental samples such as the human gut microflora, combined with the sustained exponential growth in sequencing capacity, has led to the challenge of identifying microbial organisms by DNA sequence. We sought to validate a Scalable Metagenomics Alignment Research Tool (SMART), a novel searching heuristic for shotgun metagenomics sequencing results.ResultsAfter retrieving all genomic DNA sequences from the NCBI GenBank, over 1 × 1011 base pairs of 3.3 × 106 sequences from 9.25 × 105 species were indexed using 4 base pair hashtable shards. A MapReduce searching strategy was used to distribute the search workload in a computing cluster environment. In addition, a one base pair permutation algorithm was used to account for single nucleotide polymorphisms and sequencing errors. Simulated datasets used to evaluate Kraken, a similar metagenomics classification tool, were used to measure and compare precision and accuracy. Finally using a same set of training sequences we compared Kraken, CLARK, and SMART within the same computing environment. Utilizing 12 computational nodes, we completed the classification of all datasets in under 10 min each using exact matching with an average throughput of over 1.95 × 106 reads classified per minute. With permutation matching, we achieved sensitivity greater than 83 % and precision greater than 94 % with simulated datasets at the species classification level. We demonstrated the application of this technique applied to conjunctival and gut microbiome metagenomics sequencing results. In our head to head comparison, SMART and CLARK had similar accuracy gains over Kraken at the species classification level, but SMART required approximately half the amount of RAM of CLARK.ConclusionsSMART is the first scalable, efficient, and rapid metagenomics classification algorithm capable of matching against all the species and sequences present in the NCBI GenBank and allows for a single step classification of microorganisms as well as large plant, mammalian, or invertebrate genomes from which the metagenomic sample may have been derived.
【 授权许可】
CC BY
© The Author(s). 2016
【 预 览 】
| Files | Size | Format | View |
|---|---|---|---|
| RO202311107045087ZK.pdf | 1792KB |
【 参考文献 】
- [1]
- [2]
- [3]
- [4]
- [5]
- [6]
- [7]
- [8]
- [9]
- [10]
- [11]
- [12]
- [13]
- [14]
- [15]
- [16]
- [17]
- [18]
- [19]
- [20]
- [21]
- [22]
- [23]
- [24]
- [25]
- [26]
- [27]
- [28]
- [29]
- [30]
- [31]
- [32]
- [33]
- [34]
PDF