BMC Bioinformatics
CoMet: a workflow using contig coverage and composition for binning a metagenomic sample with high precision
Research
Sen-Lin Tang1  Kshitij Tandon2  David Ackland3  Damayanthi Herath4  Saman Kumara Halgamuge5
[1] Biodiversity Research Center, Academia Sinica, Nan-Kang, 11529, Taipei, Taiwan;Biodiversity Research Center, Academia Sinica, Nan-Kang, 11529, Taipei, Taiwan;Institute of Bioinformatics and Structural Biology, National Tsing Hua University, 300, Hsinchu, Taiwan;Bioinformatics Program, Institute of Information Science, Taiwan International Graduate Program, Academia Sinica, 115, Taipei, Taiwan;Department of Biomedical Engineering, The University of Melbourne, 3010, Victoria, Australia;Department of Mechanical Engineering, The University of Melbourne, Parkville, 3010, Melbourne, Australia;Department of Computer Engineering, University of Peradeniya, Prof. E. O. E. Pereira Mawatha, 20400, Peradeniya, Sri Lanka;Research School of Engineering, College of Engineering and Computer Science, The Australian National University, 2601, Canberra ACT, Australia;
Keywords: Metagenomics; Binning; Contig coverage; Contig composition; DBSCAN algorithm;
DOI : 10.1186/s12859-017-1967-3
Source: Springer
【 Abstract 】
Background: In metagenomics, the separation of nucleotide sequences belonging to an individual population or to closely matched populations is termed binning. Binning helps in evaluating the underlying microbial population structure and in recovering individual genomes from a sample of uncultivable microbial organisms. Both supervised and unsupervised learning methods have been employed in binning; however, characterizing a metagenomic sample containing multiple strains remains a significant challenge. In this study, we designed and implemented a new workflow, Coverage and composition based binning of Metagenomes (CoMet), for binning contigs in a single metagenomic sample. CoMet utilizes the coverage values and the compositional features of metagenomic contigs. The binning strategy in CoMet consists of an initial grouping of contigs in guanine-cytosine (GC) content-coverage space, followed by refinement of the bins in tetranucleotide frequency space, in a purely unsupervised manner. In both steps, CoMet employs the clustering algorithm DBSCAN. The performance of CoMet was compared against four existing approaches for binning a single metagenomic sample, namely MaxBin, Metawatt, MyCC (default) and MyCC (coverage), on multiple datasets, including a sample composed of multiple strains.

Results: Binning methods based on both the compositional features and the coverage of contigs performed better than the method based only on compositional features. CoMet yielded higher or comparable precision relative to the existing binning methods on benchmark datasets of varying complexity. MyCC (coverage) had the highest ranking score for F1-score. However, CoMet outperformed MyCC (coverage) on the dataset containing multiple strains. Furthermore, CoMet recovered contigs of more species and achieved 18-39% higher precision than the compared methods in discriminating species in the sample of multiple strains. CoMet also resulted in higher precision than MyCC (default) and MyCC (coverage) on a real metagenome.

Conclusions: The approach proposed with CoMet for binning contigs improves the precision of binning while characterizing more species, both in a single metagenomic sample and in a sample containing multiple strains. The F1-scores obtained from different binning strategies vary across datasets; however, CoMet yields the highest F1-score on a sample composed of multiple strains.
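The two-stage strategy described in the abstract (grouping in GC content-coverage space, then refinement in tetranucleotide frequency space, each with DBSCAN) can be illustrated with a minimal pure-Python sketch. This is not the CoMet implementation: the function names, the log10 scaling of coverage, the Euclidean distance, and the `eps_gc_cov`, `eps_tetra` and `min_pts` values are all assumptions chosen for illustration only.

```python
import math
from collections import Counter
from itertools import product

TETRAMERS = ["".join(p) for p in product("ACGT", repeat=4)]

def gc_content(seq):
    """Fraction of G and C bases in a contig."""
    seq = seq.upper()
    return sum(1 for b in seq if b in "GC") / len(seq) if seq else 0.0

def tetranucleotide_freqs(seq):
    """Normalized counts of the 256 tetranucleotides (overlapping windows)."""
    seq = seq.upper()
    counts = Counter(seq[i:i + 4] for i in range(len(seq) - 3))
    total = sum(counts[k] for k in TETRAMERS)
    return [counts[k] / total if total else 0.0 for k in TETRAMERS]

def dbscan(points, eps, min_pts):
    """Textbook DBSCAN with Euclidean distance; label -1 marks noise."""
    labels = [None] * len(points)

    def neighbors(i):
        return [j for j, q in enumerate(points) if math.dist(points[i], q) <= eps]

    cluster = -1
    for i in range(len(points)):
        if labels[i] is not None:
            continue
        nbrs = neighbors(i)
        if len(nbrs) < min_pts:
            labels[i] = -1              # provisional noise
            continue
        cluster += 1
        labels[i] = cluster
        queue = [j for j in nbrs if j != i]
        while queue:
            j = queue.pop()
            if labels[j] == -1:
                labels[j] = cluster     # noise becomes a border point
            if labels[j] is not None:
                continue
            labels[j] = cluster
            j_nbrs = neighbors(j)
            if len(j_nbrs) >= min_pts:  # core point: keep expanding
                queue.extend(j_nbrs)
    return labels

def bin_contigs(contigs, coverages, eps_gc_cov, eps_tetra, min_pts):
    """Stage 1: DBSCAN on (GC, log10 coverage). Stage 2: DBSCAN per bin on
    tetranucleotide frequencies. The log10 scaling is an assumption."""
    stage1_points = [(gc_content(s), math.log10(c))
                     for s, c in zip(contigs, coverages)]
    stage1 = dbscan(stage1_points, eps_gc_cov, min_pts)

    final = [-1] * len(contigs)
    next_id = 0
    for b in sorted(set(stage1) - {-1}):
        idx = [i for i, lab in enumerate(stage1) if lab == b]
        sub = dbscan([tetranucleotide_freqs(contigs[i]) for i in idx],
                     eps_tetra, min_pts)
        for k in sorted(set(sub) - {-1}):
            for i, lab in zip(idx, sub):
                if lab == k:
                    final[i] = next_id
            next_id += 1
    return final

# Toy usage: four contigs with equal GC content and coverage but two distinct
# tetranucleotide profiles; stage 2 separates what stage 1 merges.
toy = ["ACGT" * 20, "ACGT" * 20, "AGCT" * 20, "AGCT" * 20]
print(bin_contigs(toy, [50, 50, 50, 50],
                  eps_gc_cov=0.2, eps_tetra=0.1, min_pts=2))  # prints [0, 0, 1, 1]
```

In the toy input, all four contigs fall into one bin in GC content-coverage space, and only the tetranucleotide step distinguishes them. Real data would need parameter choices and feature scaling that this sketch does not attempt to reproduce.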
【 License 】
CC BY
© The Author(s) 2017