PeerJ
BinSanity: unsupervised clustering of environmental microbial assemblies using coverage and affinity propagation
article
Elaina D. Graham [1], John F. Heidelberg [1], Benjamin J. Tully [1]
[1] Department of Biological Sciences, University of Southern California; Center for Dark Energy Biosphere Investigations
Keywords: Affinity propagation; Metagenomics; Microbial ecology; Metagenome-assembled genomes; Clustering; Binning
DOI: 10.7717/peerj.3035
Subject classification: Social Sciences, Humanities and Arts (General)
Source: Inra
【 Abstract 】

Metagenomics has become an integral part of defining microbial diversity in various environments. Many ecosystems have characteristically low biomass and few cultured representatives. Linking potential metabolisms to phylogeny in environmental microorganisms is important for interpreting microbial community functions and the impacts these communities have on geochemical cycles. However, metagenomic studies face the computational hurdle of ‘binning’ contigs into phylogenetically related units or putative genomes. Binning methods have been implemented with varying approaches such as k-means clustering, Gaussian mixture models, hierarchical clustering, neural networks, and two-way clustering; however, many of these suffer from biases against low coverage/abundance organisms and closely related taxa/strains. We introduce a new binning method, BinSanity, that uses the clustering algorithm affinity propagation (AP) to cluster assemblies using coverage, with composition-based refinement (tetranucleotide frequency and percent GC content) to optimize bins containing multiple source organisms. This separation of composition-based and coverage-based clustering reduces bias for closely related taxa. BinSanity was developed and tested on artificial metagenomes varying in size and complexity. Results indicate that BinSanity has higher precision, recall, and Adjusted Rand Index than five commonly implemented methods. When tested on a previously published environmental metagenome, BinSanity generated high-completion, low-redundancy bins corresponding to the published metagenome-assembled genomes.
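
The two composition features named in the abstract, percent GC content and tetranucleotide frequency, can be illustrated with a short sketch. This is not BinSanity's code: the function names, the handling of ambiguous bases, and the example contig are assumptions made for illustration, and the published tool may compute these features differently (for example, by merging reverse-complement tetramers).

```python
from collections import Counter
from itertools import product


def gc_percent(seq: str) -> float:
    """Percent of G and C among the unambiguous A/C/G/T bases of a contig."""
    counts = Counter(seq.upper())
    acgt = sum(counts[base] for base in "ACGT")
    if acgt == 0:
        return 0.0  # assumption: contigs with no unambiguous bases score 0
    return 100.0 * (counts["G"] + counts["C"]) / acgt


def tetranucleotide_frequencies(seq: str) -> dict:
    """Relative frequency of each of the 256 possible 4-mers.

    Windows overlap (step of one base). Windows containing anything other
    than A, C, G, or T are skipped; that is an assumption of this sketch.
    """
    seq = seq.upper()
    kmers = ["".join(p) for p in product("ACGT", repeat=4)]
    counts = dict.fromkeys(kmers, 0)
    total = 0
    for i in range(len(seq) - 3):
        kmer = seq[i:i + 4]
        if kmer in counts:
            counts[kmer] += 1
            total += 1
    if total == 0:
        return {kmer: 0.0 for kmer in kmers}
    return {kmer: count / total for kmer, count in counts.items()}


# Hypothetical 12-base contig, used only to exercise the functions.
contig = "ACGTACGTGGCC"
gc = gc_percent(contig)
freqs = tetranucleotide_frequencies(contig)
print(f"GC content: {gc:.1f}%")
print(f"Distinct tetramers tracked: {len(freqs)}")
print(f"Frequency of ACGT: {freqs['ACGT']:.3f}")
```

A per-contig feature vector built this way (256 tetramer frequencies plus GC percent) is the kind of input a composition-based refinement step could cluster after an initial coverage-based pass.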

【 License 】

CC BY   
