BMC Bioinformatics
Clustering metagenomic sequences with interpolated Markov models
Methodology Article
Steven L Salzberg [1], David R Kelley [1]
[1] Center for Bioinformatics and Computational Biology, Institute for Advanced Computer Studies, College Park, MD 20742, USA; Department of Computer Science, University of Maryland, A.V. Williams Building, College Park, MD 20742, USA
Keywords: Markov Chain Model; Unsupervised Cluster; Rand Index; Cluster Accuracy; Genome Signature
DOI: 10.1186/1471-2105-11-544
Received 2010-05-13; accepted 2010-11-02; published 2010
Source: Springer
【 Abstract 】

Background: Sequencing of environmental DNA (often called metagenomics) has shown tremendous potential to uncover the vast number of unknown microbes that cannot be cultured and sequenced by traditional methods. Because the output from metagenomic sequencing is a large set of reads of unknown origin, grouping together reads that were sequenced from the same species is a crucial analysis step. Many effective approaches to this task rely on sequenced genomes in public databases, but these genomes are a highly biased sample that is not necessarily representative of the environments of interest to many metagenomics projects.

Results: We present SCIMM (Sequence Clustering with Interpolated Markov Models), an unsupervised sequence clustering method. SCIMM achieves greater clustering accuracy than previous unsupervised approaches. We examine the limitations of unsupervised learning on complex datasets and suggest a hybrid of SCIMM and the supervised learning method Phymm, called PHY SCIMM, that performs better when evolutionarily close training genomes are available.

Conclusions: SCIMM and PHY SCIMM are highly accurate methods for clustering metagenomic sequences. SCIMM operates entirely unsupervised, making it ideal for environments containing mostly novel microbes. PHY SCIMM uses supervised learning to improve clustering in environments containing microbial strains from well-characterized genera. SCIMM and PHY SCIMM are available as open source from http://www.cbcb.umd.edu/software/scimm.
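The abstract describes SCIMM only at a high level. As a rough illustration of what clustering reads with interpolated Markov models can look like, the Python sketch below trains one model per cluster, reassigns each read to the cluster whose model gives it the highest log-likelihood, and repeats until the assignment stops changing. Everything in it is an assumption for illustration, not SCIMM's code: the class `SimpleIMM`, the functions `gc_fraction` and `cluster_reads`, the parameters `max_order`, `pseudo` and `max_iter`, the count-based interpolation weight n / (n + pseudo), and the GC-content initialization are all hypothetical simplifications.

```python
import math
from collections import defaultdict

ALPHABET = "ACGT"


class SimpleIMM:
    """Hypothetical, simplified interpolated Markov model (not Glimmer's IMM).

    Predictions from context orders 0..max_order are blended recursively. An
    order-k context seen n times gets weight n / (n + pseudo); this count-based
    weighting is an assumption made for brevity.
    """

    def __init__(self, max_order=3, pseudo=8.0):
        self.max_order = max_order
        self.pseudo = pseudo
        # counts[k][context][base]: how often `base` followed a length-k `context`
        self.counts = [defaultdict(lambda: defaultdict(int))
                       for _ in range(max_order + 1)]

    def train(self, seqs):
        for s in seqs:
            for i, base in enumerate(s):
                if base not in ALPHABET:
                    continue
                for k in range(min(i, self.max_order) + 1):
                    self.counts[k][s[i - k:i]][base] += 1

    def prob(self, ctx, base):
        # Order 0: add-one smoothed base frequencies (uniform if untrained).
        c0 = self.counts[0].get("", {})
        p = (c0.get(base, 0) + 1) / (sum(c0.values()) + len(ALPHABET))
        # Higher orders: mix the maximum-likelihood estimate for the longer
        # context with the estimate already built from shorter contexts.
        for k in range(1, min(len(ctx), self.max_order) + 1):
            table = self.counts[k].get(ctx[-k:])
            if not table:
                break  # longer contexts ending in this one were never seen either
            n = sum(table.values())
            lam = n / (n + self.pseudo)
            p = lam * table.get(base, 0) / n + (1 - lam) * p
        return p

    def log_likelihood(self, seq):
        total = 0.0
        for i, base in enumerate(seq):
            if base in ALPHABET:
                ctx = seq[max(0, i - self.max_order):i]
                total += math.log(self.prob(ctx, base))
        return total


def gc_fraction(seq):
    return sum(b in "GC" for b in seq) / max(len(seq), 1)


def cluster_reads(reads, k, max_order=3, max_iter=20):
    """Train one IMM per cluster, reassign reads by maximum log-likelihood,
    and repeat until the assignments stop changing (or max_iter is reached)."""
    # Crude deterministic start: rank reads by GC content and cut into k groups
    # (a hypothetical stand-in for a proper composition-based initializer).
    order = sorted(range(len(reads)), key=lambda i: gc_fraction(reads[i]))
    labels = [0] * len(reads)
    for rank, i in enumerate(order):
        labels[i] = rank * k // len(reads)

    for _ in range(max_iter):
        models = []
        for c in range(k):
            model = SimpleIMM(max_order)
            model.train([r for r, lab in zip(reads, labels) if lab == c])
            models.append(model)
        new_labels = [max(range(k), key=lambda c: models[c].log_likelihood(r))
                      for r in reads]
        if new_labels == labels:
            break  # converged
        labels = new_labels
    return labels
```

For example, `cluster_reads(["A" * 20, "C" * 20, "A" * 20, "C" * 20], 2)` places the two A-only reads in one cluster and the two C-only reads in the other. The interpolation step is what distinguishes an IMM from a fixed-order Markov chain: long contexts are trusted only in proportion to how much training data supports them, which matters when each cluster is trained on a limited number of short reads.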

【 License 】

CC BY   
© Kelley and Salzberg; licensee BioMed Central Ltd. 2010
