Journal article details
BMC Bioinformatics
Consistency of metagenomic assignment programs in simulated and real data
Koldo Garcia-Etxebarria [1], Marc Garcia-Garcerà [1], Francesc Calafell [1]
[1] Institut de Biologia Evolutiva (CSIC-Universitat Pompeu Fabra), Barcelona, Spain
Keywords: Comparison; Assignment; Metagenomics
DOI  :  10.1186/1471-2105-15-90
Received: 2013-05-07; accepted: 2014-03-22; published: 2014
【 Abstract 】

Background

Metagenomics is the genomic study of uncultured environmental samples, which has been greatly facilitated by the advent of shotgun-sequencing technologies. One of the main focuses of metagenomics is the discovery of previously uncultured microorganisms, which makes the assignment of sequences to a particular taxon both a challenge and a crucial step. Recently, several methods have been developed to perform this task, based on different approaches such as sequence composition or sequence similarity. Sequence composition methods can assign every read in a dataset. However, their use in metagenomics has been limited, and their performance with real data has been little studied. In this work, we assess the consistency of three different methods (BLAST + Lowest Common Ancestor, Phymm, and Naïve Bayesian Classifier) in assigning real and simulated sequence reads.

Results

In both real and simulated data, BLAST + Lowest Common Ancestor (BLAST + LCA), Phymm, and Naïve Bayesian Classifier consistently assign a larger number of reads at higher taxonomic levels than at lower levels, where discrepancies among the methods increase. In simulated data, consistent assignments between all three methods showed greater precision than assignments based on Phymm or Naïve Bayesian Classifier alone, since the BLAST + LCA algorithm performed best. In addition, assignment consistency in real data increased with sequence read length, in agreement with previously published simulation results.
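
The rank-by-rank consistency described above can be illustrated with a small sketch. It is not the authors' pipeline: the function name `consistency_by_rank`, the method labels, the choice of ranks, and the toy lineages are all assumptions made for illustration. A read counts as consistent at a rank only when every method assigned it there and all methods named the same taxon.

```python
def consistency_by_rank(assignments, ranks=("phylum", "genus")):
    """Fraction of reads that all methods assign to the same taxon, per rank.

    assignments: {read_id: {method_name: {rank: taxon}}} (hypothetical layout).
    A missing rank in a lineage means the method left the read unassigned there.
    """
    counts = {rank: 0 for rank in ranks}
    for per_method in assignments.values():
        for rank in ranks:
            taxa = [lineage.get(rank) for lineage in per_method.values()]
            # Consistent only if nobody abstained and all taxa are identical.
            if taxa and None not in taxa and len(set(taxa)) == 1:
                counts[rank] += 1
    total = len(assignments)
    return {rank: (counts[rank] / total if total else 0.0) for rank in ranks}


# Toy data, invented for illustration only.
toy_assignments = {
    "read1": {
        "blast_lca": {"phylum": "Firmicutes", "genus": "Bacillus"},
        "phymm": {"phylum": "Firmicutes", "genus": "Bacillus"},
        "nbc": {"phylum": "Firmicutes", "genus": "Bacillus"},
    },
    "read2": {
        "blast_lca": {"phylum": "Firmicutes", "genus": "Bacillus"},
        "phymm": {"phylum": "Firmicutes", "genus": "Staphylococcus"},
        "nbc": {"phylum": "Firmicutes"},  # no genus-level call
    },
}

print(consistency_by_rank(toy_assignments))
```

In this toy input both reads agree at phylum level but only one agrees at genus level, which mirrors the direction of the trend reported in the Results, not its magnitude.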

Conclusions

Combining different approaches is advisable when assigning metagenomic reads. Although sensitivity may be reduced, reliability can be increased by using only the reads consistently assigned to the same taxa by at least two methods, and by training the programs using all available information.
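
One way to read the "at least two methods" recommendation is as a simple vote across the three programs. The sketch below is a minimal illustration under that assumption, not the procedure used in the paper; `consensus_taxon`, `min_support`, and the method labels are hypothetical names.

```python
from collections import Counter


def consensus_taxon(calls, min_support=2):
    """Return the taxon supported by at least `min_support` methods, else None.

    calls: {method_name: taxon or None} for one read at one taxonomic rank.
    Assumes three methods, so two taxa can never both reach a support of 2.
    """
    votes = Counter(taxon for taxon in calls.values() if taxon is not None)
    if not votes:
        return None  # no method assigned the read
    taxon, support = votes.most_common(1)[0]
    return taxon if support >= min_support else None


# Toy call set, invented for illustration only.
example_calls = {"blast_lca": "Bacillus", "phymm": "Bacillus", "nbc": "Staphylococcus"}
print(consensus_taxon(example_calls))
```

Reads for which the function returns `None` are discarded, which is where the loss of sensitivity mentioned above comes from; raising `min_support` to 3 keeps only reads on which all three methods agree.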

【 License 】

   
2014 Garcia-Etxebarria et al.; licensee BioMed Central Ltd.
