Journal Article Details
Genome Biology
Ray Meta: scalable de novo metagenome assembly and profiling
Jacques Corbeil [2], François Laviolette [1], Élénie Godzaridis [3], Frédéric Raymond [3], Sébastien Boisvert [3]
[1] Department of Computer Science and Software Engineering, Faculty of Science and Engineering, Laval University, 1065, av. de la Médecine, Québec (Québec), G1V 0A6, Canada
[2] Department of Molecular Medicine, Faculty of Medicine, Laval University, 1050, av. de la Médecine, Québec (Québec), G1V 0A6, Canada
[3] Faculty of Medicine, Laval University, 1050, av. de la Médecine, Québec (Québec), G1V 0A6, Canada
Keywords: distributed; parallel; next-generation sequencing; profiling; de novo assembly; scalability; message passing; metagenomics
Others  :  866941
DOI  :  10.1186/gb-2012-13-12-r122
Received 2012-08-01, accepted 2012-12-22, published 2012
【 Abstract 】

Voluminous parallel sequencing datasets, especially those from metagenomic experiments, require distributed computing for de novo assembly and taxonomic profiling. Ray Meta is a massively distributed metagenome assembler that is coupled with Ray Communities, which profiles microbiomes based on uniquely-colored k-mers. It can accurately assemble and profile a three-billion-read metagenomic experiment representing 1,000 bacterial genomes of uneven proportions in 15 hours with 1,024 processor cores, using only 1.5 GB per core. The software will facilitate the processing of large and complex datasets, and will help in generating biological insights for specific environments. Ray Meta is open source and available at http://denovoassembler.sf.net.
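The idea of profiling with uniquely-colored k-mers can be illustrated with a minimal single-process sketch in Python. It is not Ray Communities' implementation, which runs distributed over message passing; the function names (`kmers`, `color_kmers`, `profile`) and the toy sequences are hypothetical, and the sketch ignores reverse complements and sequencing errors. It assumes that a k-mer's "color" is the set of reference genomes containing it, and that only k-mers with exactly one color contribute to a genome's abundance.

```python
from collections import defaultdict


def kmers(seq, k):
    """Yield every substring of length k in seq."""
    for i in range(len(seq) - k + 1):
        yield seq[i:i + k]


def color_kmers(references, k):
    """Map each k-mer to the set of reference names (colors) that contain it."""
    colors = defaultdict(set)
    for name, seq in references.items():
        for kmer in kmers(seq, k):
            colors[kmer].add(name)
    return colors


def profile(reads, references, k):
    """Estimate proportions from read k-mers that carry exactly one color."""
    colors = color_kmers(references, k)
    counts = {name: 0 for name in references}
    for read in reads:
        for kmer in kmers(read, k):
            owners = colors.get(kmer)
            # Skip k-mers absent from all references or shared by several.
            if owners is not None and len(owners) == 1:
                counts[next(iter(owners))] += 1
    total = sum(counts.values())
    return {name: (c / total if total else 0.0) for name, c in counts.items()}


# Toy data (hypothetical): the 4-mer "ACGT" occurs in both references,
# so it is not uniquely colored and is ignored.
references = {"A": "ACGTACGA", "B": "TTGCACGT"}
reads = ["CGTAC", "TTGC", "ACGT"]
result = profile(reads, references, k=4)
```

In this toy case two uniquely-colored k-mers hit reference A and one hits reference B, so the estimated proportions are 2/3 and 1/3; the shared k-mer carries no taxonomic signal and is dropped.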

【 License 】

   
2012 Boisvert et al.; licensee BioMed Central Ltd.
