BMC Bioinformatics
MetaPathways: a modular pipeline for constructing pathway/genome databases from environmental sequence information
Kishori M Konwar [1], Niels W Hanson [2], Antoine P Pagé [1], Steven J Hallam [2]
[1] Department of Microbiology & Immunology, University of British Columbia, Vancouver, BC V6T1Z3, Canada
[2] Graduate Program in Bioinformatics, University of British Columbia, Vancouver, BC Canada
Keywords: Metabolic interaction networks; Metabolism; Microbial community; MetaCyc; PathoLogic; Pathway Tools; Metagenome; Environmental Pathway/Genome Database (ePGDB)
DOI: 10.1186/1471-2105-14-202
Received 2013-01-24; accepted 2013-06-13; published 2013.
Abstract

Background

A central challenge to understanding the ecological and biogeochemical roles of microorganisms in natural and human-engineered ecosystems is reconstructing metabolic interaction networks from environmental sequence information. The dominant paradigm in metabolic reconstruction is to assign functional annotations using BLAST and then project those annotations onto symbolic representations of metabolism in the form of KEGG pathways or SEED subsystems.
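To make the BLAST-then-project paradigm concrete, the following minimal Python sketch keeps the highest-bitscore BLAST hit per query, maps each hit to an EC number, and reports the fraction of each pathway's EC numbers that are observed. It assumes BLAST+ tabular output (`-outfmt 6`, bitscore in column 12); the subject identifiers, EC lookup table, and toy pathway are hypothetical. This naive coverage score only illustrates the generic approach described above; it is not the PathoLogic algorithm, which applies its own evidence rules when inferring pathways.

```python
def best_hits(blast_lines):
    """Keep the highest-bitscore hit per query from BLAST tabular (-outfmt 6) lines."""
    best = {}
    for line in blast_lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        cols = line.split("\t")
        query, subject, bitscore = cols[0], cols[1], float(cols[11])
        if query not in best or bitscore > best[query][1]:
            best[query] = (subject, bitscore)
    return best

def pathway_coverage(best, subject_to_ec, pathways):
    """Fraction of each pathway's EC numbers matched by at least one best hit."""
    observed = {subject_to_ec[subj] for subj, _ in best.values() if subj in subject_to_ec}
    return {name: len(ecs & observed) / len(ecs)
            for name, ecs in pathways.items() if ecs}

# Toy data: subject IDs, EC mapping and the pathway definition are hypothetical.
blast_out = [
    "orf1\tP00001\t85.0\t200\t30\t0\t1\t200\t1\t200\t1e-50\t350.0",
    "orf1\tP00002\t60.0\t200\t80\t0\t1\t200\t1\t200\t1e-20\t150.0",
    "orf2\tP00003\t70.0\t150\t45\t0\t1\t150\t1\t150\t1e-30\t220.0",
]
subject_to_ec = {"P00001": "2.7.1.1", "P00002": "1.1.1.1", "P00003": "5.3.1.9"}
pathways = {"toy-glycolysis-fragment": {"2.7.1.1", "5.3.1.9", "2.7.1.11"}}

hits = best_hits(blast_out)
print(hits)  # {'orf1': ('P00001', 350.0), 'orf2': ('P00003', 220.0)}
print(pathway_coverage(hits, subject_to_ec, pathways))  # {'toy-glycolysis-fragment': 0.6666666666666666}
```

Two of the three toy reactions (hexokinase, EC 2.7.1.1, and glucose-6-phosphate isomerase, EC 5.3.1.9) are matched, so the fragment is reported as two-thirds covered; a bare ratio like this ignores pathway variants and shared enzymes, which is one reason more elaborate inference methods exist.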

Results

Here we present MetaPathways, an open-source pipeline for pathway inference that uses the PathoLogic algorithm to map functional annotations onto the MetaCyc collection of reactions and pathways, and to construct environmental Pathway/Genome Databases (ePGDBs) compatible with the editing and navigation features of Pathway Tools. The pipeline accepts assembled or unassembled nucleotide sequences, performs quality assessment and control, predicts and annotates noncoding genes and open reading frames, and produces inputs to PathoLogic. In addition to constructing ePGDBs, MetaPathways uses MLTreeMap to build phylogenetic trees for selected taxonomic anchor and functional gene markers, and converts General Feature Format (GFF) files into concatenated GenBank files so that ePGDBs can be constructed from third-party annotations. It also generates useful file formats, including Sequin files for direct GenBank submission, gene feature tables summarizing annotations, MLTreeMap trees, and ePGDB pathway coverage summaries for statistical comparisons.
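PathoLogic can read gene annotations from a tab-delimited PathoLogic (`.pf`) flat file as well as from GenBank files. As a rough illustration of turning third-party GFF annotations into PathoLogic input, the sketch below reads CDS features from GFF3 and writes one `//`-terminated `.pf` record per gene. It is a sketch under assumptions: the field set (`ID`, `NAME`, `STARTBASE`, `ENDBASE`, `FUNCTION`, `PRODUCT-TYPE`, `EC`) and the reverse-strand coordinate convention are assumed from the Pathway Tools file-format documentation, the `product` and `ec` GFF attribute keys are hypothetical conventions, and MetaPathways itself is described as converting GFF into concatenated GenBank files rather than emitting this exact format. Grouping records by replicon (`seqid`) is omitted.

```python
from urllib.parse import unquote

def parse_gff3_cds(lines):
    """Yield CDS features from GFF3 lines (9 tab-separated columns)."""
    for line in lines:
        if not line.strip() or line.startswith("#"):
            continue
        cols = line.rstrip("\n").split("\t")
        if len(cols) != 9 or cols[2] != "CDS":
            continue
        attrs = {}
        for pair in cols[8].split(";"):
            if "=" in pair:
                key, value = pair.split("=", 1)
                attrs[key.strip()] = unquote(value)  # GFF3 percent-encodes reserved characters
        yield {"seqid": cols[0], "start": int(cols[3]), "end": int(cols[4]),
               "strand": cols[6], "attrs": attrs}

def to_pathologic(features):
    """Render features as PathoLogic (.pf) records, each terminated by '//'."""
    records = []
    for f in features:
        a = f["attrs"]
        gene_id = a.get("ID", "unknown")
        # Assumed convention: for minus-strand genes STARTBASE > ENDBASE.
        start, end = (f["end"], f["start"]) if f["strand"] == "-" else (f["start"], f["end"])
        rec = [f"ID\t{gene_id}", f"NAME\t{gene_id}",
               f"STARTBASE\t{start}", f"ENDBASE\t{end}",
               f"FUNCTION\t{a.get('product', 'hypothetical protein')}",
               "PRODUCT-TYPE\tP"]
        if "ec" in a:  # hypothetical attribute key holding an EC number
            rec.append(f"EC\t{a['ec']}")
        rec.append("//")
        records.append("\n".join(rec))
    return "\n".join(records) + "\n"

gff = [
    "##gff-version 3",
    "contig1\tsrc\tCDS\t1\t300\t.\t+\t0\tID=orf1;product=hexokinase;ec=2.7.1.1",
    "contig1\tsrc\tCDS\t400\t900\t.\t-\t0\tID=orf2;product=glucose-6-phosphate%20isomerase",
]
print(to_pathologic(parse_gff3_cds(gff)), end="")
```

Keeping parsing and rendering as separate functions means the same feature dictionaries could feed a different writer, for example a GenBank exporter, without re-reading the GFF.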

Conclusions

MetaPathways provides users with a modular annotation and analysis pipeline for predicting metabolic interaction networks from environmental sequence information, using an alternative to KEGG pathway and SEED subsystem mapping. It is extensible to genomic and transcriptomic datasets from a wide range of sequencing platforms, and generates useful data products for analyzing microbial community structure and function. The MetaPathways software package, installation instructions, and example data can be obtained from http://hallam.microbiology.ubc.ca/MetaPathways.
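A modular pipeline of this kind can be pictured as an ordered list of independent stages that share a context and can be switched off individually. The sketch below is hypothetical throughout: the `Stage`/`Pipeline` classes, the stage names, and the 180-nt length cut-off are illustrative choices, not MetaPathways' actual interface or defaults.

```python
from dataclasses import dataclass, field
from typing import Callable, List

@dataclass
class Stage:
    name: str
    run: Callable[[dict], None]  # each stage reads and updates a shared context dict
    enabled: bool = True

@dataclass
class Pipeline:
    stages: List[Stage] = field(default_factory=list)

    def add(self, name: str, run: Callable[[dict], None], enabled: bool = True) -> "Pipeline":
        self.stages.append(Stage(name, run, enabled))
        return self  # allow chaining

    def execute(self, context: dict) -> List[str]:
        """Run enabled stages in order; return the names of stages that ran."""
        completed = []
        for stage in self.stages:
            if stage.enabled:
                stage.run(context)
                completed.append(stage.name)
        return completed

def quality_control(ctx: dict) -> None:
    # Hypothetical QC step: drop sequences shorter than ctx["min_length"].
    ctx["sequences"] = {k: s for k, s in ctx["sequences"].items()
                        if len(s) >= ctx["min_length"]}

def count_sequences(ctx: dict) -> None:
    ctx["n_sequences"] = len(ctx["sequences"])

pipeline = (Pipeline()
            .add("quality_control", quality_control)
            .add("orf_prediction", lambda ctx: None, enabled=False)  # placeholder, skipped
            .add("summary", count_sequences))
context = {"sequences": {"r1": "ATG" * 100, "r2": "ATGC"}, "min_length": 180}
print(pipeline.execute(context), context["n_sequences"])  # ['quality_control', 'summary'] 1
```

Because each stage only touches the shared context, a stage can be disabled, replaced, or rerun without changing the others, which is the property the word "modular" is meant to convey here.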

License

© 2013 Konwar et al.; licensee BioMed Central Ltd.

Figures

[Figures 1–5: images not reproduced in this text]
