Journal Article Details
BMC Evolutionary Biology
Functional phylogenomics analysis of bacteria and archaea using consistent genome annotation with UniFam
Chongle Pan [3], Doug Hyatt [2], Tae-Hyuk Ahn [1], Guruprasad Kora [1], Juanjuan Chai [1]
[1] Computer Science and Mathematics Division, Oak Ridge National Laboratory, Oak Ridge, TN, USA
[2] Joint Institute for Biological Sciences, University of Tennessee, Knoxville, TN, USA
[3] BioSciences Division, Oak Ridge National Laboratory, Oak Ridge, TN, USA
Keywords: Phylogenomics; Evolution; Genomes; Pathway; Cellular function; Prokaryotes
DOI: 10.1186/s12862-014-0207-y
Received 2014-06-14, accepted 2014-09-22, published 2014
【 Abstract 】

Background

Phylogenetic studies have provided detailed knowledge on the evolutionary mechanisms of genes and species in Bacteria and Archaea. However, the evolution of cellular functions, represented by metabolic pathways and biological processes, has not been systematically characterized. Many clades in the prokaryotic tree of life have now been covered by sequenced genomes in GenBank. This enables a large-scale functional phylogenomics study of many computationally inferred cellular functions across all sequenced prokaryotes.

Results

A total of 14,727 GenBank prokaryotic genomes were re-annotated using a new protein family database, UniFam, to obtain consistent functional annotations for accurate comparison. The functional profile of a genome was represented by the biological process Gene Ontology (GO) terms in its annotation. GO term enrichment analysis differentiated the functional profiles of selected archaeal taxa. In total, 706 prokaryotic metabolic pathways were inferred from these genomes using Pathway Tools and MetaCyc. The consistency between the distribution of metabolic pathways across the genomes and the phylogenetic tree of the genomes was measured using parsimony scores and retention indices. The ancestral functional profiles at the internal nodes of the phylogenetic tree were reconstructed to track the gains and losses of metabolic pathways over evolutionary history.
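
To make the parsimony score and retention index concrete, here is a minimal Python sketch for a single pathway treated as a binary presence/absence character on a rooted binary tree. It is an illustration under stated assumptions, not the authors' implementation: the nested-tuple tree encoding, the function names `fitch_score` and `retention_index`, and the eight-genome toy tree are all hypothetical. The score is computed with Fitch's algorithm, and the retention index uses Farris's definition RI = (g - s) / (g - m), where s is the observed number of steps, m the minimum possible number of steps, and g the maximum possible number of steps.

```python
# A tree is either a leaf name (str) or a pair (left, right) of subtrees.
# This encoding and all names below are illustrative assumptions.

def fitch_score(tree, states):
    """Minimum number of state changes (parsimony score) for one
    binary character, computed with Fitch's bottom-up pass."""
    def visit(node):
        if isinstance(node, str):              # leaf: observed state
            return {states[node]}, 0
        left_set, left_steps = visit(node[0])
        right_set, right_steps = visit(node[1])
        steps = left_steps + right_steps
        common = left_set & right_set
        if common:                             # children agree: no change
            return common, steps
        return left_set | right_set, steps + 1  # disagreement: one change
    return visit(tree)[1]

def retention_index(tree, states):
    """Farris retention index RI = (g - s) / (g - m) for a binary
    character. Returns None when the character is uninformative."""
    s = fitch_score(tree, states)
    present = sum(states.values())
    absent = len(states) - present
    g = min(present, absent)       # most steps possible on any tree
    m = 1 if g > 0 else 0          # fewest steps possible on any tree
    if g == m:
        return None
    return (g - s) / (g - m)

# Toy tree with eight genomes: (((A,B),(C,D)),((E,F),(G,H)))
tree = ((("A", "B"), ("C", "D")), (("E", "F"), ("G", "H")))

# A pathway confined to one clade, and one scattered across clades.
clade_specific = {"A": 1, "B": 1, "C": 1, "D": 1,
                  "E": 0, "F": 0, "G": 0, "H": 0}
dispersed = {"A": 1, "B": 0, "C": 1, "D": 0,
             "E": 1, "F": 0, "G": 1, "H": 0}

print(fitch_score(tree, clade_specific), retention_index(tree, clade_specific))
print(fitch_score(tree, dispersed), retention_index(tree, dispersed))
```

In this toy case the clade-specific pathway needs a single change (retention index 1.0), while the dispersed pathway needs four changes (retention index 0.0), which is the kind of contrast the two measures are meant to capture.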

Conclusions

Our functional phylogenomics analysis shows divergent functional profiles of taxa and clades. Such function-phylogeny correlation stems from a set of clade-specific cellular functions with low parsimony scores. On the other hand, many cellular functions are sparsely dispersed across many clades with high parsimony scores. These different types of cellular functions have distinct evolutionary patterns reconstructed from the prokaryotic tree.
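
The distinct evolutionary patterns mentioned above can be illustrated by reconstructing ancestral states and counting gains and losses along the branches. The following Python sketch uses a two-pass Fitch reconstruction for one binary presence/absence character. It is only a sketch under assumptions: the nested-tuple tree, the function name `count_gains_losses`, and the choice to resolve an ambiguous root as "absent" are all hypothetical and are not taken from the paper, which may use a different parsimony variant.

```python
# A tree is either a leaf name (str) or a pair (left, right) of subtrees.
# This encoding and all names below are illustrative assumptions.

def count_gains_losses(tree, states):
    """Reconstruct ancestral presence/absence with a two-pass Fitch
    procedure and return (gains, losses) along the branches.
    Assumption: an ambiguous root is resolved to 0 (absent)."""
    # Pass 1 (bottom-up): candidate state set for every node.
    def up(node):
        if isinstance(node, str):
            return ({states[node]}, None, None)
        left, right = up(node[0]), up(node[1])
        common = left[0] & right[0]
        return (common or (left[0] | right[0]), left, right)

    gains = 0
    losses = 0

    # Pass 2 (top-down): keep the parent's state whenever it is allowed.
    def down(annotated, parent_state):
        nonlocal gains, losses
        state_set, left, right = annotated
        if parent_state in state_set:
            state = parent_state
        else:
            state = min(state_set)   # singleton, or ambiguous root -> 0
        if parent_state is not None and state != parent_state:
            if state == 1:
                gains += 1           # absent in parent, present in child
            else:
                losses += 1          # present in parent, absent in child
        if left is not None:
            down(left, state)
            down(right, state)

    down(up(tree), None)
    return gains, losses

# Toy tree with eight genomes: (((A,B),(C,D)),((E,F),(G,H)))
tree = ((("A", "B"), ("C", "D")), (("E", "F"), ("G", "H")))

clade_specific = {"A": 1, "B": 1, "C": 1, "D": 1,
                  "E": 0, "F": 0, "G": 0, "H": 0}
dispersed = {"A": 1, "B": 0, "C": 1, "D": 0,
             "E": 1, "F": 0, "G": 1, "H": 0}
single_loss = {"A": 1, "B": 1, "C": 1, "D": 0,
               "E": 1, "F": 1, "G": 1, "H": 1}

print(count_gains_losses(tree, clade_specific))
print(count_gains_losses(tree, dispersed))
print(count_gains_losses(tree, single_loss))
```

Under these assumptions, the clade-specific pathway is explained by one gain at the ancestor of its clade, the dispersed pathway by four independent gains at the tips, and the third profile by a single loss in one genome.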

【 License 】

2014 Chai et al.; licensee BioMed Central Ltd.
