Journal article details
BMC Bioinformatics
Phylotastic! Making tree-of-life knowledge accessible, reusable and convenient
Arlin Stoltzfus [22], Hilmar Lapp [7], Naim Matasci [5], Helena Deus [18], Brian Sidlauskas [21], Christian M Zmasek [16], Gaurav Vaidya [10], Enrico Pontelli [15], Karen Cranston [7], Rutger Vos [8], Campbell O Webb [2], Luke J Harmon [23], Megan Pirrung [13], Brian O'Meara [17], Matthew W Pennell [23], Siavash Mirarab [19], Michael S Rosenberg [12], James P Balhoff [7], Holly M Bik [1], Tracy A Heath [20], Peter E Midford [7], Joseph W Brown [23], Emily Jane McTavish [6], Jeet Sukumaran [11], Mark Westneat [4], Michael E Alfaro [3], Aaron Steele [14], Greg Jordan [9]
[1] UC Davis Genome Center, One Shields Ave, Davis, CA, 95618, USA
[2] Arnold Arboretum of Harvard University, Boston, MA, 02130, USA
[3] Department of Ecology and Evolutionary Biology, University of California Los Angeles, 621 Charles E. Young Dr South, Los Angeles, CA, 90095, USA
[4] Biodiversity Synthesis Center, Field Museum of Natural History, 1400 S Lakeshore Dr, Chicago, IL, 60605, USA
[5] The iPlant Collaborative and EEB Department, University of Arizona, 1657 E Helen St, Tucson, AZ, 85721, USA
[6] University of Texas at Austin, BEACON, Austin, TX, USA
[7] National Evolutionary Synthesis Center, 2024 W. Main St, Durham, NC, 27705, USA
[8] NCB Naturalis, Einsteinweg 2, Leiden, 2333 CC, the Netherlands
[9] Paperpile, 34 Houghton Street, Somerville, MA, 02143, USA
[10] Department of Ecology and Evolutionary Biology, University of Colorado Boulder, Boulder, CO, 80309-0334, USA
[11] Biology Department, Duke University, Biological Sciences Building, 125 Science Drive, Durham, NC, 27708, USA
[12] Center for Evolutionary Medicine and Informatics, The Biodesign Institute, and School of Life Sciences, Arizona State University, PO Box 874501, Tempe, AZ, 85287-4501, USA
[13] University of Colorado Denver Anschutz Medical Campus, Aurora, CO, 80045, USA
[14] U.C. Berkeley Museum of Vertebrate Zoology, University of California, 3101 Valley Life Sciences Building, Berkeley, CA, 94720, USA
[15] Department of Computer Science, New Mexico State University, MSC CS, Box 30001, Las Cruces, NM, 88003, USA
[16] Sanford-Burnham Medical Research Institute, 10901 North Torrey Pines Road, La Jolla, CA, 92037, USA
[17] Department of Ecology & Evolutionary Biology, 569 Dabney Hall, University of Tennessee, Knoxville, TN, 37996, USA
[18] Digital Enterprise Research Institute, National University of Ireland, University Road, Galway, Ireland
[19] Department of Computer Science, University of Texas at Austin, Austin, TX, 78701, USA
[20] Department of Integrative Biology, University of California, Berkeley, CA, 94720-3140, USA
[21] Department of Fisheries and Wildlife, Oregon State University, 104 Nash Hall, Corvallis, OR, 97331-3803, USA
[22] Institute for Bioscience and Biotechnology Research (IBBR), Biosystems and Biomaterials Division, National Institute of Standards and Technology, Gaithersburg, MD, 20899, USA
[23] Institute for Bioinformatics and Evolutionary Studies (IBEST), University of Idaho, PO Box 443051, Moscow, ID, 83844-3051, USA
Keywords: Tree of life; Data reuse; Web services; Hackathon; Taxonomy; Phylogeny
DOI: 10.1186/1471-2105-14-158
Received 2013-01-18; accepted 2013-04-30; published 2013
【 Abstract 】

Background

Scientists rarely reuse expert knowledge of phylogeny, in spite of years of effort to assemble a great “Tree of Life” (ToL). A notable exception involves the use of Phylomatic, which provides tools to generate custom phylogenies from a large, pre-computed, expert phylogeny of plant taxa. This suggests great potential for a more generalized system that, starting with a query consisting of a list of any known species, would rectify non-standard names, identify expert phylogenies containing the implicated taxa, prune away unneeded parts, and supply branch lengths and annotations, resulting in a custom phylogeny suited to the user’s needs. Such a system could become a sustainable community resource if implemented as a distributed system of loosely coupled parts that interact through clearly defined interfaces.
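
The name-rectification and pruning steps described above can be illustrated with a minimal Python sketch. It is not the Phylotastic implementation: the `SYNONYMS` table, the function names, and the example tree are hypothetical, the parser handles only topology-only Newick strings (no branch lengths, quoting, or comments), and the steps of finding source phylogenies and supplying branch lengths are left out.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Set


@dataclass
class Node:
    name: str = ""
    children: List["Node"] = field(default_factory=list)


# Hypothetical lookup table standing in for a real name-resolution service.
SYNONYMS = {"Homo_sapien": "Homo_sapiens"}


def resolve_name(raw: str) -> str:
    """Normalize whitespace to underscores, then map known variants."""
    key = "_".join(raw.split())
    return SYNONYMS.get(key, key)


def parse_newick(text: str) -> Node:
    """Parse a topology-only Newick string into a Node tree."""
    text = text.strip().rstrip(";")
    pos = 0

    def parse_node() -> Node:
        nonlocal pos
        node = Node()
        if pos < len(text) and text[pos] == "(":
            pos += 1  # consume "("
            node.children.append(parse_node())
            while pos < len(text) and text[pos] == ",":
                pos += 1  # consume ","
                node.children.append(parse_node())
            pos += 1  # consume ")"
        start = pos
        while pos < len(text) and text[pos] not in ",()":
            pos += 1
        node.name = text[start:pos].strip()
        return node

    return parse_node()


def prune(node: Node, keep: Set[str]) -> Optional[Node]:
    """Drop tips not in `keep` and collapse nodes left with one child."""
    if not node.children:
        return node if node.name in keep else None
    kept = [c for c in (prune(child, keep) for child in node.children)
            if c is not None]
    if not kept:
        return None
    if len(kept) == 1:
        return kept[0]
    return Node(node.name, kept)


def to_newick(node: Node) -> str:
    """Serialize a Node tree back to Newick (without the trailing ";")."""
    if not node.children:
        return node.name
    inner = ",".join(to_newick(c) for c in node.children)
    return "(" + inner + ")" + node.name


def custom_tree(source_newick: str, query: List[str]) -> str:
    """Resolve query names, then prune the source tree down to them."""
    keep = {resolve_name(q) for q in query}
    pruned = prune(parse_newick(source_newick), keep)
    return (to_newick(pruned) if pruned else "") + ";"


# Illustrative source tree, not taken from any published phylogeny file.
EXAMPLE_TREE = ("((Homo_sapiens,Pan_troglodytes),"
                "(Mus_musculus,Rattus_norvegicus),Gallus_gallus);")

if __name__ == "__main__":
    print(custom_tree(EXAMPLE_TREE, ["Homo sapien", "Mus musculus", "Gallus gallus"]))
    # prints "(Homo_sapiens,Mus_musculus,Gallus_gallus);"
```

Keeping name resolution and pruning as separate functions mirrors the loosely coupled design the abstract argues for: either step could be swapped for a call to a remote service without touching the other.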

Results

With the aim of building such a “phylotastic” system, the NESCent Hackathons, Interoperability, Phylogenies (HIP) working group recruited 2 dozen scientist-programmers to a weeklong programming hackathon in June 2012. During the hackathon (and a three-month follow-up period), 5 teams produced designs, implementations, documentation, presentations, and tests including: (1) a generalized scheme for integrating components; (2) proof-of-concept pruners and controllers; (3) a meta-API for taxonomic name resolution services; (4) a system for storing, finding, and retrieving phylogenies using semantic web technologies for data exchange, storage, and querying; (5) an innovative new service, DateLife.org, which synthesizes pre-computed, time-calibrated phylogenies to assign ages to nodes; and (6) demonstration projects. These outcomes are accessible via a public code repository (GitHub.com), a website (http://www.phylotastic.org), and a server image.

Conclusions

Approximately 9 person-months of effort (centered on a software development hackathon) resulted in the design and implementation of proof-of-concept software for 4 core phylotastic components, 3 controllers, and 3 end-user demonstration tools. While these products have substantial limitations, they suggest considerable potential for a distributed system that makes phylogenetic knowledge readily accessible in computable form. Widespread use of phylotastic systems will create an electronic marketplace for sharing phylogenetic knowledge that will spur innovation in other areas of the ToL enterprise, such as annotation of sources and methods and third-party methods of quality assessment.

【 License 】

   
2013 Stoltzfus et al.; licensee BioMed Central Ltd.
