BMC Research Notes
Sharing and re-use of phylogenetic trees (and associated data) to facilitate synthesis
Rutger A Vos [8], Dan F Rosauer [4], Sudhir Kumar [3], Emily L Gillespie [6], Ross Mounce [7], Jamie Whitacre [1], Brian O'Meara [5], Arlin Stoltzfus [2]
[1] NMNH, Smithsonian Institution, Washington, DC, 20013-7012, USA
[2] Biochemical Science Division, NIST, 100 Bureau Drive, Gaithersburg, MD, USA
[3] Center for Evolutionary Medicine and Informatics, Biodesign Institute and School of Life Sciences, Arizona State University, Tempe, AZ, 85287-5301, USA
[4] Department of Ecology and Evolutionary Biology, Yale University, New Haven, CT, USA
[5] Department of Ecology & Evolutionary Biology, University of Tennessee, 569 Dabney Hall, Knoxville, TN, 37996-1610, USA
[6] Department of Biology, Marshall University, Huntington, WV, USA
[7] Department of Biology and Biochemistry, University of Bath, Bath, UK
[8] NCB Naturalis, Einsteinweg 2, 2333 CC, Leiden, the Netherlands
Keywords: Standards; Phyloinformatics; Bioinformatics; Data sharing; Phylogeny; Evolution
Others: 1165442
DOI: 10.1186/1756-0500-5-574
Received 2012-04-27; accepted 2012-08-24; published 2012
【 Abstract 】
Background
Recently, various evolution-related journals have adopted policies to encourage or require archiving of phylogenetic trees and associated data. Such attention to practices that promote sharing of data reflects rapidly improving information technology, and a rapidly expanding potential to use this technology to aggregate and link data from previously published research. Nevertheless, little is known about current practices, or best practices, for publishing trees and associated data so as to promote re-use.
Findings
Here we summarize results of an ongoing analysis of current practices for archiving phylogenetic trees and associated data, current practices of re-use, and current barriers to re-use. We find that the technical infrastructure is available to support rudimentary archiving, but the frequency of archiving is low. Currently, most phylogenetic knowledge is not easily re-used due to a lack of archiving, lack of awareness of best practices, and lack of community-wide standards for formatting data, naming entities, and annotating data. Most attempts at data re-use seem to end in disappointment. Nevertheless, we find many positive examples of data re-use, particularly those that involve customized species trees generated by grafting to, and pruning from, a much larger tree.
Conclusions
The technologies and practices that facilitate data re-use can catalyze synthetic and integrative research. However, success will require engagement from various stakeholders including individual scientists who produce or consume shareable data, publishers, policy-makers, technology developers and resource-providers. The critical challenges for facilitating re-use of phylogenetic trees and associated data, we suggest, include: a broader commitment to public archiving; more extensive use of globally meaningful identifiers; development of user-friendly technology for annotating, submitting, searching, and retrieving data and their metadata; and development of a minimum reporting standard (MIAPA) indicating which kinds of data and metadata are most important for a re-useable phylogenetic record.
【 License 】
2012 Stoltzfus et al.; licensee BioMed Central Ltd.