Journal article details
BMC Bioinformatics
Osiris: accessible and reproducible phylogenetic and phylogenomic analyses within the Galaxy workflow management system
Karl B Lopker [1], William Chen [1], Celia K C Churchill [1], M Sabrina Pankey [1], Roger Ngo [1], Markos A Alexandrou [1], Todd H Oakley [1]
[1] Ecology, Evolution, and Marine Biology, University of California-Santa Barbara, Santa Barbara, CA 93106, USA
Keywords: Tree estimation; Sequence alignment; Next-generation sequence analysis; Assembly; Orthology; Galaxy; Phylogenetics; Phylogenomics
DOI: 10.1186/1471-2105-15-230
Received 2013-12-01, accepted 2014-04-29, published 2014
【 Abstract 】

Background

Phylogenetic tools and ‘tree-thinking’ approaches increasingly permeate all biological research. At the same time, phylogenetic data sets are expanding at breakneck pace, facilitated by increasingly economical sequencing technologies. Therefore, there is an urgent need for accessible, modular, and sharable tools for phylogenetic analysis.

Results

We developed a suite of wrappers for new and existing phylogenetics tools for the Galaxy workflow management system that we call Osiris. Osiris and Galaxy provide a sharable, standardized, modular user interface, and the ability to easily create complex workflows using a graphical interface. Osiris enables all aspects of phylogenetic analysis within Galaxy, including de novo assembly of high-throughput sequencing reads, ortholog identification, multiple sequence alignment, concatenation, phylogenetic tree estimation, and post-tree comparative analysis. The open source files are available in the Bitbucket public repository, and many of the tools are demonstrated on a public web server (http://galaxy-dev.cnsi.ucsb.edu/osiris/).

Conclusions

Osiris can serve as a foundation for other phylogenomic and phylogenetic tool development within the Galaxy platform.

【 License 】

   
2014 Oakley et al.; licensee BioMed Central Ltd.

【 Figures 】

Figure 1.

Figure 2.

Figure 3.
