BMC Bioinformatics
πBUSS: a parallel BEAST/BEAGLE utility for sequence simulation under complex evolutionary scenarios
Filip Bielejec¹, Philippe Lemey¹, Luiz Max Carvalho³, Guy Baele¹, Andrew Rambaut², Marc A Suchard⁴
[1] Department of Microbiology and Immunology, Rega Institute, KU Leuven, Leuven, Belgium
[2] Institute of Evolutionary Biology, University of Edinburgh, Edinburgh, UK
[3] Program for Scientific Computing (PROCC), Fundação Oswaldo Cruz, Rio de Janeiro, Brazil
[4] Department of Biostatistics, UCLA Fielding School of Public Health, University of California, Los Angeles, CA, 90095, USA
Keywords: Evolution; BEAGLE; BEAST; Phylogenetics; Monte Carlo; Simulation
DOI: 10.1186/1471-2105-15-133
Received 2013-10-10; accepted 2014-04-24; published 2014
【 Abstract 】
Background
Simulated nucleotide or amino acid sequences are frequently used to assess the performance of phylogenetic reconstruction methods. BEAST, a Bayesian statistical framework that focuses on reconstructing time-calibrated molecular evolutionary processes, supports a wide array of evolutionary models but lacked matching machinery for simulating character evolution along phylogenies.
Results
We present a flexible Monte Carlo simulation tool, called πBUSS, that employs the BEAGLE high-performance library for phylogenetic computations to rapidly generate large sequence alignments under complex evolutionary models. πBUSS provides a user-friendly graphical user interface (GUI) for combining a rich array of models across an arbitrary number of partitions. A command-line interface mirrors the options available through the GUI and facilitates scripting in large-scale simulation studies. πBUSS may serve as an easy-to-use, standard sequence simulation tool, but its available models and data types are particularly useful for assessing the performance of complex BEAST inferences. The connection with BEAST is further strengthened by a shared extensible markup language (XML) format, which also allows more advanced evolutionary models to be specified. To support simulation under such models, as well as simulation and analysis in a single run, we also add the πBUSS core simulation routine to the list of BEAST XML parsers.
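The core idea behind Monte Carlo sequence simulation along a phylogeny can be sketched in a few lines. You draw root states from the model's stationary distribution, then walk down the tree, drawing each child state from the substitution model's transition probabilities P(t) = exp(Qt) over the connecting branch. The sketch below is a minimal illustration under stated assumptions, not πBUSS's implementation. πBUSS delegates these computations to BEAGLE and supports far richer models. The sketch assumes the Jukes-Cantor (JC69) model, whose P(t) has a closed form, and a hypothetical tuple-based tree representation. It also uses per-partition relative rates as a simple stand-in for combining settings across partitions. All names (`simulate_alignment`, `EXAMPLE_TREE`, and so on) are illustrative.

```python
import math
import random

NUCLEOTIDES = "ACGT"

# Hypothetical tree format: a node is (name, branch_length, children);
# leaves have an empty child list. The root's branch length is ignored.
EXAMPLE_TREE = ("root", 0.0, [
    ("A", 0.1, []),
    ("inner", 0.05, [("B", 0.2, []), ("C", 0.2, [])]),
])


def jc69_prob_same(t):
    """Probability that a site is unchanged after t expected
    substitutions per site under JC69: 1/4 + 3/4 * exp(-4t/3)."""
    return 0.25 + 0.75 * math.exp(-4.0 * t / 3.0)


def evolve_site(parent, t, rng):
    """Draw the child state at the end of a branch of length t."""
    if rng.random() < jc69_prob_same(t):
        return parent
    # Given a change, JC69 makes the three other states equally likely.
    return rng.choice([n for n in NUCLEOTIDES if n != parent])


def simulate_partition(tree, n_sites, rate, rng):
    """Simulate one partition; `rate` rescales every branch length."""
    leaves = {}

    def descend(node, parent_seq):
        name, length, children = node
        seq = [evolve_site(s, length * rate, rng) for s in parent_seq]
        if not children:
            leaves[name] = "".join(seq)
        for child in children:
            descend(child, seq)

    # Root states come from the JC69 stationary distribution (uniform).
    root_seq = [rng.choice(NUCLEOTIDES) for _ in range(n_sites)]
    for child in tree[2]:
        descend(child, root_seq)
    return leaves


def simulate_alignment(tree, partitions, seed=None):
    """partitions: list of (n_sites, relative_rate) pairs; the
    per-partition blocks are concatenated for each taxon."""
    rng = random.Random(seed)
    blocks = [simulate_partition(tree, n, r, rng) for n, r in partitions]
    return {taxon: "".join(block[taxon] for block in blocks)
            for taxon in blocks[0]}


if __name__ == "__main__":
    aln = simulate_alignment(EXAMPLE_TREE, [(20, 1.0), (10, 3.0)], seed=1)
    for taxon, seq in aln.items():
        print(taxon, seq)
```

Each partition is simulated independently and then concatenated per taxon. A general model would replace `jc69_prob_same` and the uniform "change" draw with a full transition-probability matrix P(t) = exp(Qt) computed for an arbitrary rate matrix Q.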
Conclusions
πBUSS offers a unique combination of flexibility and ease of use for sequence simulation under realistic evolutionary scenarios. Through different interfaces, πBUSS supports simulation studies ranging from modest endeavors for illustrative purposes to complex and large-scale assessments of evolutionary inference procedures. Applications are not restricted to the BEAST framework, or even to time-measured evolutionary histories, and πBUSS can be connected to various other programs using standard input and output formats.
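The Conclusions mention connecting πBUSS to other programs through standard input and output formats. As a hedged illustration, assume FASTA as the exchange format; this sketch does not reproduce πBUSS's own writers or options. A simulated alignment held in a hypothetical `{taxon: sequence}` dictionary can then be written out, and read back, as plain FASTA text that most phylogenetics programs accept. The function names `to_fasta` and `from_fasta` are illustrative only.

```python
def to_fasta(alignment, width=60):
    """Format a {taxon: sequence} mapping as FASTA text,
    wrapping each sequence at `width` characters per line."""
    lines = []
    for taxon, seq in alignment.items():
        lines.append(">" + taxon)
        lines.extend(seq[i:i + width] for i in range(0, len(seq), width))
    return "\n".join(lines) + "\n"


def from_fasta(text):
    """Parse FASTA text back into a {taxon: sequence} mapping."""
    alignment, taxon = {}, None
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            taxon = line[1:].strip()
            alignment[taxon] = ""
        elif taxon is not None:
            # Sequence lines may be wrapped; concatenate them.
            alignment[taxon] += line
    return alignment


if __name__ == "__main__":
    aln = {"A": "ACGTACGTAC", "B": "ACGTTCGTAA"}
    text = to_fasta(aln, width=4)
    print(text, end="")
    assert from_fasta(text) == aln  # round trip preserves the alignment
```

Wrapping at 60 characters is a common convention rather than a requirement. A reader such as `from_fasta` simply concatenates wrapped sequence lines.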
【 License 】
2014 Bielejec et al.; licensee BioMed Central Ltd.
【 Figures 】
Figure 1.
Figure 2.