Journal article details
BMC Evolutionary Biology
Selecting optimal partitioning schemes for phylogenomic datasets
Alexandros Stamatakis [1], Christoph Mayer [2], David Kainer [4], Brett Calcott [5], Robert Lanfear [3]
[1]Karlsruhe Institute of Technology, Institute for Theoretical Informatics, Postfach 6980, 76128 Karlsruhe, Germany
[2]Zoologisches Forschungsmuseum Alexander Koenig, Bonn, Germany
[3]National Evolutionary Synthesis Center, Durham, NC, USA
[4]Ecology Evolution and Genetics, Research School of Biology, Australian National University, Canberra, ACT, Australia
[5]Philosophy Program, Research School of Social Sciences, Australian National University, Canberra, ACT, Australia
Keywords: Hierarchical clustering; Clustering; Phylogenomics; Phylogenetics; AIC; AICc; BIC; Partitionfinder; Partitioning; Model selection
DOI: 10.1186/1471-2148-14-82
Received 2013-11-19; accepted 2014-04-03; published 2014
【 Abstract 】

Background

Partitioning involves estimating independent models of molecular evolution for different subsets of sites in a sequence alignment, and has been shown to improve phylogenetic inference. Current methods for estimating best-fit partitioning schemes, however, are only computationally feasible with datasets of fewer than 100 loci. This is a problem because datasets with thousands of loci are increasingly common in phylogenetics.

Methods

We develop two novel methods for estimating best-fit partitioning schemes on large phylogenomic datasets: strict and relaxed hierarchical clustering. These methods use information from the underlying data to cluster together similar subsets of sites in an alignment, and build on clustering approaches that have been proposed elsewhere.
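
The idea of clustering similar subsets can be illustrated with a toy sketch. It is not the PartitionFinder implementation: the per-locus feature vectors, the Euclidean distance, and the size-weighted averaging used after a merge are all assumptions chosen for illustration, and the function names are hypothetical. The sketch greedily merges the two most similar subsets at each step and records every partitioning scheme it visits; in a real analysis each visited scheme would still have to be scored (for example by AICc or BIC) with a likelihood program.

```python
from itertools import combinations
import math


def closest_pair(features):
    """Return the two subsets whose feature vectors are nearest (Euclidean)."""
    return min(
        combinations(sorted(features), 2),
        key=lambda pair: math.dist(features[pair[0]], features[pair[1]]),
    )


def merge_path(features, sizes):
    """Greedily merge the most similar pair of subsets until one remains.

    features: dict mapping locus name -> tuple of numbers (hypothetical
              summary of that locus, e.g. a rate and a base frequency).
    sizes:    dict mapping locus name -> number of sites.
    Returns the list of schemes visited; each scheme is a sorted list of
    tuples of locus names.
    """
    feats = {(name,): tuple(vec) for name, vec in features.items()}
    n_sites = {(name,): sizes[name] for name in features}
    schemes = [sorted(feats)]
    while len(feats) > 1:
        a, b = closest_pair(feats)
        total = n_sites[a] + n_sites[b]
        # Assumption: a size-weighted mean stands in for re-estimating
        # model parameters on the merged subset.
        merged_vec = tuple(
            (x * n_sites[a] + y * n_sites[b]) / total
            for x, y in zip(feats[a], feats[b])
        )
        for key in (a, b):
            del feats[key]
            del n_sites[key]
        merged = tuple(sorted(a + b))
        feats[merged] = merged_vec
        n_sites[merged] = total
        schemes.append(sorted(feats))
    return schemes


# Hypothetical input: three loci described by (relative rate, GC fraction).
example_features = {
    "gene1": (1.0, 0.25),
    "gene2": (1.1, 0.26),
    "gene3": (3.0, 0.40),
}
example_sizes = {"gene1": 300, "gene2": 300, "gene3": 600}
example_schemes = merge_path(example_features, example_sizes)
```

Because only one merge is considered per step, a path of this kind visits at most as many schemes as there are starting subsets, which is why approaches of this family scale to large datasets.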

Results

We compare the performance of our methods to each other, and to existing methods for selecting partitioning schemes. We demonstrate that while strict hierarchical clustering has the best computational efficiency on very large datasets, relaxed hierarchical clustering provides scalable efficiency and returns dramatically better partitioning schemes as assessed by common criteria such as AICc and BIC scores.
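
For readers unfamiliar with the criteria named above, the following minimal sketch computes AIC, AICc and BIC from a log-likelihood using their standard textbook definitions. The function names and the numbers in the example are hypothetical and are not taken from the paper; in particular, what counts as the sample size n (here assumed to be the number of alignment sites) is an assumption.

```python
import math


def aic(ln_l, k):
    """Akaike information criterion: 2k - 2 ln L."""
    return 2 * k - 2 * ln_l


def aicc(ln_l, k, n):
    """AIC with the small-sample correction term 2k(k+1)/(n-k-1)."""
    if n - k - 1 <= 0:
        raise ValueError("AICc is undefined unless n > k + 1")
    return aic(ln_l, k) + (2 * k * (k + 1)) / (n - k - 1)


def bic(ln_l, k, n):
    """Bayesian information criterion: k ln n - 2 ln L."""
    return k * math.log(n) - 2 * ln_l


# Hypothetical comparison of two partitioning schemes on the same alignment.
# Each entry is (log-likelihood, number of free parameters).
n_sites = 1000
candidate_schemes = {
    "one_subset": (-1050.0, 10),
    "three_subsets": (-1000.0, 30),
}
bic_scores = {
    name: bic(ln_l, k, n_sites) for name, (ln_l, k) in candidate_schemes.items()
}
# Lower scores indicate the preferred scheme under the chosen criterion.
preferred = min(bic_scores, key=bic_scores.get)
```

All three criteria reward a higher likelihood and penalise extra parameters, so merging two subsets lowers the score only when the parameters saved outweigh the likelihood lost.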

Conclusions

These two methods provide the best current approaches to inferring partitioning schemes for very large datasets. We provide free open-source implementations of the methods in the PartitionFinder software. We hope that the use of these methods will help to improve the inferences made from large phylogenomic datasets.

【 License 】

   
2014 Lanfear et al.; licensee BioMed Central Ltd.
