Algorithms for Molecular Biology
A priori assessment of data quality in molecular phylogenetics
Peter F Stadler [2], Sonja J Prohaska [3], Patrick Kück [5], Björn M von Reumont [1], Karen Meusemann [4], Bernhard Misof [5]
[1]Dept. of Life Sciences, The Natural History Museum London, Cromwell Road, GB-SW7 5BD London, UK
[2]Institute for Theoretical Chemistry, University of Vienna, Währingerstraße 17, A-1090 Vienna, Austria
[3]Interdisciplinary Center for Bioinformatics, University Leipzig, Härtelstraße 16-18, D-04107 Leipzig, Germany
[4]CSIRO Ecosystem Sciences, Australian National Insect Collection, Clunies Ross Street, AU-2601 Acton, Canberra, Australia
[5]Zoologisches Forschungsmuseum Alexander Koenig, Adenauerallee 160, D-53113 Bonn, Germany
Keywords: Biases; Quartets; Multiple sequence alignments; Phylogenetic networks; Tree-likeness; Phylogenomics
DOI: 10.1186/s13015-014-0022-4
Received 2014-05-21; accepted 2014-07-24; published 2014.
【 Abstract 】

Sets of sequence data used in phylogenetic analysis are often plagued by both random noise and systematic biases. Since the commonly used methods of phylogenetic reconstruction are designed to produce trees, it is an important task to evaluate these trees a posteriori. Preferably, however, one would like to assess the suitability of the input data for phylogenetic analysis a priori and, if possible, obtain information on how to prune the data sets to improve the quality of phylogenetic reconstruction without introducing unwarranted biases. In the last few years, several different approaches, algorithms, and software tools have been proposed for this purpose. Here we provide an overview of the state of the art and briefly discuss the most pressing open problems.
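As one concrete illustration of an a priori check on tree-likeness (offered as an example, not as the specific procedure evaluated in this review), the δ score of Holland et al. (2002) measures how far each quartet of taxa in a distance matrix deviates from the four-point condition. For any four taxa, a tree metric makes the two largest of the three pairwise sums equal, so a quartet's δ is 0 when the data are perfectly tree-like and grows toward 1 as they become less so. The sketch below assumes the distances are given as a symmetric dict of dicts; the helper names `from_pairs`, `quartet_delta` and `delta_scores` are hypothetical.

```python
from itertools import combinations
from statistics import mean


def from_pairs(pairs):
    """Build a symmetric dict-of-dicts distance matrix from {(a, b): distance}."""
    d = {}
    for (a, b), dist in pairs.items():
        d.setdefault(a, {})[b] = dist
        d.setdefault(b, {})[a] = dist
    return d


def quartet_delta(d, x, y, u, v):
    """δ of one quartet: (largest - middle) / (largest - smallest) of the three sums."""
    largest, middle, smallest = sorted(
        [d[x][y] + d[u][v], d[x][u] + d[y][v], d[x][v] + d[y][u]],
        reverse=True,
    )
    if largest == smallest:  # all three sums equal: star-like, still tree-like
        return 0.0
    return (largest - middle) / (largest - smallest)


def delta_scores(d):
    """Return the mean δ over all quartets and the mean δ per taxon."""
    taxa = sorted(d)
    per_taxon = {t: [] for t in taxa}
    all_deltas = []
    for quartet in combinations(taxa, 4):
        delta = quartet_delta(d, *quartet)
        all_deltas.append(delta)
        for t in quartet:  # each quartet contributes to its four taxa
            per_taxon[t].append(delta)
    return mean(all_deltas), {t: mean(v) for t, v in per_taxon.items()}
```

Under this reading, taxa with a high mean δ would be candidates for closer inspection before tree reconstruction. Whether removing them actually helps without introducing new biases is exactly the kind of question the abstract leaves open.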

【 License 】

   
© 2014 Misof et al.; licensee BioMed Central Ltd.

【 Figures 】

Figures 1 to 4 (images not reproduced).
