Algorithms for Molecular Biology | |
Gene tree correction for reconciliation and species tree inference | |
Krister M Swenson1  Andrea Doroftei2  Nadia El-Mabrouk2  | |
[1] Departement of Computer Science, McGill University, 3480 University Street, Montréal, H3A 2A7, Québec, Canada | |
[2] Département d’Informatique et de Recherche Opérationnelle, Université de Montréal, CP 6128 succ Centre-Ville, Montréal, H3C 3J7, Québec, Canada | |
关键词: Maximum agreement subtree (MAST); Error correction; Reconciliation; Species tree; Gene tree; | |
Others : 794852 DOI : 10.1186/1748-7188-7-31 |
|
received in 2011-12-24, accepted in 2012-07-26, 发布年份 2012 | |
【 摘 要 】
Background
Reconciliation is the commonly used method for inferring the evolutionary scenario for a gene family. It consists in “embedding” inferred gene trees into a known species tree, revealing the evolution of the gene family by duplications and losses. When a species tree is not known, a natural algorithmic problem is to infer a species tree from a set of gene trees, such that the corresponding reconciliation minimizes the number of duplications and/or losses. The main drawback of reconciliation is that the inferred evolutionary scenario is strongly dependent on the considered gene trees, as few misplaced leaves may lead to a completely different history, with significantly more duplications and losses.
Results
In this paper, we take advantage of certain gene trees’ properties in order to preprocess them for reconciliation or species tree inference. We flag certain duplication vertices of a gene tree, the “non-apparent duplication” (NAD) vertices, as resulting from the misplacement of leaves. In the case of species tree inference, we develop a polynomial-time heuristic for removing the minimum number of species leading to a set of gene trees that exhibit no NAD vertices with respect to at least one species tree. In the case of reconciliation, we consider the optimization problem of removing the minimum number of leaves or species leading to a tree without any NAD vertex. We develop a polynomial-time algorithm that is exact for two special classes of gene trees, and show a good performance on simulated data sets in the general case.
【 授权许可】
2012 Swenson et al.; licensee BioMed Central Ltd.
【 预 览 】
Files | Size | Format | View |
---|---|---|---|
20140705073722744.pdf | 385KB | download | |
Figure 5. | 26KB | Image | download |
Figure 4. | 20KB | Image | download |
Figure 3. | 29KB | Image | download |
Figure 2. | 11KB | Image | download |
Figure 1. | 17KB | Image | download |
【 图 表 】
Figure 1.
Figure 2.
Figure 3.
Figure 4.
Figure 5.
【 参考文献 】
- [1]Li WH, Gu Z, Wang H, Nekrutenko A: Evolutionary analysis of the human genome. Nature 2001, 409:847-849.
- [2]Durand D, Haldórsson BV, Vernot B: A hybrid micro-macro-evolutionary approach to gene tree reconstruction. J Comput Biol 2006, 13:320-335.
- [3]Zhang J: Evolution by gene duplication: an update. TRENDS Ecol Evol 2003, 18(6):292-298.
- [4]Goodman M, Czelusniak J, Moore GW, Romero-Herrera AE, Matsuda G: Fitting the gene lineage into its species lineage, a parsimony strategy illustrated by cladograms constructed from globin sequences. Syst Zool 1979, 28:132-163.
- [5]Doyon J-P, Scornavacca C, Gorbunov K, Szolloso G, Ranwez V, Berry V: An effi. algo. for gene/species trees parsim. reconc. with losses, dup. and transf. J Comp Biol 2010, 6398:93-108.
- [6]Hallett M, Lagergren J, Tofigh A: Simultaneous identification of duplications and lateral transfers. In Proceedings of the Eight Annual International Conference on Computational Molecular Biology, RECOMB. Edited by Bourne PE, Gusfield D. New York: ACM; 2004:347-356.
- [7]Tofigh A, Hallett M, Lagergren J: Simultaneous identification of duplications and lateral gene transfers. IEEE/ACM Trans Comput Biol Bioinf 2011, 8:517-535.
- [8]Chauve C, El-Mabrouk N: New perspectives on gene family evolution: losses in reconciliation and a link with supertrees. In Proceedings of the Thirteenth Annual International Conference on Computational Molecular Biology, RECOMB. Edited by Batzoglou S S. Springer; 2009:46-58. volume 5541 of LNCS
- [9]Chen K, Durand D, Farach-Colton M: Notung: Dating gene duplications using gene family trees. J Comput Biol 2000, 7:429-447.
- [10]Ma B, Li M, Zhang L: From gene trees to species trees. SIAM J Comput 2000, 30:729-752.
- [11]Guigó R, Muchnik I, Smith TF: Reconstruction of ancient molecular phylogeny. Mol Phylogenet Evol 1996, 6:189-213.
- [12]Page RDM, Charleston MA: Reconciled trees and incongruent gene and species trees. DIMACS Ser Discrete Mathematics and Theor Comput Sci 1997, 37:57-70.
- [13]Bonizzoni P, Vedova Della G, Dondi R: Reconciling a gene tree to a species tree under the duplication cost model. Theor Comput Sci 2005, 347:36-53.
- [14]Gorecki P, Tiuryn J: DLS-trees: a model of evolutionary scenarios. Theor Comput Sci 2006, 359:378-399.
- [15]Page RDM: Maps between trees and cladistic analysis of historical associations among genes, organisms, and areas. Syst Biol 1994, 43:58-77.
- [16]Page RDM: Genetree: comparing gene and species phylogenies using reconciled trees. Bioinformatics 1998, 14:819-820.
- [17]Hallett MT, Lagergren J: New algorithms for the duplication-loss model. In Proceedings of the Fourth Annual International Conference on Computational Molecular Biology, RECOMB. Edited by Shamir R, Miyano S, Istrail S, Pevzner P, Waterman MS. New York: ACM; 2000:138-146.
- [18]Hahn MW: Bias in phylogenetic tree reconciliation methods: implications for vertebrate genome evolution. Genome Biol 2007, 8:R141. BioMed Central Full Text
- [19]Chang WC, Eulenstein O: Reconciling gene trees with apparent polytomies. In Proceedings of the 12th Conference on Computing and Combinatorics (COCOON), volume 4112 of Lecture Notes in Computer Science. Edited by Chen DZ, Leepages DT. Taipei, Taiwan; 2006:235-244.
- [20]Lafond M, Swenson KM, El-Mabrouk N: An optimal reconciliation algorithm for gene trees with polytomies. WABI, volume 7534 of LNBI/LNBI 2012, 106-122.
- [21]Dondi R, El-Mabrouk N: Minimum leaf removal for reconciliation: complexity and algorithms. Combinatorial Pattern Matching (CPM) 2012. accepted
- [22]Doroftei A, El-Mabrouk N: Removing noise from gene trees. WABI, volume 6833 of LNBI/LNBI 2011, 76-91.
- [23]Chauve C, Doyon J-P, El-Mabrouk N: Gene family evolution by duplication, speciation and loss. J Comput Biol 2008, 15:1043-1062.
- [24]Hahn MW, Han MV, Han S-G: Gene family evolution across 12 drosophilia genomes. PLoS Genet 2007, 3:e197.
- [25]Page R D M: Modified mincut supertrees. LNCS, volume 2452 of WABI 2002, 537-551.
- [26]Semple C, Steel M: A supertree method for rooted trees. Discrete Appl Math 2000, 105:147-158.
- [27]Snir S, Rao S: Using max cut to enhance rooted trees consistency. IEEE/ACM Trans Comput Biol Bioinf 2006, 3:323-333.
- [28]Berge C: Hypergraphs:Combinatorics of Finite Sets, volume 45. Amsterdam, North-Holland; 1989.
- [29]Hao J, Orlin JB: A faster algorithm for finding the minimum cut in a graph. Proceedings of the Third Annual ACM-SIAM Symposium on Discrete Algorithms 1992, 165-174.
- [30]Cole R, Farach-Colton M, Hariharan R, Przytycka T, Thorup M: An o(n\log n) algorithm for the maximum agreement subtree problem for binary trees. SIAM J Comput 2000, 30(5):1385-1404.
- [31]Finden CR, Gordon AD: Obtaining common pruned trees. J Classif 1995, 2:255-276.
- [32]Amir A, Keselman D: Maximum agreement subtree in a set of evolutionary trees: matrics and efficient algorithms. SIAM J Comput 1997, 26:1656-1669.
- [33]Steel M, Warnow T: Kaikoura tree theorems:computing the maximum agreement subtree. Inform Process Lett 1993, 48:77-82.
- [34]Bryant D: Building trees, hunting for trees and comparing trees: Theory and method in phylogenetic analysis. PhD dissertation, Department of Mathematics, University of Canterbury, UK; 1997
- [35]Farach M, Przytycka TM, Thorup M: On the agreement of many trees. Inf Process Lett 1995, 55(6):297-301.
- [36]Zhang LX: On Mirkin-Muchnik-Smith conjecture for comparing molecular phylogenies. J Comput Biol 1997, 4:177-188.