期刊论文详细信息
Biology Direct
Cubic time algorithms of amalgamating gene trees and building evolutionary scenarios
Konstantin Yu Gorbunov2  Leonid Y Rusin1  Lev I Rubanov2  Vassily A Lyubetsky2 
[1]Faculty of Biology, Moscow State University, Vorob’evy Gory 1/12, Moscow, 119991, Russia
[2]Institute for Information Transmission Problems, The Russian Academy of Sciences (Kharkevich Institute), Bolshoy Karetny per. 19, Moscow, 127994, Russia
关键词: Time slices;    Gene gain;    Horizontal gene transfer;    Gene loss;    Gene duplication;    Evolutionary events;    Supertree;    Tree reconciliation;    Tree amalgamation;    Species tree;    Tree inference;    Fast algorithms;    Phylogenetics;   
Others  :  795171
DOI  :  10.1186/1745-6150-7-48
 received in 2012-09-07, accepted in 2012-12-11,  发布年份 2012
PDF
【 摘 要 】

Background

A long recognized problem is the inference of the supertree S that amalgamates a given set {Gj} of trees Gj, with leaves in each Gj being assigned homologous elements.

We ground on an approach to find the tree S by minimizing the total cost of mappings αj of individual gene trees Gj into S. Traditionally, this cost is defined basically as a sum of duplications and gaps in each αj. The classical problem is to minimize the total cost, where S runs over the set of all trees that contain an exhaustive non-redundant set of species from all input Gj.

Results

We suggest a reformulation of the classical NP-hard problem of building a supertree in terms of the global minimization of the same cost functional but only over species trees S that consist of clades belonging to a fixed set P (e.g., an exhaustive set of clades in all Gj). We developed a deterministic solving algorithm with a low degree polynomial (typically cubic) time complexity with respect to the size of input data.

We define an extensive set of elementary evolutionary events and suggest an original definition of mapping β of tree G into tree S. We introduce the cost functional c(G, S, f ) and define the mapping β as the global minimum of this functional with respect to the variable f, in which sense it is a generalization of classical mapping α.

We suggest a reformulation of the classical NP-hard mapping (reconciliation) problem by introducing time slices into the species tree S and present a cubic time solving algorithm to compute the mapping β. We introduce two novel definitions of the evolutionary scenario based on mapping β or a random process of gene evolution along a species tree.

Conclusions

Developed algorithms are mathematically proved, which justifies the following statements. The supertree building algorithm finds exactly the global minimum of the total cost if only gene duplications and losses are allowed and the given sets of gene trees satisfies a certain condition. The mapping algorithm finds exactly the minimal mapping β, the minimal total cost and the evolutionary scenario as a minimum over all possible distributions of elementary evolutionary events along the edges of tree S.

The algorithms and their effective software implementations provide useful tools in many biological studies. They facilitate processing of voluminous tree data in acceptable time still largely avoiding heuristics. Performance of the tools is tested with artificial and prokaryotic tree data.

Reviewers

This article was reviewed by Prof. Anthony Almudevar, Prof. Alexander Bolshoy (nominated by Prof. Peter Olofsson), and Prof. Marek Kimmel.

【 授权许可】

   
2012 Lyubetsky et al.; licensee BioMed Central Ltd.

【 预 览 】
附件列表
Files Size Format View
20140705082515791.pdf 1062KB PDF download
Figure 8. 97KB Image download
Figure 7. 22KB Image download
Figure 6. 71KB Image download
Figure 5. 21KB Image download
Figure 4. 15KB Image download
Figure 3. 21KB Image download
Figure 2. 18KB Image download
Figure 1. 19KB Image download
【 图 表 】

Figure 1.

Figure 2.

Figure 3.

Figure 4.

Figure 5.

Figure 6.

Figure 7.

Figure 8.

【 参考文献 】
  • [1]Page RDM: Maps between trees and cladistic analysis of historical associations among genes, organisms and areas. Syst Biol 1994, 43:58-77.
  • [2]Guigo R, Muchnik I, Smith TF: Reconstruction of ancient molecular phylogeny. Mol Phylogenet Evol 1996, 6(2):189-213.
  • [3]Gorbunov KY, Lyubetsky VA: Reconstructing the evolution of genes along the species tree. Mol Biol (Mosk) 2009, 43(5):881-893.
  • [4]Bininda-Emonds Olaf RP: Phylogenetic supertrees. Combining information to reveal the tree of life. Dordrecht/Boston/London: Kluwer Academic Publishers; 2004.
  • [5]Bansal MS, Burleigh JG, Eulenstein O, Fernandez-Baca D: Robinson-foulds supertrees. Algorithms Mol Biol 2010, 5:18. BioMed Central Full Text
  • [6]Nguyen N, Mirarab S, Warnow T: MRL and SuperFine+MRL: new supertree methods. Algorithms Mol Biol 2012, 7:3. BioMed Central Full Text
  • [7]Gorbunov KY, Lyubetsky VA: The tree nearest on average to a given set of trees. Probl Inf Transm 2011, 47(3):274-288.
  • [8]Gorbunov KY, Lyubetsky VA: Fast algorithm to reconstruct a species supertree from a set of protein trees. Mol Biol (Mosk) 2012, 46(1):161-167.
  • [9]Gorbunov KY, Lyubetsky VA: Identification of ancestral genes that introduce incongruence between protein- and species trees. Mol Biol (Mosk) 2005, 39(5):741-751.
  • [10]Koonin EV: Orthologs, paralogs, and evolutionary genomics. Annu Rev Genet 2005, 39:309-338.
  • [11]Mi H, Dong Q, Muruganujan A, Gaudet P, Levis S, Thomas PD: Panther version 7: improved phylogenetic trees, orthologs and collaboration with the gene ontology consortium. Nucleic Acids Res 2010, 38(Suppl. 1):D204-D210.
  • [12]Sennblad B, Lagergren J: Probabilistic ortology analysis. Syst Biol 2009, 58:411-424.
  • [13]Gorbunov KY, Lyubetsky VA: An algorithm of reconciliation of gene and species trees and inferring gene duplications, losses and horizontal transfers. Information Processes 2010, 10(2):140-144. in Russian
  • [14]Tofigh A: Using trees to capture reticulate evolution, lateral gene transfers and cancer progression. Sweden: KTH Royal Institute of Technology; 2009. [PhD thesis]
  • [15]Gorbunov KY, Kanovei VG, Lyubetsky VA: Inferring optimal scenario of gene evolution along a species tree. Russia: Novosibirsk; 2008:90. [Abstracts of The sixth international conference on bioinformatics of genome regulation and structure (BGRS’2008): 22–28 June 2008]
  • [16]Libeskind-Hadas R, Charleston M: On the computational complexity of the reticulate cophylogeny reconstruction problem. J Comput Biol 2009, 16(1):105-117.
  • [17]Merkle D, Middendorf M: Reconstruction of the cophylogenetic history of related phylogenetic trees with divergence timing information. Theory Biosci 2005, 123(4):277-279.
  • [18]Doyon J-P, Scornavacca C, Gorbunov KY, Szeollosi GJ, Ranwez V, Berry V: An efficient algorithm for gene/species trees parsimonious reconciliation with losses, duplications and transfers. Lecture Notes in Bioinformatics 2010, 6398:93-108.
  • [19]A program for supertree construction. http://lab6.iitp.ru/en/super3gl/ webcite
  • [20]Program for phylogenetic study of joint evolution of species and genes. http://lab6.iitp.ru/en/embed3gl/ webcite
  • [21]Joint supercomputer center of RAS. http://www.jscc.ru/eng/index.shtml webcite
  • [22]Creevey CJ, McInerney JO: CLANN: Investigating phylogenetic information through supertree analysis. Bioinformatics 2005, 21(3):390-392.
  • [23]Robinson DR, Foulds LR: Comparison of phylogenetic trees. Math Biosci 1981, 53:131-147.
  • [24]Pisani D, Cotton JA, McInerney JO: Supertrees disentangle the chimerical origin of eukaryotic genomes. Mol Biol Evol 2007, 24(8):1752-1760.
  • [25]Wu D, Hugenholtz P, Mavromatis K, Pukall R, Dalin E, Ivanova NN, Kunin V, Goodwin L, Wu M, Tindall BJ, Hooper SD, Pati A, Lykidis A, Spring S, Anderson IJ, D’haeseleer P, Zemla A, Singer M, Lapidus A, Nolan M, Copeland A, Han C, Chen F, Cheng J-F, Lucas S, Kerfeld C, Lang E, Gronow S, Chain P, Bruce D, Rubin EM, Kyrpides NC, Klenk H-P, Eisen JA: A phylogeny-driven genomic encyclopaedia of Bacteria and Archaea. Nature 2009, 462(7276):1056-1060.
  • [26]The NCBI taxonomy homepage. http://www.ncbi.nlm.nih.gov/Taxonomy/taxonomyhome.html webcite
  • [27]Page RDM: TREEVIEW: an application to display phylogenetic trees on personal computers. Comput Appl Biosci 1996, 12:357-358.
  • [28]Huson DH, Richter DC, Rausch C, Dezulian T, Franz M, Rupp R: Dendroscope: an interactive viewer for large phylogenetic trees. BMC Bioinforma 2007, 8(1):460. BioMed Central Full Text
  文献评价指标  
  下载次数:140次 浏览次数:44次