BMC Bioinformatics | |
Local search for the generalized tree alignment problem | |
Andrés Varón1  Ward C Wheeler1  | |
[1] Division of Invertebrate Zoology, American Museum of Natural History, New York, NY - 10024, USA | |
关键词: Direct optimization; Sequence alignment; Phylogeny; Tree search; Tree alignment; | |
Others : 1087968 DOI : 10.1186/1471-2105-14-66 |
|
received in 2012-05-17, accepted in 2013-01-31, 发布年份 2013 | |
【 摘 要 】
Background
A phylogeny postulates shared ancestry relationships among organisms in the form of a binary tree. Phylogenies attempt to answer an important question posed in biology: what are the ancestor-descendent relationships between organisms? At the core of every biological problem lies a phylogenetic component. The patterns that can be observed in nature are the product of complex interactions, constrained by the template that our ancestors provide. The problem of simultaneous tree and alignment estimation under Maximum Parsimony is known in combinatorial optimization as the Generalized Tree Alignment Problem (GTAP). The GTAP is the Steiner Tree Problem for the sequence edit distance. Like many biologically interesting problems, the GTAP is NP-Hard. Typically the Steiner Tree is presented under the Manhattan or the Hamming distances.
Results
Experimentally, the accuracy of the GTAP has been subjected to evaluation. Results show that phylogenies selected using the GTAP from unaligned sequences are competitive with the best methods and algorithms available. Here, we implement and explore experimentally existing and new local search heuristics for the GTAP using simulated and real data.
Conclusions
The methods presented here improve by more than three orders of magnitude in execution time the best local search heuristics existing to date when applied to real data.
【 授权许可】
2013 Varón and Wheeler; licensee BioMed Central Ltd.
【 预 览 】
Files | Size | Format | View |
---|---|---|---|
20150117062152978.pdf | 570KB | download | |
Figure 8. | 46KB | Image | download |
Figure 7. | 29KB | Image | download |
Figure 6. | 33KB | Image | download |
Figure 5. | 53KB | Image | download |
Figure 4. | 24KB | Image | download |
Figure 3. | 10KB | Image | download |
Figure 2. | 8KB | Image | download |
Figure 1. | 9KB | Image | download |
【 图 表 】
Figure 1.
Figure 2.
Figure 3.
Figure 4.
Figure 5.
Figure 6.
Figure 7.
Figure 8.
【 参考文献 】
- [1]Farris JS: The Logical Basis of Phylogenetic Analysis.. New York, NY: Columbia University Press; 1983. 7–36
- [2]Foulds LR, Graham RL: The Steiner problem in phylogeny is NP-complete. Adv Appl Math 1982, 3:43-49.
- [3]Sankoff D: Minimal mutation trees of sequences. SIAM J Appl Math 1975, 28:35-42.
- [4]Ogden TH, Rosenberg MS: Alignment and topological accuracy of the direct optimization approach via POY and Traditional Phylogenetics via ClustalW + PAUP∗. Syst Biol 2007, 56(2):182-193.
- [5]Lehtonen S: Phylogeny Estimation and Alignment via POY versus Clustal + PAUP∗: A response to Ogden and Rosenberg (2007). Syst Biol 2008, 57(4):653-657.
- [6]Liu K, Nelesen S, Raghavan S, Linder CR, Warnow T: Barking up the wrong treelength: the impact of gap penalty on alignment and tree accuracy. IEEE Trans Comput Biol Bioinf 2008, 6:7-20.
- [7]Yue F, Shi J, Tang J: Simultaneous phylogeny reconstruction and multiple sequence alignment. BMC Bioinf 2009, 10(Suppl 1):S11. BioMed Central Full Text
- [8]Wheeler WC, Aagesen L, Arango CP, Faivovich J, Grant T, D’Haese C, Janies D, Smith WL, Varón A, Giribet G: Dynamic Homology and Phylogenetic Systematics: A Unified Approach using POY.. New York, NY: American Museum of Natural History; 2006.
- [9]Varón A, Vinh LS, Wheeler WC: POY version 4: Phylogenetic analysis using dynamic homologies. Cladistics 2010, 26:72-85.
- [10]Varón A, Wheeler WC: The tree-alignment problem. BMC Bioinf 2012, 13:293. BioMed Central Full Text
- [11]Semple C, Steel M: Phylogenetics. Great Britain: Oxford University Press; 2003.
- [12]Farris JS: Methods for computing wagner trees. Syst Zool 1970, 19:86-92.
- [13]Swofford DL: PAUP: Phylogenetic Analysis using Parsimony, V3.1.1. Washington: Smithsonian Institution; 1993.
- [14]Zachariasen M: Rectilinear full Steiner tree generation. Networks 1999, 33:125-143.
- [15]Winter P, Zachariasen M: Euclidean Steiner minimum trees: an improved exact algorithm. Networks 1997, 30:149-166.
- [16]Goloboff PA: Tree searches under Sankoff parsimony. Cladistics 1998, 14:229-237.
- [17]Goloboff PA: Character optimization and calculation of tree lenghts. Cladistics 1993, 9(4):433-436.
- [18]Goloboff PA: Analyzing large data sets in reasonable times: solutions for comosite optima. Cladistics 1999, 15(4):415-428.
- [19]Wheeler WC: Optimization alignment: the end of multiple sequence alignment in phylogenetics? Cladistics 1996, 12:1-9.
- [20]Wheeler WC: Search-based optimization. Cladistics 2003, 19(4):348-355.
- [21]Gladstein DS: Efficient incremental character optimization. Cladistics 1997, 13:21-26.
- [22]Hein J: A new method that simultaneously aligns and reconstructs ancestral sequences for any number of homologous sequences, when the phylogeny is given. Mol Biol Evol 1989, 6(6):649-668.
- [23]Sankoff D, Cedergren RJ: Simultaneous Comparison of Three or more Sequences Related by a Tree.. Reading MA: Addison-Wesley; 1983. 253–263
- [24]Wheeler WC: Iterative pass optimization of sequence data. Cladistics 2003, 19:254-260.
- [25]Kirkpatrick S, Gelatt CD, Vecchi MP: Optimization by simulated annealing. Science 1983, 220(4598):671-680.
- [26]Barker D: LVB: parsimony and simulated annealing in the search for phylogenetic trees. Bioinformatics 2004, 20:274-275.
- [27]Zola J, Tryastram D, Tchernykh A, Brizuela C: Parallel multiple sequence alignment with local phylogeny search by simulated annealing. In IPDPS, 20th International Parallel and Distributed Processing Symposium. : IEEE; 2006.
- [28]Platnick N: An empirical comparison of parsimony programs. Cladistics 1987, 3:121-144.
- [29]Swofford DL, Olsen GJ, Waddell PJ, Hillis DM: Phylogeny reconstruction. In Molecular Systematics. Edited by Hillis DM, Moritz C, Mable BK. Sunderland, Massachusetts: Sinauer Associates; 1996:407-514.
- [30]Sankoff D, Leduc G, Antoine N, Paquin B, Lang BF, Cedergren R: Gene order comparisons for phylogenetic inference: evolution of the mitochondrial genome. Proc Natl Acad Sci USA 1992, 89:6575-6579.
- [31]Kececioglu J, Sankoff D: Efficient bounds for oriented chromosome inversion distance. In Proceedings of the 5th Annual Symposium on Combinatorial Pattern Matching, Volume 807 of Lecture Notes in Computer Science. New York, NY: Springer Verlag; 1994:307-325.
- [32]Yancopoulos S, Attie O, Friedberg R: Efficient sorting of genomic permutations by translocation, inversion and block interchange. Bioinformatics 2005, 21:3340-3346.
- [33]Okasaki C: Purely Functional Data Structures. Cambridge: Cambridge University Press; 1999.
- [34]Schwikowski B, Vingron M: Weighted sequence graphs: boosting iterated dynamic programming using locally suboptimal solutions. Discrete Appl Math 2003, 127:95-117.
- [35]Needleman SB, Wunsch CD: A general method applicable to the search for similarities in the amino acid sequence of two proteins. J Mol Biol 1970, 48:443-453.
- [36]Cartwright RA: DNA Assembly with gaps (Dawg): simulating sequence evolution. Bioinformatics 2005, 21(Suppl 3):iii31-iii38.
- [37]Wheeler WC, Gladstein D, De Laet J: POY, Phylogeny Reconstruction via Optimization of DNA and other Data version 3.0.11 (May 6 of 2003).. New York, NY: American Museum of Natural History; 2003. [http://research.amnh.org/scicomp/projects/poy.php webcite]
- [38]Faivovich J, Haddad CFB, Garcia PCA, Frost DR, Campbell JA, Wheeler WC: Systematic review of the frog family Hylidae, with special reference to Hylinae: phylogenetic analysis and taxonomic revision. Bull Am Museum Nat Hist 2005, 294:1-240.