Algorithms for Molecular Biology | |
The difficulty of protein structure alignment under the RMSD | |
Shuai Cheng Li1  | |
[1] Department of Computer Science, City University of Hong Kong, Kowloon, Hong Kong | |
关键词: LCP; RMSD; Alignment; Protein Structure; | |
Others : 794797 DOI : 10.1186/1748-7188-8-1 |
|
received in 2011-12-08, accepted in 2012-12-17, 发布年份 2013 | |
【 摘 要 】
Background
Protein structure alignment is often modeled as the largest common point set (LCP) problem based on the Root Mean Square Deviation (RMSD), a measure commonly used to evaluate structural similarity. In the problem, each residue is represented by the coordinate of the Cαatom, and a structure is modeled as a sequence of 3D points. Out of two such sequences, one is to find two equal-sized subsequences of the maximum length, and a bijection between the points of the subsequences which gives an RMSD within a given threshold. The problem is considered to be difficult in terms of time complexity, but the reasons for its difficulty is not well-understood. Improving this time complexity is considered important in protein structure prediction and structural comparison, where the task of comparing very numerous structures is commonly encountered.
Results
To study why the LCP problem is difficult, we define a natural variant of the problem, called the minimum aligned distance (MAD). In the MAD problem, the length of the subsequences to obtain is specified in the input; and instead of fulfilling a threshold, the RMSD between the points of the two subsequences is to be minimized. Our results show that the difficulty of the two problems does not lie solely in the combinatorial complexity of finding the optimal subsequences, or in the task of superimposing the structures. By placing a limit on the distance between consecutive points, and assuming that the points are specified as integral values, we show that both problems are equally difficult, in the sense that they are reducible to each other. In this case, both problems can be exactly solved in polynomial time, although the time complexity remains high.
Conclusions
We showed insights and techniques which we hope will lead to practical algorithms for the LCP problem for protein structures. The study identified two important factors in the problem’s complexity: (1) The lack of a limit in the distance between the consecutive points of a structure; (2) The arbitrariness of the precision allowed in the input values. Both issues are of little practical concern for the purpose of protein structure alignment. When these factors are removed, the LCP problem is as hard as that of minimizing the RMSD (MAD problem), and can be solved exactly in polynomial time.
【 授权许可】
2013 Li; licensee BioMed Central Ltd.
【 预 览 】
Files | Size | Format | View |
---|---|---|---|
20140705073055944.pdf | 204KB | download |
【 参考文献 】
- [1]Perutz MF, Rossmann MG, Cullis AF, Muirhead H, Will G: North ACT: Structure of myoglobin: a three-dimensional Fourier synthesis at 5.5 Angstrom resolution. Nature 1960, 185:416-422.
- [2]R Kolodny PK, Levitt M: Comprehensive evaluation of protein structure alignment methods: scoring by geometric measures. J Comput Biol 2005, 346(4):1173-1188.
- [3]Sippl MJ: On distance and similarity in fold space. Bioinformatics 2008, 24(6):872-873.
- [4]V A Ilyin CML A Abyzov: Structural alignment of proteins by a novel TOPOFIT method, as a superimposition of common volumes at a topomax point. Protein Sci 2004, 13(7):1865-1874.
- [5]Shibuya T, Jansso J, Sadakane K: Linear-time protein 3-D structure searching with insertions and deletions. Algorithms Mol Biol 2010, 5(7):1-8.
- [6]Bu D, Li M, Li SC, Qian J, Xu J: Finding compact structural motifs. Theor Comput Sci 2009, 410:2834-2839.
- [7]Ambühl C, Chakraborty S, Gärtner B: Computing Largest Common Point Sets under Approximate Congruence. In ESA ’00: Proceedings of the 8th Annual European Symposium on Algorithms, Volume 1876 of Lecture Notes in Computer Science. Saarbrücken, Germany: Springer-Verlag; 2000:52-63.
- [8]Goldman D, Papadimitriou CH, Istrail S: Algorithmic Aspects of Protein Structure Similarity. New York City, NY, USA: IEEE Computer Society; 1999.
- [9]Li SC, Ng YK: On protein structure alignment under distance constraint. In 20th International Symposium on Algorithms and Computation, ISAAC 2009, Proceeedings, Volume 5878 of Lecture Notes in Computer Science. Honolulu, HI, USA: Springer; 2009:65-76.
- [10]Lathrop RH: The Protein Threading Problem With Sequence Amino Acid Interaction Preferences Is NP-Complete. Protein Eng 1995, 7:1059-1068.
- [11]Godzik A: The structural alignment between two proteins: is there a unique answer? Protein Sci 1996, 5(7):1325-1338.
- [12]Eidhammer I, Jonassen I, Taylor WR: Structure Comparison and Structure Patterns. J Comput Biol 2000, 7(5):685-716.
- [13]Zhao Z, Fu B, Alanis FJ, Summa CM: Feedback algorithm and web-server for protein structure alignment. J Comput Biol 2008, 15(5):505-524.
- [14]Kolodny R, Linial N: Approximate Protein Structural Alignment in Polynomial Time. Proc Natl Acad Sci 2004, 101:12201-12206.
- [15]Akutsu T, Tashimo H: Protein structure comparison using representation by line segment sequences. In Proc. Pacific Symposium on Biocomputing ’96. Hawaii, USA: World Scientific Press; 1996:25-40.
- [16]Alexandrov NN: SARFing the PDB. Protein Eng 1996, 9(9):727-732.
- [17]Caprara A, Lancia G: Structural alignment of large-size proteins via lagrangian relaxation. In RECOMB ’02: Proceedings of The Sixth Annual International Conference on Computational Biology. New York, NY, USA: ACM; 2002:100-108.
- [18]Comin M, Guerra C, Zanotti G: PROuST: A Comparison Method of Three-Dimensional Structure of Proteins using Indexing Techniques. J Comp Biol 2004, 11:1061-1072.
- [19]Gerstein M, Levitt M: Using Iterative Dynamic Programming to Obtain Accurate Pairwise and Multiple Alignments of Protein Structures. In Proceedings of the Fourth International Conference on Intelligent Systems for Molecular Biology. St. Louis, MO, USA: AAAI Press; 1996:59-67.
- [20]Gibrat JF, Madej T, Bryant SH: Surprising similarities in structure comparison. Curr Opin Struct Biol 1996, 6(3):377-385.
- [21]Holm L, Sander C: Protein structure comparison by alignment of distance matrices. J Mol Biol 1993, 233:123-138.
- [22]Lancia G, Carr R, Walenz B, Istrail S: 101 optimal PDB structure alignments: a branch-and-cut algorithm for the maximum contact map overlap problem. In RECOMB ’01: Proceedings of The Fifth Annual International Conference on Computational Biology. New York, NY, USA: ACM; 2001:193-202.
- [23]Singh AP, Brutlag DL: Hierarchical Protein Structure Superposition Using Both Secondary Structure and Atomic Representations. In Proceedings of the 5th International Conference on Intelligent Systems for Molecular Biology. Halkidiki, Greece: AAAI Press; 1997:284-293.
- [24]Berman HM, Westbrook J, Feng Z, Gilliland G, Bhat TN, Weissig H, Shindyalov IN, Bourne PE: The protein data bank. Nucl Acids Res 2000, 28:235-242. [http://nar.oxfordjournals.org/cgi/content/abstract/28/1/235 webcite]
- [25]Papadimitriou C: The Euclidean Traveling Salesman Problem is NP-Complete. Theoretial Compututer Sci 1977, 4(3):237-244.
- [26]Arun KS, Huang TS, Blostein SD: Least-squares fitting of two 3-D point sets. IEEE Trans Pattern Anal Mach Intell 1987, 9(5):698-700.
- [27]Ahuja RK, Magnanti TL, Orlin JB: Network Flows: Theory, Algorithms, and Applications. Englewood Cliffs, NJ: Prentice Hall; 1993.
- [28]Caviness B, Johnson J (Eds): A New Algorithm to Find a Point in Every Cell Defined by a Family of Polynomials. Springer Vienna: Springer-Verlag; 1998.
- [29]de Rezende PJ, Lee DT: Point Set Pattern Matching in d-Dimensions. Algorithmica 1995, 13(4):387-404.