| BMC Bioinformatics | |
| Disentangling evolutionary signals: conservation, specificity determining positions and coevolution. Implication for catalytic residue prediction | |
| Elin Teppa [3], Angela D Wilkins [2], Morten Nielsen [1], Cristina Marino Buslje [3] | |
| [1] Instituto de Investigaciones Biotecnológicas, Universidad de San Martín, San Martín, B 1650 HMP, Buenos Aires, Argentina | |
| [2] Department of Molecular and Human Genetics, Baylor College of Medicine, Houston, Texas | |
| [3] Fundación Instituto Leloir, Avda. Patricias Argentinas 435, CABA, C1405BWE, Argentina | |
| Keywords: Sequence analysis; Functional sites; Catalytic residues; Specificity determining position; Mutual information; Coevolution | |
| Others: 1088135; DOI: 10.1186/1471-2105-13-235 |
| Received 2012-05-08, accepted 2012-09-05, published 2012 | |
【 Abstract 】
Background
Many methods aim to identify residues with a critical impact on protein function from evolutionary signals, sequence and structure information. However, it is unclear to what extent these methods overlap, and whether any of them has higher predictive potential than the others, in particular for identifying catalytic residues (CR) in proteins. Using a large set of enzymatic protein families and measures based on different evolutionary signals, we sought to break up the different components of the information content within a multiple sequence alignment and to investigate their predictive potential and degree of overlap.
Results
Our results demonstrate that the methods included in the benchmark can in general be divided into three groups with limited mutual overlap: one group containing real-value Evolutionary Trace (rvET) methods and conservation; another containing mutual information (MI) methods; and a last group containing methods designed explicitly for the identification of specificity determining positions (SDPs), namely integer-value Evolutionary Trace (ivET), SDPfox, and XDET. For the prediction of CR, we used a proximity score that integrates structural information (the sum of the scores of residues located within a given distance of the residue in question) and found that only the methods from the first two groups displayed reliable performance. Next, we investigated to what degree proximity scores for conservation, rvET and cumulative MI (cMI) provide complementary information capable of improving the performance of CR identification. We found that integrating conservation with proximity scores for rvET and cMI achieved the highest performance. The proximity conservation score contained no complementary information when integrated with proximity rvET. Moreover, the signal from rvET provided only a limited gain in predictive performance when integrated with mutual information and conservation proximity scores. Combined, these observations demonstrate that the rvET and cMI scores add complementary information to the prediction system.
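The proximity score described in the Results (the sum of the scores of residues located within a given distance of the residue in question) can be illustrated with a minimal sketch. This is not the authors' implementation. The function name `proximity_scores`, the `cutoff` parameter, the `include_self` option, the use of one 3D coordinate per residue, and the toy data are all assumptions made for illustration; the abstract does not state the distance threshold, which atoms define the distance, or whether the residue's own score is counted.

```python
import math


def proximity_scores(scores, coords, cutoff, include_self=True):
    """For each residue, sum the scores of all residues within `cutoff` of it.

    scores       : per-residue values (e.g. conservation, rvET or cMI)
    coords       : one (x, y, z) tuple per residue (hypothetical choice,
                   e.g. a representative atom)
    cutoff       : distance threshold, in the same units as coords (assumed)
    include_self : whether a residue's own score is counted (assumed option)
    """
    if len(scores) != len(coords):
        raise ValueError("scores and coords must have the same length")
    result = []
    for i, center in enumerate(coords):
        total = 0.0
        for j, (score, point) in enumerate(zip(scores, coords)):
            if j == i and not include_self:
                continue
            # Euclidean distance between the two residue positions
            if math.dist(center, point) <= cutoff:
                total += score
        result.append(total)
    return result


# Toy data (made up): four residues on a line, 4 units apart.
toy_scores = [1.0, 2.0, 3.0, 4.0]
toy_coords = [(0.0, 0.0, 0.0), (4.0, 0.0, 0.0), (8.0, 0.0, 0.0), (12.0, 0.0, 0.0)]
toy_proximity = proximity_scores(toy_scores, toy_coords, cutoff=5.0)
print(toy_proximity)  # [3.0, 6.0, 9.0, 7.0]
```

With a cutoff of 5 units, each residue in the toy chain collects its own score plus that of its immediate neighbours, so a residue surrounded by high-scoring positions receives a high proximity score even if its own score is modest.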
Conclusions
This work contributes to the understanding of the different signals of evolution and shows that the detection of catalytic residues can be improved by integrating structural and higher-order sequence evolutionary information with sequence conservation.
【 License 】
2012 Teppa et al.; licensee BioMed Central Ltd.