| BMC Bioinformatics | |
| Protein structure based prediction of catalytic residues | |
| J Eduardo Fajardo [1], Andras Fiser [1] | |
| [1] Department of Biochemistry, Albert Einstein College of Medicine, Bronx, USA | |
| Keywords: Structural genomics; Feature selection; Neural network; Catalytic residues; Functional site | |
| DOI: 10.1186/1471-2105-14-63 | |
| Received 2012-06-21, accepted 2013-02-17, published 2013 | |
【 Abstract 】
Background
Worldwide structural genomics projects continue to release new protein structures at an unprecedented pace, nearly 6000 so far, but only about 60% of these proteins have any functional annotation.
Results
We explored a range of features that can be used to predict functional residues given a known three-dimensional structure. These features include several centrality measures of nodes in graphs of interacting residues: closeness, betweenness and PageRank centrality. We also analyzed the distance of functional amino acids to the general center of mass (GCM) of the structure, relative solvent accessibility (RSA), and relative entropy as a measure of sequence conservation. From the selected features, neural networks were trained to identify catalytic residues. We found that distance to the GCM together with amino acid type provides a good discriminant function when combined independently with sequence conservation. On an independent test set of 29 annotated protein structures, the method returned 411 of the initial 9262 residues as the most likely to be involved in function. These 411 residues contain 70 of the 111 annotated catalytic residues. This represents an approximately 14-fold enrichment of catalytic residues relative to the entire input set (corresponding to a sensitivity of 63% and a precision of 17%), a performance competitive with that of other state-of-the-art methods.
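The reported figures can be checked directly from the four counts given above. The following is a minimal sketch of that arithmetic; the function and variable names are illustrative and do not come from the paper, and enrichment is assumed here to mean precision divided by the background fraction of catalytic residues in the input set.

```python
def enrichment_summary(n_total, n_predicted, n_catalytic, n_hits):
    """Return (sensitivity, precision, fold enrichment) from raw counts."""
    sensitivity = n_hits / n_catalytic    # fraction of catalytic residues recovered
    precision = n_hits / n_predicted      # fraction of predictions that are catalytic
    background = n_catalytic / n_total    # catalytic fraction in the whole input set
    enrichment = precision / background   # assumed definition of fold enrichment
    return sensitivity, precision, enrichment


# Counts taken from the abstract: 9262 input residues, 411 returned,
# 111 annotated catalytic residues, 70 of which are among the 411.
sens, prec, enrich = enrichment_summary(9262, 411, 111, 70)
print(f"sensitivity={sens:.0%} precision={prec:.0%} enrichment={enrich:.1f}x")
```

Under this definition the counts reproduce the stated sensitivity of 63%, precision of 17%, and roughly 14-fold enrichment.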
Conclusions
We found that several of the graph-based measures utilize the same underlying feature of protein structures, which can be captured more simply and effectively by the distance to the GCM. This definition also has the added advantage of simplicity and easy implementation. Meanwhile, sequence conservation remains by far the most influential feature in identifying functional residues. We also found that, due to the rapid changes in the size and composition of sequence databases, conservation calculations must be recalibrated for specific reference databases.
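To illustrate why the distance-to-GCM feature is easy to implement, here is a minimal sketch. It rests on assumptions not stated in the abstract: each residue is represented by a single 3D point (for example its C-alpha position), and the GCM is taken as the unweighted mean of those points. The function name and the toy coordinates are hypothetical.

```python
import math


def distances_to_gcm(coords):
    """Distance of each residue point to the centroid of all points.

    coords: list of (x, y, z) tuples, one per residue (assumed representation).
    The centroid is unweighted, i.e. all points are given equal mass.
    """
    n = len(coords)
    gcm = tuple(sum(point[axis] for point in coords) / n for axis in range(3))
    return [math.dist(point, gcm) for point in coords]


# Toy "structure": four corner residues and one residue at the centre.
toy = [(0.0, 0.0, 0.0), (2.0, 0.0, 0.0), (0.0, 2.0, 0.0),
       (2.0, 2.0, 0.0), (1.0, 1.0, 0.0)]
for index, distance in enumerate(distances_to_gcm(toy)):
    print(index, round(distance, 3))
```

In this toy case the central residue has distance 0 and the corner residues are equidistant from the centroid. A real application would read coordinates from a structure file and might weight atoms by mass.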
【 License 】
2013 Fajardo and Fiser; licensee BioMed Central Ltd.