BMC Bioinformatics
TMFoldRec: a statistical potential-based transmembrane protein fold recognition tool
Dániel Kozma1, Gábor E. Tusnády1
[1] “Momentum” Membrane Protein Bioinformatics Research Group, Institute of Enzymology, Research Centre for Natural Sciences, Hungarian Academy of Sciences, Budapest, H 1518, Hungary
Keywords: Threading; Fold recognition; Statistical potential; Transmembrane protein
DOI: 10.1186/s12859-015-0638-5
Received 2015-02-25, accepted 2015-06-06, published 2015
【 Abstract 】
Background
Transmembrane proteins (TMPs) are key components of signal transduction, cell-cell adhesion, and the transport of energy and material into and out of cells. A deep understanding of these processes requires the structures of transmembrane proteins to be determined. However, owing to technical difficulties, only a few transmembrane protein structures have been determined experimentally. Large-scale genomic sequencing provides increasing amounts of sequence information on the proteins and whole proteomes of living organisms, which poses a challenge for bioinformatics: how to derive structural information from sequence.
Results
Here, we present a novel method, TMFoldRec, for fold prediction of membrane segments in transmembrane proteins. TMFoldRec, which is based on statistical potentials, was tested on a benchmark set containing 124 TMP chains from the PDBTM database. Using a 10-fold jackknife method, the native folds were correctly identified in 77 % of the cases. This accuracy exceeds that of state-of-the-art methods. In addition, a key feature of the TMFoldRec algorithm is its ability to estimate the reliability of a prediction and to decide, with an accuracy of 70 %, whether the lowest-energy structure obtained is the native one.
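The idea of choosing the lowest-energy fold and attaching a reliability estimate to that choice can be illustrated with a minimal sketch. It is not the TMFoldRec implementation: the energy values, the z-score-like reliability measure, the threshold of 2.0, and all function names are assumptions introduced only for illustration, since the abstract does not specify how reliability is computed.

```python
from statistics import mean, pstdev

def rank_templates(energies):
    """Return template ids sorted from lowest to highest energy."""
    return sorted(energies, key=energies.get)

def reliability(energies):
    """Hypothetical reliability: how far the best energy lies below the
    mean of the remaining energies, in units of their standard deviation."""
    ranked = rank_templates(energies)
    best = energies[ranked[0]]
    rest = [energies[t] for t in ranked[1:]]
    if len(rest) < 2:
        return 0.0  # too few alternatives to estimate a spread
    spread = pstdev(rest)
    if spread == 0:
        return 0.0  # all alternatives identical; no meaningful gap
    return (mean(rest) - best) / spread

def predict_fold(energies, threshold=2.0):
    """Pick the lowest-energy template and flag whether the (assumed)
    reliability score reaches the (assumed) acceptance threshold."""
    best = rank_templates(energies)[0]
    score = reliability(energies)
    return best, score, score >= threshold

# Made-up energies of one query sequence threaded onto four template folds.
energies = {"fold_A": -12.0, "fold_B": -3.0, "fold_C": -2.0, "fold_D": -4.0}
best, score, accepted = predict_fold(energies)
print(best, round(score, 2), accepted)
```

In this toy setting a large gap between the best template and the others yields a high score, whereas near-degenerate energies yield a low score and the prediction would be flagged as unreliable.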
Conclusion
These results imply that the membrane-embedded parts of TMPs, rather than the soluble parts, dictate the TM structures. Moreover, because predictions come with reliability scores, our algorithm is applicable to proteome-wide analyses.
Availability
The program is available upon request for academic use.
【 License 】
2015 Kozma and Tusnády.