BMC Bioinformatics
MFSPSSMpred: identifying short disorder-to-order binding regions in disordered proteins based on contextual local evolutionary conservation
Chun Fang [3], Tamotsu Noguchi [2], Daisuke Tominaga [3], Hayato Yamana [1]
[1] Department of Computer Science and Engineering, Waseda University, Tokyo, Japan | |
[2] Meiji Pharmaceutical University, Tokyo, Japan | |
[3] Computational Biology Research Center (CBRC), Tokyo, Japan | |
Keywords: Position-specific scoring matrix; Intrinsically disordered protein; Molecular recognition features
Others: 1087736; DOI: 10.1186/1471-2105-14-300
Received 2013-03-16; accepted 2013-10-01; published 2013.
Abstract
Background
Molecular recognition features (MoRFs) are short binding regions located within longer intrinsically disordered regions of proteins. Although these short regions lack a stable structure in their unbound state, they readily undergo disorder-to-order transitions upon binding to their partner molecules. MoRFs play critical roles in the molecular interaction networks of cells and are associated with many human genetic diseases. Identifying MoRFs is therefore an important step toward understanding the functions of these proteins and toward applications in drug design.
Results
Here, we propose a novel method for identifying MoRFs, named MFSPSSMpred (Masked, Filtered and Smoothed Position-Specific Scoring Matrix-based Predictor). First, a masking method calculates, from the position-specific scoring matrix (PSSM), the average local conservation score of the residues within a window of fixed masking length. Next, scores below this average are filtered out. Finally, a smoothing method incorporates the features of the flanking regions of each residue to produce the feature sets used for prediction. The method takes no predictions from other classifiers as input: all of its features are extracted solely from the PSSM of the sequence. Compared with other methods tested on the same datasets, MFSPSSMpred achieves the best performance, with an AUC 0.004 to 0.079 higher than the other methods on TEST419 and 0.045 to 0.212 higher on TEST2012. In addition, on an independent dataset of membrane proteins, MFSPSSMpred significantly outperformed the existing predictor MoRFpred.
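The abstract describes the three feature-extraction steps (masking, filtering, smoothing) only in outline. The Python sketch below illustrates one plausible reading of them on a PSSM given as a list of per-residue score rows. The window lengths (`mask_window`, `smooth_window`), the truncation of windows at the sequence ends, setting filtered scores to zero, and using a plain mean as the smoothing operator are all assumptions, not values or formulas taken from the paper; every function name is hypothetical.

```python
from typing import List

Matrix = List[List[float]]  # rows = residues, columns = PSSM scores (20 for amino acids)


def _check_window(w: int) -> int:
    # Assumption: odd window lengths, so each residue sits at the window centre.
    if w < 1 or w % 2 == 0:
        raise ValueError("window length must be a positive odd integer")
    return w // 2


def window_mean(rows: Matrix, i: int, half: int) -> List[float]:
    """Column-wise mean of rows i-half..i+half, truncated at the sequence ends."""
    lo, hi = max(0, i - half), min(len(rows), i + half + 1)
    window = rows[lo:hi]
    return [sum(r[j] for r in window) / len(window) for j in range(len(rows[i]))]


def mask_and_filter(pssm: Matrix, mask_window: int = 5) -> Matrix:
    """Masking: local column-wise average; filtering: zero every score below it."""
    half = _check_window(mask_window)
    filtered = []
    for i, row in enumerate(pssm):
        local_avg = window_mean(pssm, i, half)
        filtered.append([s if s >= a else 0.0 for s, a in zip(row, local_avg)])
    return filtered


def smooth(features: Matrix, smooth_window: int = 7) -> Matrix:
    """Smoothing: replace each residue's vector by the mean over its flanking window."""
    half = _check_window(smooth_window)
    return [window_mean(features, i, half) for i in range(len(features))]


def mfspssm_features(pssm: Matrix, mask_window: int = 5, smooth_window: int = 7) -> Matrix:
    """Per-residue feature vectors: mask -> filter -> smooth (hypothetical pipeline)."""
    return smooth(mask_and_filter(pssm, mask_window), smooth_window)


if __name__ == "__main__":
    toy_pssm = [[1, 0], [3, 2], [5, 4]]  # 3 residues, 2 columns for brevity
    print(mask_and_filter(toy_pssm, 3))  # [[0.0, 0.0], [3, 2], [5, 4]]
    print(mfspssm_features(toy_pssm, 3, 3))
```

In practice the input would be the L x 20 PSSM computed for the query sequence (typically with PSI-BLAST), and each smoothed row would serve as the feature vector of one residue for a per-residue MoRF classifier. Averaging over a flanking window is only one way to bring in context; concatenating the neighbouring rows is a common alternative.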
Conclusions
This study suggests that: 1) the amino acid composition and physicochemical properties of the flanking regions of MoRFs are very different from those of general non-MoRF regions; 2) MoRFs contain both highly conserved and highly variable residues but, on the whole, are highly locally conserved; and 3) combining contextual information with the local conservation of residues facilitates the prediction of MoRFs.
License
2013 Fang et al.; licensee BioMed Central Ltd.