JOURNAL OF THEORETICAL BIOLOGY | Volume: 312
Comprehensive comparative analysis and identification of RNA-binding protein domains: Multi-class classification and feature selection
Article
Jahandideh, Samad [1]; Srinivasasainagendra, Vinodh [1]; Zhi, Degui [1]
[1] Univ Alabama Birmingham, Dept Biostat, Sect Stat Genet, Birmingham, AL 35294 USA
Keywords: RNA-binding domain; Tuned multi-class SVM; Random Forest; Multi-class l1/lq-regularized logistic regression; Prediction
DOI: 10.1016/j.jtbi.2012.07.013
Source: Elsevier
【 Abstract 】
RNA-protein interactions play an important role in various cellular processes, such as protein synthesis, gene regulation, post-transcriptional gene regulation, alternative splicing, and infection by RNA viruses. In this study, we designed an automatic procedure that uses the Gene Ontology Annotation (GOA) and Structural Classification of Proteins (SCOP) databases to capture structurally solved RNA-binding protein domains in different subclasses. We then applied tuned multi-class SVM (TMCSVM), Random Forest (RF), and multi-class l1/lq-regularized logistic regression (MCRLR) to analyze and classify RNA-binding protein domains based on a comprehensive set of sequence and structural features, and compared the prediction accuracy of these three state-of-the-art methods. In our results, TMCSVM outperforms the other methods, which suggests its potential as a useful tool for the multi-class prediction of RNA-binding protein domains. MCRLR, by elucidating how much each feature contributes to the predictive accuracy for RNA-binding protein domain subclasses, provides some biological insight into the roles of sequence and structure in protein-RNA interactions. Published by Elsevier Ltd.
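
The abstract does not spell out how the l1/lq-regularized model ranks features, so the following is only a minimal sketch under the usual definition of the mixed norm: the weights of one feature across all classes form a group, the penalty is the sum of the lq norms of those groups, and a feature whose group norm is zero is effectively dropped. The weight matrix, the feature names, and the helper functions `row_norm`, `mixed_norm`, and `rank_features` are hypothetical and are not taken from the paper.

```python
import math

def row_norm(row, q):
    """lq norm of one feature's weights across all classes."""
    if math.isinf(q):
        return max(abs(w) for w in row)
    return sum(abs(w) ** q for w in row) ** (1.0 / q)

def mixed_norm(weights, q=2.0):
    """l1/lq penalty: sum over features (rows) of the per-feature lq norm."""
    return sum(row_norm(row, q) for row in weights)

def rank_features(weights, names, q=2.0):
    """Order features by group norm, largest first."""
    scored = [(row_norm(row, q), name) for row, name in zip(weights, names)]
    return sorted(scored, reverse=True)

# Made-up weights: rows are features, columns are domain subclasses.
weights = [
    [3.0, -4.0, 0.0],   # used by two subclasses
    [0.0, 0.0, 0.0],    # zeroed out for every subclass
    [1.0, 0.0, 0.0],    # used by one subclass
]
names = ["feature_a", "feature_b", "feature_c"]

penalty = mixed_norm(weights)
ranking = rank_features(weights, names)
selected = [name for score, name in ranking if score > 0.0]
print(penalty, selected)
```

Because the penalty acts on whole rows, a feature is either kept for all subclasses or removed for all of them, which is what makes the group norms usable as a single importance score per feature.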
【 License 】
Free