| IEEE Access | |
| Improved DNA-Binding Protein Identification by Incorporating Evolutionary Information Into the Chou’s PseAAC | |
| Bo Liao [1], Xiangzheng Fu [1], Wen Zhu [1], Lijun Cai [1], Lihong Peng [2], Jialiang Yang [3] | |
| [1] College of Information Science and Engineering, Hunan University, Changsha, China; [2] School of Computer Science, Hunan University of Technology, Zhuzhou, China; [3] School of Mathematics and Statistics, Hainan Normal University, Haikou, China | |
| Keywords: DNA-binding protein identification; feature representation algorithm; evolutionary information; support vector machine | |
| DOI: 10.1109/ACCESS.2018.2876656 | |
| Source: DOAJ | |
【 Abstract 】
DNA-binding proteins play critical roles in cellular processes such as gene expression and transcription. However, experimental methods for identifying these proteins, such as ChIP-sequencing, are expensive and time-consuming, which motivates in silico approaches, especially machine learning-based ones. In recent years, the accuracy of machine learning-based DNA-binding protein prediction has improved significantly, but critical problems remain, such as how to convert protein sequences into an appropriate discrete model or vector. In this paper, we propose a novel feature construction method, named K-PSSM-Composition, based on the position-specific scoring matrix (PSSM). The proposed features efficiently capture evolutionary information about the 20 amino acid residues together with the local information of a given sequence. We perform recursive feature elimination to extract the optimal feature subset, which is used to train a support vector machine model for predicting DNA-binding proteins. We evaluate the proposed predictor against other advanced predictors on two standard benchmark data sets. The proposed method achieves accuracies of 89.77% and 88.71% on the jackknife test and the independent test, respectively, outperforming the compared methods. These results demonstrate the effectiveness of the proposed method in predicting DNA-binding proteins. The source code and data are available at
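The abstract does not define K-PSSM-Composition, so the following is only a minimal sketch of one plausible reading, not the authors' implementation. It assumes the widely used PSSM-composition encoding (rows of the PSSM are summed per residue type and divided by the sequence length, giving 20 × 20 = 400 values) and, as a further assumption, interprets "K" as splitting the sequence into `k` contiguous segments whose compositions are concatenated to retain local information. The function names, the segmentation scheme, and the column order are hypothetical choices made for illustration.

```python
# Column order commonly seen in PSI-BLAST ASCII PSSM output (assumed here).
AMINO_ACIDS = "ARNDCQEGHILKMFPSTWYV"
AA_INDEX = {aa: i for i, aa in enumerate(AMINO_ACIDS)}


def pssm_composition(sequence, pssm):
    """Return a 400-value PSSM-composition vector.

    sequence: string of one-letter residue codes, length L.
    pssm:     list of L rows, each holding 20 substitution scores.
    Rows belonging to the same residue type are summed, then divided by L.
    """
    if len(sequence) != len(pssm) or not sequence:
        raise ValueError("sequence and pssm must be non-empty and equally long")
    totals = [[0.0] * 20 for _ in range(20)]
    for residue, row in zip(sequence, pssm):
        idx = AA_INDEX.get(residue)
        if idx is None:
            continue  # skip non-standard residues such as 'X'
        for j, score in enumerate(row):
            totals[idx][j] += score
    length = len(sequence)
    return [value / length for row in totals for value in row]


def segmented_pssm_composition(sequence, pssm, k):
    """Hypothetical K-segment variant: concatenate the composition of
    k contiguous, near-equal segments, giving 400 * k values."""
    n = len(sequence)
    if not 1 <= k <= n:
        raise ValueError("k must be between 1 and the sequence length")
    # Integer boundaries; every segment is non-empty because k <= n.
    bounds = [i * n // k for i in range(k + 1)]
    features = []
    for start, end in zip(bounds, bounds[1:]):
        features.extend(pssm_composition(sequence[start:end], pssm[start:end]))
    return features


# Toy two-residue example with constant PSSM rows.
toy_sequence = "AR"
toy_pssm = [[2.0] * 20, [4.0] * 20]
whole = pssm_composition(toy_sequence, toy_pssm)
halves = segmented_pssm_composition(toy_sequence, toy_pssm, k=2)
```

Vectors of this kind could then be passed to a feature selector and classifier of the sort the abstract names, for example scikit-learn's `RFE` wrapped around a linear-kernel `SVC`. The abstract does not say which tooling the authors used.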
【 License 】
Unknown