Journal of Translational Medicine
Predicting phosphorylation sites using machine learning by integrating the sequence, structure, and functional information of proteins
Salma Jamal1, Waseem Ali1, Sonam Grover1, Priya Nagpal2, Abhinav Grover2
[1] JH-Institute of Molecular Medicine, Jamia Hamdard, New Delhi, India; School of Biotechnology, Jawaharlal Nehru University, New Delhi, India
Keywords: Post-translational modification; MRMR; Symmetrical uncertainty; Random forest; Support vector machine
DOI: 10.1186/s12967-021-02851-0
Source: Springer
Abstract
Background: Post-translational modification (PTM) is a biological process that alters proteins and is therefore involved in regulating various cellular activities and in pathogenesis. Protein phosphorylation is an essential process and one of the most-studied PTMs: it occurs when a phosphate group is added to a serine (Ser, S), threonine (Thr, T), or tyrosine (Tyr, Y) residue. Dysregulation of protein phosphorylation can lead to various diseases, most commonly neurological disorders such as Alzheimer's disease and Parkinson's disease, which makes it important to predict which S/T/Y residues in an uncharacterized amino acid sequence can be phosphorylated. Despite the abundance of sequencing data, current experimental methods for identifying PTM sites are time-consuming, costly, and error-prone, so a number of computational methods have been proposed to replace them. However, phosphorylation-site prediction remains limited owing to substrate specificity, predictive performance, and the diversity of the relevant features.

Methods: In the present study, we propose machine-learning-based predictors that use the physicochemical, sequence, structural, and functional information of proteins to classify S/T/Y phosphorylation sites. Rigorous feature selection, using the minimum redundancy/maximum relevance (mRMR) approach and the symmetrical uncertainty method, was employed to extract the most informative features for training the models.

Results: The random forest (RF) and support vector machine (SVM) models built from these diverse feature types were highly accurate, as shown by good values across several statistical performance measures. Moreover, validation on independent test sets and benchmark datasets indicated that the proposed method clearly outperformed existing methods, demonstrating its ability to accurately predict protein phosphorylation sites.

Conclusions: The results indicate that the proposed computational methodology can be used effectively to predict putative phosphorylation sites, thereby facilitating the discovery of the mechanisms underlying various biological processes.
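The abstract names two feature-selection criteria without giving their formulas. Their standard definitions are symmetrical uncertainty, SU(X,Y) = 2·I(X;Y) / (H(X) + H(Y)), and the greedy mRMR criterion, which at each step adds the feature that maximizes its mutual information with the class label minus its mean mutual information with the features already chosen. The following is a minimal Python sketch of both, assuming the features have already been discretized. It uses the difference (MID) form of mRMR, the function names and toy data are hypothetical, and it is not the authors' implementation: the abstract does not specify their discretization, their mRMR variant, or how the two criteria were combined.

```python
from collections import Counter
from math import log2


def entropy(values):
    """Shannon entropy, in bits, of a sequence of discrete values."""
    n = len(values)
    return -sum((c / n) * log2(c / n) for c in Counter(values).values())


def mutual_information(x, y):
    """I(X;Y) = H(X) + H(Y) - H(X,Y) for two aligned discrete sequences."""
    return entropy(x) + entropy(y) - entropy(list(zip(x, y)))


def symmetrical_uncertainty(x, y):
    """SU(X,Y) = 2 * I(X;Y) / (H(X) + H(Y)); 0 = independent, 1 = each fully predicts the other."""
    denom = entropy(x) + entropy(y)
    return 0.0 if denom == 0 else 2.0 * mutual_information(x, y) / denom


def mrmr(features, labels, k):
    """Greedy mRMR, difference (MID) form (hypothetical helper, not the paper's code).

    features: dict of feature name -> list of discrete values (one per sample).
    labels:   list of class labels (e.g. 1 = phosphorylated, 0 = not).
    Returns up to k feature names in the order they were selected.
    """
    relevance = {name: mutual_information(col, labels) for name, col in features.items()}
    selected, remaining = [], set(features)

    def score(name):
        # Relevance to the class minus mean redundancy with already-selected features.
        if not selected:
            return relevance[name]
        redundancy = sum(mutual_information(features[name], features[s]) for s in selected)
        return relevance[name] - redundancy / len(selected)

    while remaining and len(selected) < k:
        # sorted() makes tie-breaking deterministic (alphabetical order).
        best = max(sorted(remaining), key=score)
        selected.append(best)
        remaining.remove(best)
    return selected


# Hypothetical toy data: 8 candidate S/T/Y sites described by 3 binary features.
a = [0, 0, 0, 0, 1, 1, 1, 1]
a_dup = list(a)                      # exact copy of a: equally relevant but redundant
b = [0, 1, 0, 1, 0, 1, 0, 1]         # weakly relevant, independent of a
label = [0, 0, 0, 1, 1, 1, 1, 1]

print(symmetrical_uncertainty(a, a_dup))                 # 1.0
print(symmetrical_uncertainty(a, b))                     # 0.0
print(mrmr({"a": a, "a_dup": a_dup, "b": b}, label, 2))  # ['a', 'b']
```

In the toy data, `a_dup` is exactly as relevant as `a`, so ranking by relevance alone would keep both. mRMR penalizes `a_dup` for its redundancy with `a` and selects the weakly informative but independent `b` instead, which is the behavior that motivates using it to prune heterogeneous sequence, structural, and functional feature sets.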
License: CC BY