Journal of Translational Medicine
Predicting phosphorylation sites using machine learning by integrating the sequence, structure, and functional information of proteins
Salma Jamal [1], Waseem Ali [1], Sonam Grover [1], Priya Nagpal [2], Abhinav Grover [2]
[1] JH-Institute of Molecular Medicine, Jamia Hamdard, New Delhi, India; School of Biotechnology, Jawaharlal Nehru University, New Delhi, India
Keywords: Post-translational modification; MRMR; Symmetrical uncertainty; Random forest; Support vector machine
DOI: 10.1186/s12967-021-02851-0
Source: Springer
【 Abstract 】

Background: Post-translational modification (PTM) is a biological process that alters proteins and is therefore involved in the regulation of various cellular activities and in pathogenesis. Protein phosphorylation is an essential process and one of the most-studied PTMs: it occurs when a phosphate group is added to a serine (Ser, S), threonine (Thr, T), or tyrosine (Tyr, Y) residue. Dysregulation of protein phosphorylation can lead to various diseases (most commonly neurological disorders, Alzheimer’s disease, and Parkinson’s disease), which makes it necessary to predict which S/T/Y residues in an uncharacterized amino acid sequence can be phosphorylated. Despite a surplus of sequencing data, current experimental methods of identifying PTM sites are time-consuming, costly, and error-prone, so a number of computational methods have been proposed to replace them. However, phosphorylation prediction remains limited with respect to substrate specificity, performance, and feature diversity.

Methods: In the present study we propose machine-learning-based predictors that use the physicochemical, sequence, structural, and functional information of proteins to classify S/T/Y phosphorylation sites. Rigorous feature selection, using the minimum redundancy/maximum relevance (mRMR) approach and the symmetrical uncertainty method, was employed to extract the most informative features for training the models.

Results: The random forest (RF) and support vector machine (SVM) models generated from the diverse feature types in the present study were highly accurate, as is evident from good values for several statistical measures. Moreover, independent test sets and benchmark validations indicated that the proposed method clearly outperformed existing methods, demonstrating its ability to accurately predict protein phosphorylation.

Conclusions: The results obtained in the present work indicate that the proposed computational methodology can be used effectively to predict putative phosphorylation sites, thereby facilitating the discovery of the mechanisms of various biological processes.
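
To make the symmetrical uncertainty criterion named in the Methods concrete, here is a minimal sketch using the standard definition SU(X, Y) = 2 I(X; Y) / (H(X) + H(Y)) for discrete variables. It is an illustration only, not the authors' implementation: the function names, the ranking helper, and the toy feature table are hypothetical, and the abstract does not say how continuous features were discretized or how this score was combined with mRMR.

```python
from collections import Counter
from math import log2


def entropy(values):
    """Shannon entropy (in bits) of a sequence of discrete values."""
    n = len(values)
    return -sum((c / n) * log2(c / n) for c in Counter(values).values())


def symmetrical_uncertainty(x, y):
    """SU(X, Y) = 2 * I(X; Y) / (H(X) + H(Y)), a score in [0, 1]."""
    h_x, h_y = entropy(x), entropy(y)
    if h_x + h_y == 0:
        # Both variables are constant, so there is no information to share.
        return 0.0
    h_xy = entropy(list(zip(x, y)))      # joint entropy H(X, Y)
    mutual_info = h_x + h_y - h_xy       # I(X; Y)
    return 2.0 * mutual_info / (h_x + h_y)


def rank_features(features, labels):
    """Sort feature names by decreasing SU with the class labels."""
    scores = {name: symmetrical_uncertainty(col, labels)
              for name, col in features.items()}
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


# Hypothetical toy data: 1 = phosphorylated site, 0 = non-phosphorylated site.
labels = [1, 1, 0, 0]
features = {
    "informative": [1, 1, 0, 0],   # matches the labels exactly
    "uninformative": [0, 1, 0, 1], # statistically independent of the labels
}
ranking = rank_features(features, labels)
```

A feature that determines the label exactly scores 1 and an independent feature scores 0, so ranking by this score keeps the features most associated with the class. Real-valued descriptors would have to be binned before this discrete formula applies.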

【 License 】

CC BY   
