Journal Article Details
BMC Bioinformatics
Computational methods for ubiquitination site prediction using physicochemical properties of protein sequences
Research Article
Binghuang Cai [1], Xia Jiang [1]
[1]Department of Biomedical Informatics, School of Medicine, University of Pittsburgh, 15206-3701, Pittsburgh, PA, USA
Keywords: Ubiquitination; Ubiquitination Site Prediction; Protein sequence; Physicochemical property (PCP); Amino Acid (AA); Machine learning; Bayesian Network (BN); Support Vector Machine (SVM); Logistic Regression (LR); Least Absolute Shrinkage and Selection Operator (LASSO); Prediction
DOI: 10.1186/s12859-016-0959-z
Received 2015-05-20, accepted 2016-02-19, published 2016
Source: Springer
【 Abstract 】
Background: Ubiquitination is an important post-translational modification of proteins and has been widely investigated by biologists. Various experimental and computational methods have been developed to identify ubiquitination sites in protein sequences. This paper explores machine learning methods for predicting ubiquitination sites using the physicochemical properties (PCPs) of the amino acids in protein sequences.

Results: We first establish six ubiquitination data sets, whose records contain both ubiquitination sites and non-ubiquitination sites in varying numbers of protein sequence segments. To build these data sets, protein sequence segments are extracted from the original protein sequences used in four published papers on ubiquitination, and 531 PCP features are calculated for each extracted segment by averaging the AAindex (Amino Acid index database) PCP values of all amino acids in the segment. Several machine learning methods are then applied to the six segment-PCP data sets: four Bayesian network methods (Naïve Bayes (NB), Feature Selection NB (FSNB), Model Averaged NB (MANB), and Efficient Bayesian Multivariate Classifier (EBMC)) and three regression methods (Support Vector Machine (SVM), Logistic Regression (LR), and Least Absolute Shrinkage and Selection Operator (LASSO)). Five-fold cross-validation and the area under the receiver operating characteristic curve (AUROC) are used to evaluate the ubiquitination prediction performance of each method. The results demonstrate that the PCP data of protein sequences contain information that machine learning methods can mine for ubiquitination site prediction. The comparative results show that EBMC, SVM and LR perform better than the other methods, and that EBMC is the only method that achieves an AUROC of at least 0.6 on all six data sets. The results also show that EBMC tends to perform better on larger data sets.

Conclusions: Machine learning methods have been applied to ubiquitination site prediction based on the physicochemical properties of amino acids in protein sequences. The results demonstrate that machine learning can effectively mine information from PCP data on protein sequences, and that EBMC, SVM and LR (especially EBMC) outperform the other methods for ubiquitination prediction.
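The segment-level averaging and the AUROC metric described in the abstract can be illustrated with a minimal Python sketch. This is not the authors' code: the two-property table `TOY_PCP`, the function names `segment_features` and `auroc`, and the choice to skip residues missing from a table are all assumptions made for illustration. The paper itself uses 531 properties taken from AAindex.

```python
from statistics import mean

# Hypothetical toy property tables with illustrative placeholder values.
# They are NOT read from AAindex; the paper uses 531 AAindex properties.
TOY_PCP = {
    "hydrophobicity": {"A": 1.8, "K": -3.9, "L": 3.8, "G": -0.4},
    "volume": {"A": 88.6, "K": 168.6, "L": 166.7, "G": 60.1},
}


def segment_features(segment, pcp_tables):
    """Average each property over the residues of one sequence segment.

    Residues missing from a table are skipped (an assumption of this
    sketch, not something stated in the abstract).
    """
    features = {}
    for name, table in pcp_tables.items():
        values = [table[aa] for aa in segment if aa in table]
        features[name] = mean(values) if values else float("nan")
    return features


def auroc(labels, scores):
    """AUROC as the fraction of (positive, negative) pairs ranked correctly.

    Ties between a positive and a negative score count as half a win.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("both classes must be present")
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


if __name__ == "__main__":
    # One feature vector per segment; a classifier would be trained on these.
    print(segment_features("AKLG", TOY_PCP))
    # Toy labels (1 = ubiquitination site) and made-up classifier scores.
    print(auroc([1, 1, 0, 0], [0.9, 0.4, 0.5, 0.1]))
```

In a five-fold cross-validation setup, `auroc` would be computed on each held-out fold and the five values summarized. The pairwise definition used here is quadratic in the number of samples, which is acceptable for a sketch but slow for large data sets.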
【 License 】

CC BY   
© Cai and Jiang, 2016
