BMC Bioinformatics
A two-layered machine learning method to identify protein O-GlcNAcylation sites with O-GlcNAc transferase substrate motifs
Research
Hui-Ju Kao1  Kai-Yao Huang1  Cheng-Tsung Lu1  Tzong-Yi Lee2  Chien-Hsun Huang3  Shun-Long Weng4  Neil Arvin Bretaña5
[1] Department of Computer Science and Engineering, Yuan Ze University, 320, Taoyuan, Taiwan;Department of Computer Science and Engineering, Yuan Ze University, 320, Taoyuan, Taiwan;Innovation Center for Big Data and Digital Convergence, Yuan Ze University, 320, Taoyuan, Taiwan;Department of Computer Science and Engineering, Yuan Ze University, 320, Taoyuan, Taiwan;Tao-Yuan Hospital, Ministry of Health & Welfare, 320, Taoyuan, Taiwan;Department of Obstetrics and Gynecology, Hsinchu Mackay Memorial Hospital, 300, Hsin-Chu, Taiwan;Mackay Junior College of Medicine, Nursing and Management, 112, Taipei, Taiwan;Department of Medicine, Mackay Medical College, 252, New Taipei City, Taiwan;Inflammation and Infection Research Centre, School of Medical Sciences, University of New South Wales, Sydney, Australia;
Keywords: O-GlcNAcylation; O-linked glycosylation; O-GlcNAc transferase (OGT); substrate motif; profile hidden Markov model; support vector machine;
DOI : 10.1186/1471-2105-16-S18-S10
Source: Springer
【 Abstract 】
Protein O-GlcNAcylation, the β-attachment of a single N-acetylglucosamine (GlcNAc) to the hydroxyl group of serine or threonine residues, is an O-linked glycosylation catalyzed by O-GlcNAc transferase (OGT). Molecular-level investigation of the basis for OGT's substrate specificity should aid in understanding how O-GlcNAc contributes to diverse cellular processes. Because mass spectrometry (MS)-based proteomics has identified an increasing number of O-GlcNAcylated peptides with site-specific information, we were motivated to characterize the substrate site motifs of O-GlcNAc transferases. In this investigation, a non-redundant dataset of 410 experimentally verified O-GlcNAcylation sites was manually extracted from dbOGAP, OGlycBase and UniProtKB. After conserved motifs were detected using maximal dependence decomposition, a profile hidden Markov model (profile HMM) was adopted to learn a first-layer model for each identified OGT substrate motif. A support vector machine (SVM) was then used to generate a second-layer model learned from the output values of the profile HMMs in the first layer. The two-layered predictive model was evaluated using five-fold cross-validation, which yielded a sensitivity of 85.4%, a specificity of 84.1%, and an accuracy of 84.7%. Additionally, an independent testing set from PhosphoSitePlus, which was non-homologous to the training data of the predictive model, was used to demonstrate that the proposed method could provide a promising accuracy (84.05%) and outperform other O-GlcNAcylation site prediction tools. A case study indicated that the proposed method could be a feasible means of conducting preliminary analyses of protein O-GlcNAcylation. The method has been implemented as a web-based system, OGTSite, which is freely available at http://csb.cse.yzu.edu.tw/OGTSite/.
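The sensitivity, specificity, and accuracy reported in the abstract are standard binary-classification measures. The sketch below shows how such values are conventionally computed from true and predicted labels; it is not the authors' evaluation code. The function name `binary_metrics` and the toy label lists are hypothetical and are not data from the paper.

```python
from typing import Dict, Sequence


def binary_metrics(y_true: Sequence[int], y_pred: Sequence[int]) -> Dict[str, float]:
    """Return sensitivity, specificity and accuracy for 0/1 labels.

    Assumes both classes occur in y_true; otherwise a ratio is undefined.
    """
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # true positives
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)  # true negatives
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # false positives
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # false negatives
    return {
        "sensitivity": tp / (tp + fn),  # fraction of real sites recovered
        "specificity": tn / (tn + fp),  # fraction of non-sites rejected
        "accuracy": (tp + tn) / len(pairs),
    }


# Toy labels (hypothetical): 1 = O-GlcNAcylated site, 0 = non-site.
toy_true = [1, 1, 1, 1, 0, 0, 0, 0]
toy_pred = [1, 1, 1, 0, 0, 0, 0, 1]
toy_metrics = binary_metrics(toy_true, toy_pred)
```

In a five-fold cross-validation, a function like this would typically be applied to the held-out predictions of each fold, with the results averaged or pooled across folds; which of the two the authors used is not stated in the abstract.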
【 License 】
Unknown
© Kao et al.; 2015. This article is published under license to BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/4.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated.