Journal article

【Abstract】

Background: As an important type of post-translational modification (PTM), protein glycosylation plays a crucial role in protein stability and protein function. The abundance and ubiquity of protein glycosylation across three domains of life involving Eukarya, Bacteria and Archaea demonstrate its roles in regulating a variety of signalling and metabolic pathways. Mutations on and in the proximity of glycosylation sites are highly associated with human diseases. Accordingly, accurate prediction of glycosylation can complement laboratory-based methods and greatly benefit experimental efforts for characterization and understanding of functional roles of glycosylation. For this purpose, a number of supervised-learning approaches have been proposed to identify glycosylation sites, demonstrating a promising predictive performance. To train a conventional supervised-learning model, both reliable positive and negative samples are required. However, in practice, a large portion of negative samples (i.e. non-glycosylation sites) are mislabelled due to the limitation of current experimental technologies. Moreover, supervised algorithms often fail to take advantage of large volumes of unlabelled data, which can aid in model learning in conjunction with positive samples (i.e. experimentally verified glycosylation sites).

Results: In this study, we propose a positive unlabelled (PU) learning-based method, PA2DE (V2.0), based on the AlphaMax algorithm for protein glycosylation site prediction. The predictive performance of this proposed method was evaluated by a range of glycosylation data collected over a ten-year period based on an interval of three years. Experiments using both benchmarking and independent tests show that our method outperformed the representative supervised-learning algorithms (including support vector machines and random forests) and one-class learners, as well as currently available prediction methods in terms of F1 score, accuracy and AUC measures. In addition, we developed an online web server as an implementation of the optimized model (available at http://glycomine.erc.monash.edu/Lab/GlycoMine_PU/) to facilitate community-wide efforts for accurate prediction of protein glycosylation sites.

Conclusion: The proposed PU learning approach achieved a competitive predictive performance compared with currently available methods. This PU learning schema may also be effectively employed and applied to address the prediction problems of other important types of protein PTM site and functional sites.
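The positive-unlabelled setting described in the abstract can be illustrated with a small sketch. It is not the authors' PA2DE or AlphaMax method. It uses a different and simpler technique, the label-frequency correction of Elkan and Noto, and assumes that labelled sites are a random sample of all true sites. The one-dimensional toy score, the function names (`make_pu_data`, `fit_logistic`, `site_probability`) and all parameter values are hypothetical and chosen only for illustration.

```python
import math
import random


def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def make_pu_data(rng, n_per_class=400, label_freq=0.5):
    """Toy 1-D data (hypothetical): true sites score around +2, non-sites around -2.
    Only a random fraction of true sites is labelled (s=1); the rest is unlabelled (s=0)."""
    data = []
    for _ in range(n_per_class):
        s = 1 if rng.random() < label_freq else 0
        data.append((rng.gauss(2.0, 1.0), s))
    for _ in range(n_per_class):
        data.append((rng.gauss(-2.0, 1.0), 0))
    rng.shuffle(data)
    return data


def fit_logistic(xs, ss, lr=0.5, epochs=1500):
    """Fit g(x) = sigmoid(w*x + b) to labelled-vs-unlabelled targets
    with full-batch gradient descent on the log-loss."""
    w, b = 0.0, 0.0
    n = len(xs)
    for _ in range(epochs):
        grad_w = grad_b = 0.0
        for x, s in zip(xs, ss):
            err = sigmoid(w * x + b) - s
            grad_w += err * x
            grad_b += err
        w -= lr * grad_w / n
        b -= lr * grad_b / n
    return w, b


rng = random.Random(0)
data = make_pu_data(rng)
split = int(0.8 * len(data))
train, held_out = data[:split], data[split:]

# Step 1: learn g(x), an estimate of P(labelled | x).
w, b = fit_logistic([x for x, _ in train], [s for _, s in train])

# Step 2: estimate c = P(labelled | true site) as the mean of g
# over held-out labelled examples.
held_out_pos = [x for x, s in held_out if s == 1]
c_hat = sum(sigmoid(w * x + b) for x in held_out_pos) / len(held_out_pos)


def site_probability(x):
    # Step 3: corrected estimate P(true site | x) = g(x) / c, capped at 1.
    return min(1.0, sigmoid(w * x + b) / c_hat)


print("estimated label frequency:", round(c_hat, 3))
print("P(site | x=3):", round(site_probability(3.0), 3))
print("P(site | x=-3):", round(site_probability(-3.0), 3))
```

The point of the sketch is that unlabelled examples are never treated as confirmed negatives: the classifier only separates labelled from unlabelled data, and the division by the estimated label frequency rescales its output into a site probability.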

【License】

CC BY

【Preview】

Attachments
Files	Size	Format	View
RO201909249522230ZK.pdf	2352KB	PDF	download

BMC Bioinformatics
Positive-unlabelled learning of glycosylation sites in the human proteome

[1] 0000 0004 1760 4150, grid.144022.1, College of Information Engineering, Northwest A and F University, 712100, Yangling, Shaanxi, China;
0000 0004 1936 7857, grid.1002.3, Infection and Immunity Program, Biomedicine Discovery Institute and Department of Biochemistry and Molecular Biology, Monash University, 3800, Melbourne, VIC, Australia;
0000 0004 1936 7857, grid.1002.3, Infection and Immunity Program, Biomedicine Discovery Institute and Department of Biochemistry and Molecular Biology, Monash University, 3800, Melbourne, VIC, Australia;
0000 0001 2156 2780, grid.5801.c, Department of Biology, Institute of Molecular Systems Biology, ETH Zürich, 8093, Zürich, Switzerland;
0000 0004 1936 7857, grid.1002.3, Infection and Immunity Program, Biomedicine Discovery Institute and Department of Biochemistry and Molecular Biology, Monash University, 3800, Melbourne, VIC, Australia;
0000 0004 1936 7857, grid.1002.3, Monash Centre for Data Science, Faculty of Information Technology, Monash University, 3800, Melbourne, VIC, Australia;
0000 0004 1936 7857, grid.1002.3, Infection and Immunity Program, Biomedicine Discovery Institute and Department of Microbiology, Monash University, 3800, Melbourne, VIC, Australia;
0000 0004 1936 7857, grid.1002.3, Monash Centre for Data Science, Faculty of Information Technology, Monash University, 3800, Melbourne, VIC, Australia;
Gordon Life Science Institute, 02478, Boston, MA, USA;
0000 0004 0369 4060, grid.54549.39, Center for Informational Biology, School of Life Science and Technology, University of Electronic Science and Technology of China, 610054, Chengdu, China
Keywords: Protein glycosylation prediction; Positive unlabelled-learning; Supervised-learning; AlphaMax; Sequence analysis; Sequence-derived features
DOI: 10.1186/s12859-019-2700-1
Source: publisher
