The Journal of General and Applied Microbiology
Using a new GPI-anchored-protein identification system to mine the protein databases of Aspergillus fumigatus, Aspergillus nidulans, and Aspergillus oryzae
Tohru Terada [2], Wei Cao [1], Kentaro Shimizu [1], Katsuhiko Kitamoto [1], Jun-ichi Maruyama [1], Kazuya Sumikoshi [1], Shugo Nakamura [1]
[1] Department of Biotechnology, Graduate School of Agricultural and Life Sciences, The University of Tokyo; Agricultural Bioinformatics Research Unit, Graduate School of Agricultural and Life Sciences, The University of Tokyo
Keywords: Aspergillus fumigatus; Aspergillus nidulans; Aspergillus oryzae; GPI; SVM
DOI: 10.2323/jgam.55.381
Subject classification: Microbiology and Immunology
Source: Applied Microbiology, Molecular and Cellular Biosciences Research Foundation
【 Abstract 】
Computational approaches provide valuable information to start experimental surveys identifying glycosylphosphatidylinositol (GPI)-anchored proteins in protein sequence databases. We developed a new sequence-based identification system that uses an optimized classifier based on a support vector machine (SVM) algorithm to recognize appropriate COOH-terminal sequences and uses a classifier implementing a simple majority voting strategy to recognize appropriate NH2-terminal sequences. The SVM classifier showed high accuracy (96%) in 5-fold cross-validation testing, and the majority voting classifier showed high recall (98.88%) when applied to a test dataset of eukaryote proteins. When applied to S. cerevisiae protein sequences, the new identification system showed good ability to classify “unseen” data. Applying our system to protein sequences of three aspergilli, we identified 115 GPI-anchored proteins in Aspergillus fumigatus, 129 in Aspergillus nidulans, and 136 in Aspergillus oryzae. Sequence-based conserved domain search found nearly half of these proteins to have conserved domains that covered a wide range of functions.
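The majority voting step and the recall figure mentioned above can be illustrated with a minimal sketch. It assumes three toy predictors (`hydrophobic_core`, `starts_with_met`, `charged_n_region`) invented here purely for illustration; the abstract does not say which NH2-terminal predictors the authors combined, so these rules and all thresholds are hypothetical stand-ins, not the published method.

```python
from typing import Callable, Sequence

# A predictor maps a protein sequence to True (signal present) or False.
Predictor = Callable[[str], bool]

HYDROPHOBIC = set("AILMFVW")


def hydrophobic_core(seq: str) -> bool:
    """Toy rule (assumption): at least 8 hydrophobic residues in the first 25."""
    return sum(r in HYDROPHOBIC for r in seq[:25]) >= 8


def starts_with_met(seq: str) -> bool:
    """Toy rule (assumption): the sequence begins with methionine."""
    return seq.startswith("M")


def charged_n_region(seq: str) -> bool:
    """Toy rule (assumption): a K or R within the first 6 residues."""
    return any(r in "KR" for r in seq[:6])


def majority_vote(seq: str, predictors: Sequence[Predictor]) -> bool:
    """Return True when strictly more than half of the predictors say True."""
    votes = sum(1 for predict in predictors if predict(seq))
    return votes * 2 > len(predictors)


def recall(y_true: Sequence[bool], y_pred: Sequence[bool]) -> float:
    """Recall = true positives / (true positives + false negatives)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t and p)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t and not p)
    return tp / (tp + fn) if (tp + fn) else 0.0


PREDICTORS = [hydrophobic_core, starts_with_met, charged_n_region]

if __name__ == "__main__":
    # Made-up sequences, used only to exercise the functions.
    for seq in ("MKLLVLALFAVSAQAETTTP", "MSTEEQDNGS"):
        print(seq, majority_vote(seq, PREDICTORS))
```

Using an odd number of predictors avoids ties; with an even number, the strict "more than half" rule in `majority_vote` treats a tie as a negative call.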
【 License 】
Unknown