BMC Research Notes
ngLOC: software and web server for predicting protein subcellular localization in prokaryotes and eukaryotes
Chittibabu Guda[2], Alex Barteau[1], Sanjit Pandey[2], Suleyman Vural[3], Brian R King[1]
[1] Department of Computer Science, Bucknell University, One Dent Drive, Lewisburg, PA 17837, USA
[2] Center for Bioinformatics and Systems Biology, University of Nebraska Medical Center, Omaha, NE 68198, USA
[3] Department of Genetics, Cell Biology and Anatomy, University of Nebraska Medical Center, Omaha, NE 68198, USA
Keywords: Machine learning algorithm; Protein sequence classification; N-gram-based approach; Protein subcellular localization prediction; ngLOC; Bayesian method
DOI: 10.1186/1756-0500-5-351
Received 2012-03-02; accepted 2012-06-22; published 2012.
Abstract

Background

Understanding the subcellular localization of a protein is a necessary step toward understanding its overall function. Numerous computational methods have been published over the past decade, with varying degrees of success. Despite the large number of published methods in this area, only a small fraction of them are available for researchers to use in their own studies. Of those that are available, many can predict only a small number of organelles in the cell. Additionally, most methods predict only a single location for a sequence, even though a large fraction of the proteins in eukaryotic species are known to shuttle between locations to carry out their function.

Findings

We present a software package and a web server for predicting the subcellular localization of protein sequences based on the ngLOC method. ngLOC is an n-gram-based Bayesian classifier that predicts the subcellular localization of proteins in both prokaryotes and eukaryotes. Its overall prediction accuracy ranges from 89.8% to 91.4% across species. The program can predict 11 distinct locations in each of plant and animal species, and 4 and 5 distinct locations on Gram-positive and Gram-negative bacterial datasets, respectively.
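The abstract describes ngLOC only as an "n-gram-based Bayesian classifier." To illustrate that general idea, the sketch below trains a multinomial naive Bayes model on overlapping amino-acid n-grams and scores each candidate location. It is not the authors' implementation: the n-gram length, the smoothing scheme, and the scoring used by ngLOC are not given here. The class name `NGramBayes`, the default `n=3`, the add-one smoothing, and the toy sequences and location labels are all hypothetical choices made for illustration.

```python
import math
from collections import Counter, defaultdict


def ngrams(seq: str, n: int) -> list[str]:
    """Return all overlapping length-n substrings of a protein sequence."""
    return [seq[i:i + n] for i in range(len(seq) - n + 1)]


class NGramBayes:
    """Hypothetical multinomial naive Bayes classifier over sequence n-grams.

    Illustrative sketch only; this is not the ngLOC implementation.
    """

    def __init__(self, n: int = 3, alpha: float = 1.0):
        self.n = n            # n-gram length (assumed value)
        self.alpha = alpha    # add-alpha (Laplace) smoothing constant (assumed)
        self.class_counts = Counter()              # training sequences per location
        self.ngram_counts = defaultdict(Counter)   # n-gram counts per location
        self.total_ngrams = Counter()              # total n-grams per location
        self.vocab = set()                         # all n-grams seen in training

    def fit(self, sequences, labels):
        """Count n-grams per location label."""
        for seq, label in zip(sequences, labels):
            grams = ngrams(seq, self.n)
            self.class_counts[label] += 1
            self.ngram_counts[label].update(grams)
            self.total_ngrams[label] += len(grams)
            self.vocab.update(grams)
        return self

    def log_posteriors(self, seq: str) -> dict:
        """Unnormalized log P(location | sequence) for each location."""
        n_seqs = sum(self.class_counts.values())
        v = len(self.vocab)
        scores = {}
        for label, count in self.class_counts.items():
            score = math.log(count / n_seqs)  # log prior of the location
            denom = self.total_ngrams[label] + self.alpha * v
            for g in ngrams(seq, self.n):
                # Smoothed per-location n-gram likelihood, summed in log space.
                score += math.log((self.ngram_counts[label][g] + self.alpha) / denom)
            scores[label] = score
        return scores

    def predict_proba(self, seq: str) -> dict:
        """Normalize log scores into probabilities (max-shift for stability)."""
        scores = self.log_posteriors(seq)
        m = max(scores.values())
        exps = {k: math.exp(s - m) for k, s in scores.items()}
        z = sum(exps.values())
        return {k: e / z for k, e in exps.items()}

    def predict(self, seq: str) -> str:
        """Return the single highest-scoring location."""
        scores = self.log_posteriors(seq)
        return max(scores, key=scores.get)


# Toy usage with made-up sequences and labels (illustrative only).
model = NGramBayes(n=3).fit(
    ["MKKLLLAAAGG", "MKKLLLSAAGG", "MSSRRPKRSS", "MSSRRPKRAS"],
    ["secreted", "secreted", "nuclear", "nuclear"],
)
print(model.predict("MKKLLLAAA"))  # → secreted
```

Summing log-probabilities avoids numerical underflow when many n-gram likelihoods are combined. Normalizing the per-location scores in `predict_proba` yields a probability for every location, which is one way a Bayesian classifier could report more than one plausible location for a protein rather than a single label.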

Conclusions

ngLOC is a generic method that can be trained on data from a variety of species or classes to predict protein subcellular localization. The standalone software is freely available for academic use under the GNU GPL, and the ngLOC web server is accessible at http://ngloc.unmc.edu.

Copyright

© 2012 King et al.; licensee BioMed Central Ltd.

Figures

Figure 1.
