Journal article details
BMC Research Notes
HMM-ModE: implementation, benchmarking and validation with HMMER3
Andrew Michael Lynn [1], Swati Sinha [1]
[1] School of Computational and Integrative Sciences, Jawaharlal Nehru University, New Delhi 110067, India
Keywords: GPCRs; Annotation; Emission probability; HMM profile
DOI: 10.1186/1756-0500-7-483
Received 2013-12-12, accepted 2014-07-21, published 2014
【 Abstract 】

Background

HMM-ModE is a computational method that generates family-specific profile HMMs using negative training sequences. The method optimizes the discrimination threshold using 10-fold cross-validation and modifies the emission probabilities of profiles to reduce common fold-based signals shared with other sub-families. The protocol depends on the program HMMER for HMM profile building and sequence database searching. The recent release of HMMER3 has improved database search speed by several orders of magnitude, allowing large-scale deployment of the method in sequence annotation projects. We have rewritten our existing scripts, both for parsing the HMM profiles and for modifying emission probabilities, to upgrade HMM-ModE to HMMER3 and take advantage of its probabilistic inference and high computational speed. The method is benchmarked and tested on a GPCR dataset as an accurate and fast method for functional annotation.
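The threshold-optimization step can be pictured with a minimal sketch. It assumes that held-out bit scores for positive and negative sequences are already available for each cross-validation fold, and that the cutoff is chosen to maximize the Matthews correlation coefficient (MCC); the abstract does not state the selection criterion, so MCC is an assumption here. The function names `mcc`, `best_threshold` and `cross_validated_threshold` are hypothetical and are not taken from the HMM-ModE scripts.

```python
import math


def mcc(tp, fp, tn, fn):
    """Matthews correlation coefficient; returns 0.0 when it is undefined."""
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return 0.0 if denom == 0 else (tp * tn - fp * fn) / denom


def best_threshold(pos_scores, neg_scores):
    """Return (threshold, mcc) for the cutoff that maximizes MCC.

    A sequence counts as a hit when its score is >= threshold.
    Candidate cutoffs are the observed scores themselves.
    """
    best = (None, -1.0)
    for t in sorted(set(pos_scores) | set(neg_scores)):
        tp = sum(s >= t for s in pos_scores)
        fn = len(pos_scores) - tp
        fp = sum(s >= t for s in neg_scores)
        tn = len(neg_scores) - fp
        m = mcc(tp, fp, tn, fn)
        if m > best[1]:  # strict '>' keeps the lowest cutoff among ties
            best = (t, m)
    return best


def cross_validated_threshold(folds):
    """Average the per-fold optimal cutoffs.

    folds: list of (pos_scores, neg_scores) pairs, one per held-out fold.
    Averaging is an illustrative choice, not necessarily the authors' rule.
    """
    thresholds = [best_threshold(pos, neg)[0] for pos, neg in folds]
    return sum(thresholds) / len(thresholds)
```

In a 10-fold setting, `folds` would hold ten such pairs, each scored against a profile built from the other nine folds.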

Results

The implementation of this method, which now works with HMMER3, is benchmarked against the earlier version of HMMER, showing that the effect of local-local alignments is marked only for profiles containing a large number of discontinuous match states. The method is tested on a gold standard set of families, and we report a significant reduction in the number of false positive hits relative to the default HMM profiles. When applied to GPCR sequences, the method improved classification accuracy compared with other methods used to classify the family at different levels of the classification hierarchy.
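As a rough illustration of how false positive hits might be counted, the sketch below parses text in the tabular format written by `hmmsearch --tblout` (comment lines start with `#`; the first whitespace-separated field is the target name and the sixth is the full-sequence bit score) and counts hits at or above a cutoff that fall outside a known family. The helper names, the sample rows, the scores and the cutoff are all invented for illustration and are not results from the paper.

```python
def parse_tblout(text):
    """Map target name -> best full-sequence bit score from tblout text."""
    scores = {}
    for line in text.splitlines():
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and comment/header lines
        fields = line.split()
        target, score = fields[0], float(fields[5])
        scores[target] = max(score, scores.get(target, float("-inf")))
    return scores


def count_false_positives(scores, threshold, true_members):
    """Count hits scoring >= threshold that are not known family members."""
    return sum(
        1
        for name, score in scores.items()
        if score >= threshold and name not in true_members
    )


# Invented example rows (truncated to the first seven tblout columns).
SAMPLE_DEFAULT = """\
# target name  accession  query name  accession  E-value  score  bias
seqA  -  famX  -  1e-50  170.2  0.1
seqB  -  famX  -  1e-30  105.4  0.0
seqC  -  famX  -  1e-20   72.9  0.3
"""

SAMPLE_MODIFIED = """\
# target name  accession  query name  accession  E-value  score  bias
seqA  -  famX  -  1e-48  165.0  0.1
seqB  -  famX  -  1e-10   40.3  0.0
seqC  -  famX  -  1e-05   22.1  0.3
"""

FAMILY = {"seqA"}  # hypothetical set of true family members

fp_default = count_false_positives(parse_tblout(SAMPLE_DEFAULT), 100.0, FAMILY)
fp_modified = count_false_positives(parse_tblout(SAMPLE_MODIFIED), 100.0, FAMILY)
```

With these made-up inputs, one non-member clears the cutoff under the first table and none under the second; a real comparison would use the optimized threshold for each profile.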

Conclusions

The present findings show that the new version of HMM-ModE is a highly specific method for differentiating between fold (superfamily) and function (family) specific signals, which helps in the functional annotation of protein sequences. Modified profile HMMs of GPCR sequences provide a simple yet highly specific method for classifying the family, predicting sub-family-specific sequences with high accuracy even though sub-families share common physicochemical characteristics.

【 License 】

   
2014 Sinha and Lynn; licensee BioMed Central Ltd.
