BMC Bioinformatics
Sites Inferred by Metabolic Background Assertion Labeling (SIMBAL): adapting the Partial Phylogenetic Profiling algorithm to scan sequences for signatures that predict protein function
Daniel H Haft [1], Douglas B Rusch [1], Jeremy D Selengut [1]
[1]J. Craig Venter Institute, 9704 Medical Center Drive, Rockville, MD, USA, 20850
DOI  :  10.1186/1471-2105-11-52
Received 2009-08-04, accepted 2010-01-26, published 2010
【 Abstract 】

Background

Comparative genomics methods such as phylogenetic profiling can mine powerful inferences from inherently noisy biological data sets. We introduce Sites Inferred by Metabolic Background Assertion Labeling (SIMBAL), a method that applies the Partial Phylogenetic Profiling (PPP) approach locally within a protein sequence to discover short sequence signatures associated with functional sites. The approach is based on the basic scoring mechanism employed by PPP, namely the use of binomial distribution statistics to optimize sequence similarity cutoffs during searches of partitioned training sets.
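
The scoring idea described above can be illustrated with a minimal sketch. It assumes the score is the upper-tail binomial probability of seeing at least the observed number of TRUE-set members among the top-ranked hits, and that the cutoff is chosen by trying every depth in the ranked hit list. The function names `binomial_tail` and `best_cutoff` are hypothetical, the sketch ignores tied similarity scores, and it is not taken from the authors' implementation.

```python
from math import comb


def binomial_tail(k, n, p):
    """Return P(X >= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k, n + 1))


def best_cutoff(ranked_labels, p):
    """Pick the hit-list depth with the smallest binomial tail probability.

    ranked_labels: booleans, best hit first; True means the hit is in the TRUE set.
    p: background probability that a randomly chosen protein is in the TRUE set.
    Returns (depth, true_hits, probability), or None for an empty list.
    """
    best = None
    hits = 0
    for depth, is_true in enumerate(ranked_labels, start=1):
        hits += is_true
        prob = binomial_tail(hits, depth, p)
        if best is None or prob < best[2]:
            best = (depth, hits, prob)
    return best


# Five TRUE hits ranked ahead of five FALSE hits, half the set being TRUE.
result = best_cutoff([True] * 5 + [False] * 5, 0.5)  # → (5, 5, 0.03125)
```

Scanning every depth lets the data choose the similarity cutoff, so no threshold has to be fixed in advance. For large hit lists a log-space or library implementation of the tail probability would be more robust than this direct sum.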

Results

Here we illustrate and validate the ability of the SIMBAL method to find functionally relevant short sequence signatures by application to two well-characterized protein families. In the first example, we partitioned a family of ABC permeases using a metabolic background property (urea utilization). Thus, the TRUE set for this family comprised members whose genome of origin encoded a urea utilization system. By moving a sliding window across the sequence of a permease, and searching each subsequence in turn against the full set of partitioned proteins, the method found which local sequence signatures best correlated with the urea utilization trait. Mapping of SIMBAL "hot spots" onto crystal structures of homologous permeases reveals that the significant sites are gating determinants on the cytosolic face rather than, say, docking sites for the substrate-binding protein on the extracellular face. In the second example, we partitioned a protein methyltransferase family using gene proximity as a criterion. In this case, the TRUE set comprised those methyltransferases encoded near the gene for the substrate RF-1. SIMBAL identifies sequence regions that map onto the substrate-binding interface while ignoring regions involved in the methyltransferase reaction mechanism in general. Neither method for training set construction requires any prior experimental characterization.
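
The sliding-window scan described above can be sketched as follows. This is a toy illustration under stated assumptions: `toy_similarity` is a hypothetical stand-in (best ungapped identity count) for whatever sequence search is actually used, the window width is arbitrary, and cutoffs are evaluated only where the similarity score changes, so that tied hits are never split. None of these names or choices come from the original work.

```python
from math import comb


def binomial_tail(k, n, p):
    """Return P(X >= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k, n + 1))


def toy_similarity(window, target):
    """Best ungapped identity count of window at any offset in target.

    Hypothetical placeholder for a real sequence similarity search.
    """
    width = len(window)
    best = 0
    for off in range(len(target) - width + 1):
        matches = sum(a == b for a, b in zip(window, target[off:off + width]))
        best = max(best, matches)
    return best


def window_score(window, proteins):
    """Smallest binomial tail probability over all score cutoffs.

    proteins: list of (sequence, is_true) pairs, the partitioned training set.
    """
    p = sum(label for _, label in proteins) / len(proteins)
    scored = sorted(
        ((toy_similarity(window, seq), label) for seq, label in proteins),
        key=lambda pair: pair[0],
        reverse=True,
    )
    n = len(scored)
    hits, best = 0, 1.0
    for depth, (score, label) in enumerate(scored, start=1):
        hits += label
        # Only cut where the next hit scores strictly lower (or at the end).
        if depth == n or scored[depth][0] < score:
            best = min(best, binomial_tail(hits, depth, p))
    return best


def scan(query, proteins, width):
    """Score every window of the query; low values mark candidate hot spots."""
    return [
        (start, window_score(query[start:start + width], proteins))
        for start in range(len(query) - width + 1)
    ]
```

Calling `scan(query, proteins, width)` returns one `(start, probability)` pair per window. Windows whose similarity ranking separates TRUE from FALSE members receive low probabilities, while windows over regions shared by the whole family do not.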

Conclusions

SIMBAL shows that, in functionally divergent protein families, selected short sequences often significantly outperform their full-length parent sequence for making functional predictions by sequence similarity, suggesting avenues for improved functional classifiers. When combined with structural data, SIMBAL affords the ability to localize and model functional sites.

【 License 】

   
2010 Selengut et al; licensee BioMed Central Ltd.

