Journal Article Details
BMC Systems Biology
A novel function prediction approach using protein overlap networks
Chi Zhang [1], Huarong Guo [2], Daron M Standley [3], Dandan Zheng [4], Shide Liang [3]
[1]School of Biological Sciences, Center for Plant Science and Innovation, University of Nebraska, Lincoln, NE 68588, USA
[2]Department of Marine Biology, Ocean University of China, Qingdao 266003, P. R. China
[3]Systems Immunology Lab, Immunology Frontier Research Center, Osaka University, Suita, Osaka 565-0871, Japan
[4]Department of Radiation Oncology, University of Nebraska Medical Center, Omaha, NE 68198, USA
Keywords: Functional genomics; Composite network; Protein function prediction; Protein overlap network
DOI: 10.1186/1752-0509-7-61
Received 2013-02-08; accepted 2013-07-12; published 2013.
【 Abstract 】

Background

Construction of a reliable network remains the bottleneck for network-based protein function prediction. We built an artificial network model, called a protein overlap network (PON), for each of the entire genomes of yeast, fly, worm, and human. Each node represents a protein, and two proteins are connected if they share a domain according to the InterPro database.
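The construction rule above (one node per protein, an edge wherever two proteins share an InterPro entry) can be sketched in Python as follows. This is a minimal illustration, not the authors' code: it assumes the domain assignments are already available as a dictionary from protein ID to a set of InterPro entry IDs, and the function name `build_pon` and all IDs in the example are hypothetical placeholders.

```python
from collections import defaultdict
from itertools import combinations


def build_pon(protein_domains):
    """Build a protein overlap network as an adjacency map (hypothetical helper).

    protein_domains maps a protein ID to the InterPro entry IDs it contains;
    two proteins are linked if they share at least one entry.
    """
    # Invert the mapping: InterPro entry -> proteins that contain it.
    members = defaultdict(set)
    for protein, entries in protein_domains.items():
        for entry in entries:
            members[entry].add(protein)

    # Start with every protein as a node, including those with no partners.
    adj = {protein: set() for protein in protein_domains}
    for proteins in members.values():
        # Every pair of proteins sharing this entry gets an undirected edge.
        for a, b in combinations(sorted(proteins), 2):
            adj[a].add(b)
            adj[b].add(a)
    return adj


# Hypothetical toy input; real input would come from InterPro protein matches.
example = {
    "P1": {"IPR_A", "IPR_B"},
    "P2": {"IPR_B"},
    "P3": {"IPR_C"},
    "P4": {"IPR_A", "IPR_C"},
}
pon = build_pon(example)
print(sorted(pon["P4"]))  # ['P1', 'P3']
```

Because every shared entry turns its member proteins into a clique, large families yield tightly connected groups, and the pairwise loop is quadratic in family size; that is acceptable for a sketch but worth noting for genome-scale input.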

Results

The function of a protein can be predicted by counting how often the GO (Gene Ontology) terms associated with the domains of its direct neighbors occur. For the test genomes, the average success rate and coverage were 34.3% and 43.9%, respectively; these rose to 37.9% and 51.3% when a composite PON of the four species was used for prediction. By comparison, the success rate of a random control procedure was 7.0%. We also made predictions from the GO term annotations of second-layer nodes in the composite network and obtained a success rate and coverage both above 30%, even for small genomes. Further improvement was achieved by statistical analysis of the manually annotated GO terms of each neighboring protein.
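The neighbor-voting idea can be sketched as below. This is a hedged illustration, not the authors' implementation: the abstract does not say how ties are broken, how many terms are predicted per protein, or exactly how success rate and coverage are computed, so the code assumes top-1 prediction, success rate = correct predictions / proteins that received a prediction, and coverage = proteins that received a prediction / proteins evaluated. The names `neighbors_within`, `predict_go`, `evaluate`, `domain_terms`, and all toy data are hypothetical. The `depth` parameter extends the vote from direct neighbors to second-layer nodes, and `terms_of` lets the same code count either domain-derived GO terms or a protein's own GO annotations.

```python
from collections import Counter


def neighbors_within(adj, query, depth=1):
    """Nodes reachable from `query` in 1..depth hops (query itself excluded)."""
    seen, frontier = {query}, {query}
    for _ in range(depth):
        frontier = {n for node in frontier for n in adj.get(node, ()) if n not in seen}
        seen |= frontier
    return seen - {query}


def predict_go(adj, terms_of, query, depth=1, top_k=1):
    """Rank GO terms by how often they occur among proteins near `query`.

    terms_of(protein) returns the GO terms a protein contributes, e.g. the
    terms mapped to its domains, or its own curated GO annotations.
    """
    counts = Counter()
    for n in neighbors_within(adj, query, depth):
        counts.update(terms_of(n))
    return [term for term, _ in counts.most_common(top_k)]


def evaluate(adj, terms_of, known_go, depth=1):
    """Assumed metrics: success = correct top-1 / predicted; coverage = predicted / total."""
    predicted = hits = 0
    for query, true_terms in known_go.items():
        top = predict_go(adj, terms_of, query, depth)
        if not top:
            continue  # no neighbors or no mapped terms: query not covered
        predicted += 1
        if top[0] in true_terms:
            hits += 1
    success = hits / predicted if predicted else 0.0
    coverage = predicted / len(known_go) if known_go else 0.0
    return success, coverage


# Toy data; all protein, domain, and GO IDs are hypothetical placeholders.
adj = {"P1": {"P2"}, "P2": {"P1", "P3"}, "P3": {"P2"}, "P4": set()}
protein_domains = {"P1": ["D_A"], "P2": ["D_A", "D_B"], "P3": ["D_B"], "P4": ["D_C"]}
domain2go = {"D_A": ["GO_A"], "D_B": ["GO_A", "GO_B"], "D_C": ["GO_C"]}
known_go = {"P1": {"GO_A"}, "P3": {"GO_B"}, "P4": {"GO_C"}}


def domain_terms(protein):
    """GO terms mapped to a protein's domains, counted once per domain."""
    return [t for d in protein_domains.get(protein, ()) for t in domain2go.get(d, ())]


print(predict_go(adj, domain_terms, "P1"))     # ['GO_A']
print(evaluate(adj, domain_terms, known_go))  # (0.5, 0.6666666666666666)
```

For a second-layer variant that counts proteins' GO annotations rather than domain-mapped terms, one could pass `depth=2` with a `terms_of` that returns each protein's annotated terms; with the toy data, `predict_go(adj, lambda p: known_go.get(p, ()), "P1", depth=2)` returns `['GO_B']`. The query itself is never counted, so its own annotations do not leak into its prediction.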

Conclusions

The PONs are composed of dense modules accompanied by a few long-distance connections. Based on the PONs, we developed several effective approaches to protein function prediction.

【 License 】

   
© 2013 Liang et al.; licensee BioMed Central Ltd.

【 Figures 】

Figure 1.

Figure 2.

Figure 3.

Figure 4.
