BMC Systems Biology
Novel semantic similarity measure improves an integrative approach to predicting gene functional associations
Igor Jurisica [2]; Fiona Broackes-Carter [3]; Daniela Rosu [1]; Fatemeh Vafaee [3]
[1] Department of Computer Science, University of Toronto, Toronto, Canada; Techna Institute, University Health Network, Toronto, Canada; Ontario Cancer Institute and Campbell Family Cancer Research Institute, Princess Margaret Cancer Centre, University Health Network, Toronto, Canada
Keywords: Systems biology; Semantic similarity measure; Gene annotation; Functional interactome; Protein interaction prediction; Gene functional association prediction
DOI: 10.1186/1752-0509-7-22
Received 2012-09-20; accepted 2013-03-01; published 2013
【 Abstract 】
Background
Elucidating direct and indirect protein interactions and gene associations is required to fully understand the workings of the cell. This can be achieved through both low- and high-throughput biological experiments and in silico methods. We present GAP (Gene functional Association Predictor), an integrative method for predicting and characterizing gene functional associations. GAP integrates different biological features using a novel taxonomy-based semantic similarity measure to predict and prioritize high-quality putative gene associations. The proposed similarity measure increases the information gain from the available gene annotations. The annotation information is drawn from several public pathway databases, Gene Ontology annotations, and drug and disease associations from the scientific literature.
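To make the idea of a taxonomy-based semantic similarity concrete, the following is a minimal sketch and not GAP's actual measure, which this record does not specify. It assumes a hypothetical toy taxonomy (`PARENTS`) and uses a generic Resnik-style score with an intrinsic information content derived from descendant counts as a stand-in. All term names and function names are illustrative assumptions.

```python
import math
from itertools import product

# Hypothetical toy taxonomy (child -> parents); not taken from the paper.
PARENTS = {
    "root": [],
    "signaling": ["root"],
    "metabolism": ["root"],
    "kinase_signaling": ["signaling"],
    "gpcr_signaling": ["signaling"],
    "lipid_metabolism": ["metabolism"],
}


def ancestors(term):
    """Return the term itself together with all of its ancestors."""
    seen = {term}
    stack = [term]
    while stack:
        for parent in PARENTS[stack.pop()]:
            if parent not in seen:
                seen.add(parent)
                stack.append(parent)
    return seen


def descendant_count(term):
    """Count the proper descendants of a term in the taxonomy."""
    return sum(1 for t in PARENTS if t != term and term in ancestors(t))


def intrinsic_ic(term):
    """Intrinsic information content: 1 - log(descendants + 1) / log(total terms).

    Leaves score 1.0 and the root scores 0.0.
    """
    return 1.0 - math.log(descendant_count(term) + 1) / math.log(len(PARENTS))


def term_similarity(a, b):
    """Resnik-style similarity: IC of the most informative common ancestor."""
    common = ancestors(a) & ancestors(b)
    return max(intrinsic_ic(t) for t in common)


def gene_similarity(terms_a, terms_b):
    """Score two genes by the best-matching pair of their annotation terms."""
    return max(term_similarity(a, b) for a, b in product(terms_a, terms_b))
```

In this sketch, two genes annotated to sibling terms under `signaling` score higher than a pair whose only shared ancestor is the root. This is the intuition behind scoring gene pairs by the specificity of their shared annotations.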
Results
We evaluated GAP by comparing its prediction performance with that of several other well-known functional interaction prediction tools on a comprehensive dataset of known direct and indirect interactions, and observed significantly better prediction performance. We also selected a small set of GAP’s high-scoring novel predicted pairs (i.e., pairs not currently found in any known database or dataset) and manually searched the publicly accessible literature for experimental evidence; this confirmed different categories of predicted functional associations with available evidence of interaction. We also provided additional supporting evidence for a subset of the predicted functionally associated pairs using an expert-curated database of genes associated with autism spectrum disorders.
Conclusions
GAP’s predicted “functional interactome” contains ≈1M high-scoring predicted functional associations, of which about 90% are novel (i.e., not experimentally validated). GAP’s novel predictions connect disconnected components and singletons to the main connected component of the known interactome. GAP can therefore be a valuable resource for biologists, providing corroborating evidence for potential direct or indirect interactions and facilitating their prioritization for experimental validation. GAP is freely accessible through a web portal: http://ophid.utoronto.ca/gap.
【 License 】
2013 Vafaee et al.; licensee BioMed Central Ltd.