Journal Article Details
GigaScience
Research prioritization through prediction of future impact on biomedical science: a position paper on inference-analytics
Naoki Orii [1], Madhavi K Ganapathiraju [1]
[1] Language Technologies Institute, School of Computer Science, Carnegie Mellon University, Pittsburgh, PA 15213, USA
Keywords: Big-data; Protein-protein interaction prediction; Inference analytics; Data analytics; Impact prediction
Others: 861536
DOI: 10.1186/2047-217X-2-11
Received 2013-03-09, accepted 2013-07-31, published 2013
【 Abstract 】

Background

Advances in biotechnology have created “big-data” situations in molecular and cellular biology. Several sophisticated algorithms have been developed that process big data to generate hundreds of biomedical hypotheses (or predictions). The bottleneck in translating this large number of biological hypotheses is that each one must be studied experimentally to interpret its functional significance. Even when the predictions are estimated to be very accurate, a biologist chooses which of them to study further based on factors such as the availability of reagents and resources and the possibility of formulating a reasonable hypothesis about biological relevance. From a global perspective, say that of a federal funding agency, the choice would ideally be based on which prediction can make the most translational impact.

Results

We propose that algorithms be developed to identify which computationally generated hypotheses have the potential for high translational impact. Funding agencies and the scientific community could then invest resources and drive research based on a global view of biomedical impact, without being deterred by a local view of feasibility. In short, data-analytic algorithms analyze big data and generate hypotheses; in contrast, the proposed inference-analytic algorithms analyze these hypotheses and rank them by predicted biological impact. We demonstrate this by developing an algorithm that predicts the biomedical impact of protein-protein interactions (PPIs), where impact is estimated by the number of future publications that cite the paper that originally reported the PPI.
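
The ranking step described above can be illustrated with a minimal Python sketch. It is not the authors' algorithm: the `PredictedPPI` record, the `TOY_CITATIONS` table, and the `toy_impact` scorer are hypothetical placeholders standing in for a trained impact-prediction model, and the protein names and numbers are made up for illustration. The sketch only shows how an inference-analytic stage might reorder hypotheses by predicted impact rather than by the prediction confidence of the upstream data-analytic stage.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass(frozen=True)
class PredictedPPI:
    """One hypothesis produced by an upstream data-analytic predictor."""
    protein_a: str
    protein_b: str
    confidence: float  # how likely the interaction is real (not its impact)


def rank_by_impact(
    hypotheses: List[PredictedPPI],
    predict_impact: Callable[[PredictedPPI], float],
) -> List[PredictedPPI]:
    """Return hypotheses sorted from highest to lowest predicted impact."""
    return sorted(hypotheses, key=predict_impact, reverse=True)


# Hypothetical stand-in for a trained model: made-up citation estimates
# per protein, used only to make the example runnable.
TOY_CITATIONS = {"P1": 120.0, "P2": 15.0, "P3": 60.0, "P4": 5.0}


def toy_impact(ppi: PredictedPPI) -> float:
    """Assumed scorer: sum of the toy citation estimates of both proteins."""
    return TOY_CITATIONS.get(ppi.protein_a, 0.0) + TOY_CITATIONS.get(ppi.protein_b, 0.0)


hypotheses = [
    PredictedPPI("P2", "P4", confidence=0.99),
    PredictedPPI("P1", "P3", confidence=0.80),
    PredictedPPI("P1", "P4", confidence=0.90),
]

ranked = rank_by_impact(hypotheses, toy_impact)
for ppi in ranked:
    print(ppi.protein_a, ppi.protein_b, toy_impact(ppi))
```

In this toy data the most confident prediction (P2 with P4) ends up last, which is the distinction the paragraph draws: confidence in a hypothesis and its expected impact are separate quantities, and the second one drives the ranking.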

Conclusions

This position paper describes a new computational problem that is relevant in the era of big data and discusses the challenges in studying it, highlighting the need for the scientific community to engage in this line of research. The proposed class of algorithms, namely inference-analytic algorithms, is necessary to ensure that resources are invested in translating those computational outcomes that promise maximum biological impact. Applying this concept to predict the biomedical impact of PPIs illustrates both the concept and the challenges in designing these algorithms.

【 License 】

2013 Ganapathiraju and Orii; licensee BioMed Central Ltd.
