BMC Bioinformatics
Ciruvis: a web-based tool for rule networks and interaction detection using rule-based classifiers
Susanne Bornelöv [2], Simon Marillet [3], Jan Komorowski [1]
[1] Institute of Computer Science, Polish Academy of Sciences, 01-248 Warsaw, Poland
[2] Department of Cell and Molecular Biology, Science for Life Laboratory, Uppsala University, 751 24 Uppsala, Sweden
[3] Current address: INRIA Sophia-Antipolis-Méditerranée, Algorithms-Biology-Structure, Sophia-Antipolis, France
Keywords: Rule-based classification; Classification; Interaction detection; Interactions; Rules; Visualization
DOI: 10.1186/1471-2105-15-139
Received 2013-11-11; accepted 2014-04-07; published 2014
【 Abstract 】

Background

The use of classification algorithms is becoming increasingly important for the field of computational biology. However, not only the quality of the classification, but also its biological interpretation is important. This interpretation may be eased if interacting elements can be identified and visualized, something that requires appropriate tools and methods.

Results

We developed a new approach to detecting interactions in complex systems based on classification. Using rule-based classifiers, we previously proposed a rule network visualization strategy that may be applied as a heuristic for finding interactions. We now complement this work with Ciruvis, a web-based tool for the construction of rule networks from classifiers made of IF-THEN rules. Simulated and biological data served as an illustration of how the tool may be used to visualize and interpret classifiers. Furthermore, we used the rule networks to identify feature interactions, compared them to alternative methods, and computationally validated the findings.
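
As a rough illustration of how a rule network could be derived from IF-THEN rules, the sketch below links every pair of conditions that co-occur in a rule and accumulates a weight per pair. The toy rules, the feature names, the function `build_rule_network`, and the weighting (support multiplied by accuracy) are all assumptions made for this example and are not taken from Ciruvis itself; the tool's actual scoring may differ.

```python
from collections import defaultdict
from itertools import combinations

# Hypothetical toy rules: (conditions, decision class, support, accuracy).
rules = [
    ({"geneA=high", "geneB=low"}, "case", 40, 0.90),
    ({"geneA=high", "geneC=high"}, "case", 25, 0.80),
    ({"geneB=low", "geneC=high"}, "control", 30, 0.70),
    ({"geneA=high", "geneB=low", "geneC=high"}, "case", 10, 1.00),
]


def build_rule_network(rules, decision):
    """Sum an assumed weight (support * accuracy) over every pair of
    conditions that co-occur in a rule for the given decision class."""
    edges = defaultdict(float)
    for conditions, rule_decision, support, accuracy in rules:
        if rule_decision != decision:
            continue  # keep one network per decision class
        # Sort so that each unordered pair maps to a single edge key.
        for a, b in combinations(sorted(conditions), 2):
            edges[(a, b)] += support * accuracy
    return dict(edges)


network = build_rule_network(rules, "case")

# List edges from strongest to weakest; strong edges are candidate interactions.
for (a, b), weight in sorted(network.items(), key=lambda kv: -kv[1]):
    print(f"{a} -- {b}: {weight:.1f}")
```

Under these assumptions, heavily weighted edges mark condition pairs that frequently appear together in well-supported, accurate rules, which is the sense in which a rule network can serve as a heuristic for candidate interactions.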

Conclusions

Rule networks enable fast model visualization and provide an exploratory heuristic for interaction detection. The tool is freely available on the web and may thus be used to aid and improve rule-based classification.

【 License 】

2014 Bornelöv et al.; licensee BioMed Central Ltd.
