BMC Bioinformatics
Addressing the unmet need for visualizing conditional random fields in biological data
William C Ray [1], Samuel L Wolock [3], Nicholas W Callahan [1], Min Dong [2], Q Quinn Li [2], Chun Liang [2], Thomas J Magliery [1], Christopher W Bartlett [1]
[1] The Ohio State University, 100 W. 18th Ave, 43210 Columbus, OH, USA
[2] Miami University, 501 E. High St., 45056 Oxford, OH, USA
[3] Nationwide Children’s Hospital, 575 Children’s Crossroad, 43215 Columbus, OH, USA
Keywords: Conditional random fields; Bioinformatics; Graphical probabilistic models; Parallel coordinates
DOI: 10.1186/1471-2105-15-202
Received 2014-04-24, accepted 2014-06-10, published 2014
【 Abstract 】
Background
The biological world is replete with phenomena that appear to be ideally modeled and analyzed by one archetypal statistical framework: the Graphical Probabilistic Model (GPM). The structure of GPMs is a uniquely good match for biological problems that range from aligning sequences to modeling the genome-to-phenome relationship. The fundamental questions that GPMs address involve making decisions based on a complex web of interacting factors. Unfortunately, while GPMs ideally fit many questions in biology, they are not an easy solution to apply. Building a GPM is not a simple task for an end user. Moreover, applying GPMs is impeded by the insidious fact that the “complex web of interacting factors” inherent to a problem may be easy to define yet intractable to compute upon.
Discussion
We propose that the visualization sciences can contribute to many domains of the bio-sciences by developing tools to address archetypal representation and user interaction issues in GPMs, and in particular in a variety of GPM called a Conditional Random Field (CRF). CRFs bring additional power, and additional complexity, because the CRF dependency network can be conditioned on the query data.
Conclusions
In this manuscript we examine the shared features of several biological problems that are amenable to modeling with CRFs, highlight the challenges that existing visualization and visual analytics paradigms induce for these data, and document an experimental solution called StickWRLD which, while leaving room for improvement, has been successfully applied in several biological research projects.
Software and tutorials are available at http://www.stickwrld.org/
【 License 】
2014 Ray et al.; licensee BioMed Central Ltd.