Thesis

【Abstract】

High-throughput experiments and ultrascale computing generate scientific data of growing size and complexity. These trends challenge traditional data analysis environments in a number of ways, and most of those environments are based on scripting languages such as R, MATLAB or IDL. To address some of these challenges, this research proposes a framework whose overarching goal is to enable large-scale, high-performance data analytics and collaborative knowledge annotation over the Web. The proposed framework has three major components, which parallel the three core steps of the knowledge discovery cycle. For the first step, defining the data analysis pipeline, the research designs and implements a Web-enabled, interactive and collaborative statistical environment based on R. This component implements a memory management system that minimizes memory requirements, thereby enabling multi-user scalability. To the best of our knowledge, this is the first Web-enabled R system that supports interactive remote access to R servers and enables users to share data, results and analysis sessions.
For the second step, executing the data analysis pipeline, the research investigates and proposes a transparent, low-overhead means of executing external parallel codes written in compiled languages from within R. This seamlessly bridges two code development paradigms: efficient, compiled parallel codes and highly abstract, easy-to-use scripting codes. This component contains three elements: transparent bidirectional translation of data objects between R and compiled languages such as C/C++/Fortran; seamless integration of external parallel codes; and automatic parallelization of data-parallel computations in hybrid multi-core and multi-node execution environments.

For the third step, annotating the predictive knowledge derived from community analysis pipelines, the research explores an environment for semantically rich, structured and queryable annotation of facts, of relationships between those facts, and of complex events reported in the scientific literature. Because this component is built around social networking, the community can improve the predictions and generate new, higher-level inferences, filling gaps in its understanding of physical phenomena. The environment offers mechanisms for streamlining the annotated and curated knowledge into distributed public databases. This creates a feedback loop into the database-publication cycle that lets scientists make connections between data-driven predictions and published evidence.

A Transparent Collaborative Framework for Efficient Data Analysis and Knowledge Annotation on the Web
Author: Breimyer, Paul William; Committee: Professor Nagiza F. Samatova (Committee Chair), Professor Steffen Heber (Committee Member), Professor Tao Xie (Committee Member), Professor Mladen Vouk (Committee Member)
University: North Carolina State University
Keywords: statistical data analysis; Web; annotation
Full text (PDF): https://repository.lib.ncsu.edu/bitstream/handle/1840.16/4020/etd.pdf?sequence=1&isAllowed=y
Country: United States | Language: English