Distributed Data Integration Infrastructure
Critchlow, T; Ludaescher, B; Vouk, M; Pu, C
Lawrence Livermore National Laboratory
Keywords: Productivity; 99 General And Miscellaneous//Mathematics, Computing, And Information Science; Functionals; Construction; Design
DOI: 10.2172/15003342 | Report ID: UCRL-CR-151855 | Contract: W-7405-ENG-48 | OSTI ID: 15003342
United States | English
Source: UNT Digital Library
【 Abstract 】
The Internet is becoming the preferred method for disseminating scientific data from a variety of disciplines. This can result in information overload for scientists, who are unable to query all of the relevant sources, even if they knew where to find them, what they contained, how to interact with them, and how to interpret the results. A related issue is that keeping up with current trends in information technology often taxes the end-user's expertise and time. Thus, instead of benefiting from this information-rich environment, scientists become experts on a small number of sources and technologies, use them almost exclusively, and develop a resistance to innovations that could enhance their productivity. Enabling information-based scientific advances, in domains such as functional genomics, requires fully utilizing all available information and the latest technologies.

To address this problem we are developing an end-user-centric, domain-sensitive, workflow-based infrastructure, shown in Figure 1, that will allow scientists to design complex scientific workflows reflecting the data manipulation required by their research, without undue burden. We are taking a three-tiered approach to designing this infrastructure, utilizing (1) abstract workflow definition, construction, and automatic deployment; (2) complex agent-based workflow execution; and (3) automatic wrapper generation. To construct a workflow, the scientist defines an abstract workflow (AWF) in terminology (semantics and context) that is familiar to him or her. This AWF includes all of the data transformations, selections, and analyses required by the scientist, but does not necessarily specify particular data sources. The abstract workflow is then compiled into an executable workflow (EWF, in our case XPDL) that is evaluated and executed by the workflow engine.
This EWF contains references to specific data sources and to interfaces capable of performing the desired actions. To provide access to the largest possible number of resources, our lowest level utilizes automatic wrapper generation techniques to create information and data wrappers capable of interacting with the complex interfaces typical of scientific analysis. The remainder of this document outlines our work in these three areas, the impact our work has made, and our plans for the future.
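The AWF-to-EWF compilation step described above can be illustrated with a minimal sketch. This is not the authors' implementation (which targets XPDL); it only shows the idea of resolving abstract, domain-level steps against a registry of concrete data sources and generated wrappers. All names here (the registry contents, the step and action names) are hypothetical.

```python
# Illustrative sketch: compiling an abstract workflow (AWF), written in the
# scientist's own terminology with no data sources named, into an executable
# workflow (EWF) whose steps are bound to concrete sources and wrappers.
from dataclasses import dataclass, field

@dataclass
class AbstractStep:
    """A step described in domain terms, independent of any source."""
    action: str              # e.g. "sequence-similarity-search"
    params: dict = field(default_factory=dict)

@dataclass
class ExecutableStep:
    """A step bound to a concrete data source and a wrapper interface."""
    action: str
    source: str              # concrete source chosen at compile time
    wrapper: str             # wrapper able to drive that source's interface
    params: dict

# Hypothetical registry mapping abstract actions to (source, wrapper) pairs;
# in the paper's architecture, wrappers are produced by automatic generation.
REGISTRY = {
    "sequence-similarity-search": ("example-sequence-db", "seq_db_wrapper"),
    "filter-by-score": ("local", "builtin_filter"),
}

def compile_awf(awf):
    """Resolve each abstract step against the registry, yielding the EWF."""
    ewf = []
    for step in awf:
        if step.action not in REGISTRY:
            raise ValueError(f"no resource registered for {step.action!r}")
        source, wrapper = REGISTRY[step.action]
        ewf.append(ExecutableStep(step.action, source, wrapper, step.params))
    return ewf

if __name__ == "__main__":
    awf = [
        AbstractStep("sequence-similarity-search", {"query": "ACGT"}),
        AbstractStep("filter-by-score", {"min_score": 50}),
    ]
    for step in compile_awf(awf):
        print(step.action, "->", step.source, "/", step.wrapper)
```

The point of the indirection is the one the abstract makes: the scientist edits only the AWF, while source selection and interface details stay inside the compiler and the generated wrappers.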
【 Preview 】
Files | Size | Format | View
---|---|---|---
15003342.pdf | 296KB | PDF | download