Dissertation details
Dossier: Distributed operating system and infrastructure for scientific data management
Nguyen, Phuong Viet
Keywords: cyberinfrastructure; microservice architecture; adaptive control; edge-cloud architecture; scientific data management
Others  :  https://www.ideals.illinois.edu/bitstream/handle/2142/101566/NGUYEN-DISSERTATION-2018.pdf?sequence=1&isAllowed=y
United States|English
Source: The Illinois Digital Environment for Access to Learning and Scholarship
【 Abstract 】

As scientific advancement and discovery have become increasingly data-driven and interdisciplinary, there is an urgent need for advanced cyberinfrastructure to support managing and processing the scientific data generated from day-to-day research. However, the development of data-driven cyberinfrastructure for scientific research areas has often lagged behind the development of such tools in other engineering and IT-related fields. This development gap is due to several diversity challenges in scientific data management and processing. The first is the diversity of scientific data and data processing tasks: the cyberinfrastructure should be able to support managing and processing heterogeneous types of scientific data captured from scientific instruments. The second is the diversity of users and scientific workloads, which makes it challenging for the cyberinfrastructure to shorten the time from digital capture of data to interpretation and insights. The third is the diversity of scientific instruments. Since a significant number of scientific instruments still run their scientific software tools on old operating systems (e.g., Windows XP, Windows NT, Windows 2000), the cyberinfrastructure must help bridge the performance and security gap between old scientific instruments and its advanced cloud-based infrastructure.
In this thesis, we aim to address the above diversity challenges by taking a holistic approach to designing a distributed operating system and infrastructure for scientific data management, named DOSSIER. At the core of DOSSIER is an adaptive control microservice infrastructure that is designed to tackle the aforementioned challenges of data cyberinfrastructure for distributed scientific data management.
In particular, to handle heterogeneous scientific data processing and analysis, we start by redesigning the execution environment for scientific workflows, which traditionally follows a monolithic approach, using a novel microservice architecture and the latest virtualization technology (i.e., container technology). The microservice design enables dynamic composition of workflows and is thus efficient in dealing with heterogeneous workflows. The new microservice architecture also allows us to express system resources in a simpler way, which enables the design of a new adaptive resource management mechanism to handle large-scale and dynamic scientific workloads. We are the first to apply feedback control theory to design a self-adaptation mechanism for scientific workflow management systems to help shorten the time from data acquisition to insights. To address the security and performance gap when connecting old scientific instruments to cloud-based cyberinfrastructure, we design an edge-cloud architecture in which cloudlet servers are directly connected to the scientific instruments and act as a security shield for the aging instruments. The cloudlets also coordinate with the cloud-based backend system to tackle the performance issue by scheduling data transfers and offloading processing tasks to cloudlets, in order to avoid traffic congestion and guarantee the performance of data processing jobs across the edge-cloud architecture.
By designing, developing, and testing DOSSIER in real scientific environments, we demonstrate that an edge-cloud microservice architecture with learning-based adaptive control resource management is needed for timely distributed scientific data management.

【 Preview 】
Attachment list
Files Size Format View
Dossier: Distributed operating system and infrastructure for scientific data management 4548KB PDF download
  Document metrics
  Downloads: 21    Views: 36