Technical Report Details
Dynamic Non-Hierarchical File Systems for Exascale Storage
Long, Darrell E. [1]; Miller, Ethan L. [2]
[1] PI; [2] Co-PI
Keywords: data storage; file systems; HEC; HPC; archival storage; provenance; cyber security
DOI  :  10.2172/1170868
RP-ID  :  DE-FC02-10ER26017/DE-SC0005417
PID  :  OSTI ID: 1170868
Others  :  Other: FC02
Subject classification: Engineering and Technology (General)
United States | English
Source: SciTech Connect
【 Abstract 】

This constitutes the final report for "Dynamic Non-Hierarchical File Systems for Exascale Storage". The ultimate goal of this project was to improve data management in scientific computing and high-end computing (HEC) applications. To achieve this goal we proposed to develop the first HEC-targeted file system featuring rich metadata and provenance collection, extreme scalability, and future storage hardware integration as core design goals, and to evaluate and develop a flexible non-hierarchical file system interface suitable for providing more powerful and intuitive data management interfaces to HEC and scientific computing users.

Data management is swiftly becoming a serious problem in the scientific community: while copious amounts of data are good for obtaining results, finding the right data is often daunting and sometimes impossible. Scientists participating in a Department of Energy workshop noted that most of their time was spent "...finding, processing, organizing, and moving data and it's going to get much worse". Scientists should not be forced to become data mining experts in order to retrieve the data they want, nor should they be expected to remember the naming convention they used several years ago for a set of experiments they now wish to revisit. Ideally, locating the data you need would be as easy as browsing the web.

Unfortunately, existing data management approaches are usually based on hierarchical naming, a 40-year-old technology designed to manage thousands of files, not exabytes of data. Today's systems do not take advantage of the rich array of metadata that current HEC file systems can gather, including content-based metadata and provenance information. As a result, current metadata search approaches are typically ad hoc and often work by providing a parallel management system to the "main" file system, as is done in Linux (the locate utility), personal computers, and enterprise search appliances. These search applications are often optimized for a single file system, making it difficult to move files and their metadata between file systems. Users have tried to solve this problem in several ways, including the use of separate databases to index file properties, the encoding of file properties into file names, and separately gathering and managing provenance data, but none of these approaches has worked well, due to limited usefulness, limited scalability, or both.

Our research addressed several key issues:
• High-performance, real-time metadata harvesting: extracting important attributes from files dynamically and immediately updating indexes used to improve search.
• Transparent, automatic, and secure provenance capture: recording the data inputs and processing steps used in the production of each file in the system.
• Scalable indexing: indexes that are optimized for integration with the file system.
• Dynamic file system structure: our approach provides dynamic directories similar to those in semantic file systems, but these are the native organization rather than a feature grafted onto a conventional system.

In addition to these goals, our research effort will include evaluating the impact of new storage technologies on the file system design and performance. In particular, the indexing and metadata harvesting functions can potentially benefit from the performance improvements promised by new storage class memories.
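
To make the four issues listed in the abstract concrete, here is a minimal in-memory sketch in Python. It is not the project's implementation: the class `MetadataIndex`, its methods (`harvest`, `dynamic_dir`, `lineage`), and the sample file names and attributes are all hypothetical, chosen only to illustrate how immediate index updates, provenance capture, and query-defined "directories" can fit together.

```python
from collections import defaultdict


class MetadataIndex:
    """Toy inverted index: (attribute, value) -> set of file names.

    Hypothetical illustration only; a real HEC file system would persist
    and distribute these structures.
    """

    def __init__(self):
        self._postings = defaultdict(set)  # (attr, value) -> file names
        self._attrs = {}                   # file name -> attribute dict
        self._provenance = {}              # file name -> inputs and process

    def harvest(self, name, attrs, inputs=(), process=None):
        """Index a file's attributes at once and record how it was produced."""
        # Drop stale postings if the file is being re-harvested.
        for item in self._attrs.get(name, {}).items():
            self._postings[item].discard(name)
        self._attrs[name] = dict(attrs)
        for item in attrs.items():
            self._postings[item].add(name)
        self._provenance[name] = {"inputs": tuple(inputs), "process": process}

    def dynamic_dir(self, **query):
        """List files matching every attribute=value pair, evaluated per call."""
        if not query:
            return sorted(self._attrs)
        sets = [self._postings.get(item, set()) for item in query.items()]
        return sorted(set.intersection(*sets))

    def lineage(self, name):
        """Return every transitive input of a file, from recorded provenance."""
        seen = set()
        stack = list(self._provenance.get(name, {}).get("inputs", ()))
        while stack:
            current = stack.pop()
            if current not in seen:
                seen.add(current)
                stack.extend(self._provenance.get(current, {}).get("inputs", ()))
        return sorted(seen)


index = MetadataIndex()
index.harvest("raw.dat", {"experiment": "climate", "year": 2009})
index.harvest("run1.out", {"experiment": "climate", "year": 2010},
              inputs=["raw.dat"], process="simulate")
index.harvest("plot.png", {"experiment": "climate", "type": "figure"},
              inputs=["run1.out"], process="plot")

print(index.dynamic_dir(experiment="climate", year=2010))  # ['run1.out']
print(index.lineage("plot.png"))  # ['raw.dat', 'run1.out']
```

In this sketch a "directory" is just a query whose contents are computed at listing time, so a file appears under every attribute combination it matches instead of living at one fixed hierarchical path.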
