Technical Report Details
Forming Aggregations using Virtual Sharding: Lessons Learned from Simple Scalable Storage (S3)
Gallagher, James ; Potter, Nathan ; Neumiller, Kodi
Keywords: ALGORITHMS; AGGREGATES; DATA SYSTEMS; WEB SERVICES; METADATA; OPEN SOURCE LICENSING (COMPUTERS); DATA BASES; LIBRARIES; PROTOCOL (COMPUTERS)
RP-ID: GSFC-E-DAA-TN76065
Subject category: Earth Sciences (General)
United States | English
Source: NASA Technical Reports Server
【 Abstract 】

Data aggregation, the ability to combine separate datasets to form a single new logical dataset, provides users with a powerful abstraction. The advantage of an aggregate dataset is that users are freed from having to understand, and incorporate into their workflow, knowledge about the (ad hoc) organization of the constituent datasets. However, aggregating large numbers of files can be computationally complex, with data server systems performing many repetitive operations. As part of the authors' work on subsetting data stored on Amazon Web Services (AWS) Simple Storage Service (S3), we developed technology to read portions of otherwise monolithic data files. This enables the formation of virtual shards for use in subsetting data stored in HDF5 (Hierarchical Data Format, version 5) files. The same tool can be used to form aggregations that combine data stored in many HDF5 files when those files are stored on S3. The nature of the virtual sharding, and of the algorithm that exploits it for subsetting, is such that it can also be used for aggregation without the need for many of the repetitive operations required by per-file aggregation techniques. We will present timing information that demonstrates the flexibility of this approach. The lesson learned, however, is that while this is a useful result in and of itself, these very same techniques can be applied in other contexts where data are stored in services and on media other than S3. For example, the same technique can be applied to data stored on spinning disk. Pushing the envelope for S3 forced a reexamination of our data access techniques, which led to unexpected positive benefits.
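
The abstract does not describe the implementation, so the following is only a minimal Python sketch of the general idea, not the authors' software. It assumes that a virtual shard can be modeled as a (source, offset, length) byte range inside a larger file, and that an aggregation is an ordered join of per-file shard indexes. The names `Shard`, `build_aggregation`, `read_subset`, and `memory_reader`, and the toy file contents, are all hypothetical. The reader is pluggable, which mirrors the abstract's point that the same technique applies to S3 (via HTTP range requests) and to local disk alike.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass(frozen=True)
class Shard:
    """A virtual shard: a byte range inside a larger file (hypothetical model)."""
    source: str   # file name or object key
    offset: int   # first byte of the range
    length: int   # number of bytes in the range


# A range reader takes (source, offset, length) and returns those bytes.
RangeReader = Callable[[str, int, int], bytes]


def range_header(shard: Shard) -> str:
    """HTTP Range header value that would fetch this shard (end byte is inclusive)."""
    return f"bytes={shard.offset}-{shard.offset + shard.length - 1}"


def build_aggregation(indexes: Dict[str, List[Shard]], order: List[str]) -> List[Shard]:
    """Join per-file shard indexes into one logical dataset, in the given file order."""
    aggregated: List[Shard] = []
    for name in order:
        aggregated.extend(indexes[name])
    return aggregated


def read_subset(shards: List[Shard], first: int, last: int, read: RangeReader) -> bytes:
    """Read shards first..last (inclusive), touching only the byte ranges needed."""
    return b"".join(read(s.source, s.offset, s.length) for s in shards[first:last + 1])


# In-memory stand-in for an object store; the contents are made up for illustration.
FILES = {
    "a.h5": b"HEADERaaaabbbb",
    "b.h5": b"HEADERccccdddd",
}


def memory_reader(source: str, offset: int, length: int) -> bytes:
    """Serve a byte range from the in-memory files (assumed stand-in for S3 or disk)."""
    return FILES[source][offset:offset + length]


# Each toy file holds two 4-byte shards after a 6-byte header.
INDEXES = {name: [Shard(name, 6, 4), Shard(name, 10, 4)] for name in FILES}
AGGREGATION = build_aggregation(INDEXES, ["a.h5", "b.h5"])

# A subset that spans the file boundary: last shard of a.h5, first shard of b.h5.
SUBSET = read_subset(AGGREGATION, 1, 2, memory_reader)
```

Because the aggregation is just a list of byte ranges, a subset that crosses file boundaries is served by the same read path as a subset of a single file; no per-file open-and-parse step is repeated at request time in this sketch. Real HDF5 chunks may also be compressed or filtered, which this toy model ignores.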

【 Attachment 】
20190033897.pdf (294 KB, PDF)