Technical Report Details
Chunking of Large Multidimensional Arrays
Rotem, Doron ; Otoo, Ekow J. ; Seshadri, Sridhar
Lawrence Berkeley National Laboratory
Keywords: Efficiency; Processing; Programming; Mathematical Models; Multi-Dimensional Arrays Algorithm Array Chunking
DOI  :  10.2172/927033
RP-ID  :  LBNL--63230
RP-ID  :  DE-AC02-05CH11231
RP-ID  :  927033
United States | English
Source: UNT Digital Library
【 Abstract 】

Data-intensive scientific computations, as well as on-line analytical processing applications, are done on very large datasets that are modeled as k-dimensional arrays. The storage organization of such arrays on disks is done by partitioning the large global array into fixed-size hyper-rectangular sub-arrays called chunks or tiles that form the units of data transfer between disk and memory. Typical queries involve the retrieval of sub-arrays in a manner that accesses all chunks that overlap the query results. An important metric of the storage efficiency is the expected number of chunks retrieved over all such queries. The question that immediately arises is "what shapes of array chunks give the minimum expected number of chunks over a query workload?" In this paper we develop two probabilistic mathematical models of the problem and provide exact solutions using steepest descent and geometric programming methods. Experimental results, using synthetic workloads on real-life data sets, show that our chunking is much more efficient than the existing approximate solutions.
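
To make the metric in the abstract concrete, the following minimal Python sketch counts how many fixed-size chunks an axis-aligned sub-array query overlaps, and averages that count over a small workload. It is an illustration only, not the report's probabilistic models or its optimization methods; the function names `chunks_touched` and `mean_chunks` and the toy workload are assumptions introduced here.

```python
def chunks_touched(query_start, query_shape, chunk_shape):
    """Count the chunks overlapped by an axis-aligned query box.

    query_start: lowest cell index of the query along each dimension
    query_shape: number of cells the query spans along each dimension
    chunk_shape: chunk edge length along each dimension (hypothetical layout
                 where chunk boundaries fall at multiples of the edge length)
    """
    count = 1
    for start, length, edge in zip(query_start, query_shape, chunk_shape):
        first = start // edge                 # first chunk index touched on this axis
        last = (start + length - 1) // edge   # last chunk index touched on this axis
        count *= last - first + 1
    return count


def mean_chunks(workload, chunk_shape):
    """Average number of chunks retrieved over a list of (start, shape) queries."""
    total = sum(chunks_touched(start, shape, chunk_shape) for start, shape in workload)
    return total / len(workload)


# Toy workload (assumed for illustration): one row-like query and one small square query.
workload = [((0, 0), (1, 16)), ((3, 3), (2, 2))]

# Three chunk shapes with the same volume of 16 cells.
for shape in [(4, 4), (1, 16), (16, 1)]:
    print(shape, mean_chunks(workload, shape))
```

Chunk shapes of equal volume can give different averages on the same workload, which is the trade-off the abstract's question about chunk shapes refers to.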

【 Preview 】
Attachments
Files Size Format View
927033.pdf 203KB PDF download