科技报告详细信息
Efficient binning for bitmap indices on high-cardinality attributes
Rotem, Doron ; Stockinger, Kurt ; Wu, Kesheng
Lawrence Berkeley National Laboratory
关键词: Vectors Bitmap Index Binning Optimization;    Statistics;    Storage;    Dynamic Programming;    99 General And Miscellaneous//Mathematics, Computing, And Information Science;   
DOI  :  10.2172/841113
RP-ID  :  LBNL--56936
RP-ID  :  AC03-76SF00098
RP-ID  :  841113
美国|英语
来源: UNT Digital Library
PDF
【 摘 要 】

Bitmap indexing is a common technique for indexing high-dimensional data in data warehouses and scientific applications. Though efficient for low-cardinality attributes, query processing can be rather costly for high-cardinality attributes due to the large storage requirements for the bitmap indices. Binning is a common technique for reducing storage costs of bitmap indices. This technique partitions the attribute values into a number of ranges, called bins, and uses bitmap vectors to represent bins (attribute ranges) rather than distinct values. Although binning may reduce storage costs, it may increase the access costs of queries that do not fall on exact bin boundaries (edge bins). For this kind of queries the original data values associated with edge bins must be accessed, in order to check them against the query constraints.In this paper we study the problem of finding optimal locations for the bin boundaries in order to minimize these access costs subject to storage constraints. We propose a dynamic programming algorithm for optimal partitioning of attribute values into bins that takes into account query access patterns as well as data distribution statistics. Mathematical analysis and experiments on real life data sets show that the optimal partitioning achieved by this algorithm can lead to a significant improvement in the access costs of bitmap indexing systems for high-cardinality attributes.

【 预 览 】
附件列表
Files Size Format View
841113.pdf 435KB PDF download
  文献评价指标  
  下载次数:34次 浏览次数:57次