科技报告详细信息
Fast Computation of Approximate Biased Histograms on Sliding Windows over Data Streams
Hamid Mousavi ; Carlo Zaniolo
UCLA Henry Samueli School of Engineering and Applied Science
RP-ID  :  130002
学科分类:计算机科学(综合)
美国|英语
来源: UCLA Computer Science Technical Reports Database
PDF
【 摘 要 】
Histograms provide effective synopses of large data sets, and are thus used in a wide variety of applications including query optimization, approximate query answering, distribution fitting, parallel database partitioning, and data mining. However, very fast approximate algorithms are needed to compute accurate histograms on fast arriving data streams, as needed to support online queries within given memory and computing resources. Biased Histograms represent a type of histograms that is essential in applications focusing on certain regions of the data distribution which the histogram must therefore model with greater accuracy. In this paper, after defining approximate biased histograms over sliding windows on data streams, we propose the Bar Splitting Biased Histogram (BSBH) algorithm to construct them efficiently and accurately. We prove that BSBH generates expected ϵ-approximate biased histograms for data streams with stationary distributions, and our experiments show that BSBH also achieves good approximation in the presence of concept shifts, even major ones. Additionally, BSBH employs a new biased sampling technique which outperforms uniform sampling in terms of accuracy, while using about the same amount of time and memory. Therefore, BSBH outperforms previously proposed algorithms for computing biased histograms over the whole data stream, and it is the first algorithm that supports windows.
【 预 览 】
附件列表
Files Size Format View
RO201804090001224LZ 735KB PDF download
  文献评价指标  
  下载次数:12次 浏览次数:23次