期刊论文详细信息
NEUROCOMPUTING 卷:194
CVS: Fast cardinality estimation for large-scale data streams over sliding windows
Article
Shan, Jingsong1,2  Luo, Jianxin1  Ni, Guiqiang1  Wu, Zhaofeng1  Duan, Weiwei1 
[1] PLA Univ Sci & Technol, Coll Command Informat Syst, Nanjing, Jiangsu, Peoples R China
[2] Huaiyin Inst Technol, Fac Comp Engn, Huaian, Peoples R China
关键词: Data stream;    Sliding window;    Counter vector sketch;    Cardinality;    Random update;   
DOI  :  10.1016/j.neucom.2016.01.072
来源: Elsevier
PDF
【 摘 要 】

Estimating the cardinality of data streams over a sliding window is an important problem in many applications, such as network traffic monitoring, web access log analysis and database. The problem becomes more difficult in large-scale data streams when time and space complexity is taken into account. In this paper, we present a novel randomized data structure to address the problem. The significant contributions are as follows: (1) A space-efficient counter vector sketch (CVS) are proposed, which extends the well-known bitmap sketch to sliding window settings. (2) Based on the CVS, a random update mechanism is introduced, whereby a small fixed number of entries are randomly chosen from CVS in a step and then updated. This means that the update procedure just costs constant time. (3) Furthermore, estimating cardinality by CVS just needs one-pass scan of the data. (4) Finally, a theoretical analysis is given to show the accuracy of CVS-based estimators. Our comprehensive experiments confirm that the CVS-based schema attains high accuracy, and demonstate its time efficiency in comparison with the Timestamp Vector (TSV) and the auxiliary indexing method. (C) 2016 Elsevier B.V. All rights reserved.

【 授权许可】

Free   

【 预 览 】
附件列表
Files Size Format View
10_1016_j_neucom_2016_01_072.pdf 2266KB PDF download
  文献评价指标  
  下载次数:0次 浏览次数:0次