Statistical Analysis and Data Mining
Tracking clusters and anomalies in evolving data streams
article
Sreelekha Guggilam [1], Varun Chandola [1], Abani Patra [1]
[1] Computational Data Science & Engineering, University at Buffalo, State University of New York; Computer Science & Engineering, University at Buffalo, State University of New York; Data Intensive Studies Center, Tufts University
Keywords: anomaly detection; Bayesian nonparametric models; clustering-based anomaly detection; evolving stream data; extreme value theory
DOI: 10.1002/sam.11552
Subject classification: Social Sciences, Humanities and Arts (General)
Source: John Wiley & Sons, Inc.
Abstract

Data-driven anomaly detection methods typically build a model of the normal behavior of the target system and score each data instance with respect to this model. A threshold is invariably needed to flag instances with high (or low) scores as anomalies. This limits the practical applicability of such methods, since most of them are sensitive to the choice of threshold and optimal thresholds are hard to set. The issue is exacerbated in streaming settings, where the optimal threshold varies over time. We present a probabilistic framework that explicitly models both normal and anomalous behavior and reasons about the data probabilistically. An extreme-value-theory-based formulation is proposed to model anomalous behavior as the extremes of normal behavior. As a specific instantiation, we propose a joint nonparametric clustering and anomaly detection algorithm (INCAD), which models normal behavior as a Dirichlet process mixture model. Results on a variety of datasets, including streaming data, show that the proposed method performs effective, simultaneous clustering and anomaly detection without requiring strong initialization or threshold parameters.
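The abstract's central idea, treating anomalies as the extremes of normal behavior instead of relying on a hand-tuned score threshold, can be illustrated with a small extreme value theory sketch. This is not INCAD: it assumes a single Gaussian model of normal behavior (the paper uses a Dirichlet process mixture), fits a Gumbel distribution to block maxima of standardized scores by the method of moments, and reads the Gumbel CDF as the probability that a new score is more extreme than typical normal extremes. The function names (`block_maxima`, `fit_gumbel`, `gumbel_cdf`, `extreme_probability`), the block size of 50, and the synthetic data are all illustrative assumptions.

```python
import math
import random
import statistics

# Euler-Mascheroni constant, used in the Gumbel method-of-moments fit
EULER_GAMMA = 0.5772156649015329


def block_maxima(scores, block_size):
    """Split `scores` into consecutive full blocks and return each block's maximum."""
    return [
        max(scores[i:i + block_size])
        for i in range(0, len(scores) - block_size + 1, block_size)
    ]


def fit_gumbel(maxima):
    """Method-of-moments fit of a Gumbel (type I extreme value) distribution.

    Uses mean = mu + gamma * beta and variance = (pi * beta)**2 / 6.
    """
    beta = statistics.stdev(maxima) * math.sqrt(6) / math.pi
    mu = statistics.mean(maxima) - EULER_GAMMA * beta
    return mu, beta


def gumbel_cdf(x, mu, beta):
    """Gumbel CDF: probability that a block maximum is at most x."""
    return math.exp(-math.exp(-(x - mu) / beta))


def extreme_probability(score, mu, beta):
    """Hypothetical helper: how far `score` sits beyond typical normal extremes.

    Values close to 1 mean the score exceeds almost every block maximum
    expected under normal behavior.
    """
    return gumbel_cdf(score, mu, beta)


# Demo on synthetic "normal" data (assumption: one Gaussian normal model).
rng = random.Random(0)
normal = [rng.gauss(0.0, 1.0) for _ in range(5000)]
mean, sd = statistics.mean(normal), statistics.stdev(normal)
scores = [abs(x - mean) / sd for x in normal]  # standardized deviation scores
mu, beta = fit_gumbel(block_maxima(scores, 50))

for x in (0.5, 2.5, 6.0):
    z = abs(x - mean) / sd
    print(f"x={x}: extreme probability {extreme_probability(z, mu, beta):.4f}")
```

Because the anomaly decision is expressed as a probability under a fitted extreme value distribution, the distribution can in principle be refit as the stream evolves, rather than re-tuning a fixed cutoff by hand; how INCAD itself does this is described in the paper, not in this sketch.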

License

Unknown   
