科技报告详细信息
Optimizing Regular Expression Clustering for Massive Pattern Search
Nikolay Laptev ; Hamid Mousavi ; Alexander Shkapsky ; Carlo Zaniolo
UCLA Henry Samueli School of Engineering and Applied Science
RP-ID  :  120005
学科分类:计算机科学(综合)
美国|英语
来源: UCLA Computer Science Technical Reports Database
PDF
【 摘 要 】
Optimizing the search for a large sets of patterns, due to their prevalence, isbecoming critical in a variety ofapplications including financial services, healthcare monitoring, RFID-based inventory management, publish subscribe and network intrusion detection systems (NIDS). For example in NIDS, to identify particular packets in a packet stream, modern network devices need to perform deep packet inspection at high rates given a limited memory and alarge number of signatures. It is often proposed to group togetherregular expressions (REs) denoting similar patterns into a DFA or an NFA todecreasethememory usage and increase the average throughput.In this paper, we propose a framework comprised of three novel distance metrics used to cluster REs and an execution technique that uses the information aggregated by the distance metrics to optimize the number of memory reads during execution, thereby increasing the overall throughput. Our distance metrics are unique in that each metric is specialized for a particular type of patterns. The first two metrics optimize for compactness of REs and specialize in regular expressions that have long, contiguous similarity segments. Our third metric optimizes for redundancy and specializes in REs that have random segments of similar. We show how compression, prefetching and memory access optimizations are made possible when the notion of redundancy is leveraged. We validate our approach by implementing it in a modern NIDS Snort, which matches potentially thousands of patterns against an incoming network stream. We observe a five-fold speed improvement over the unmodified version of Snort.
【 预 览 】
附件列表
Files Size Format View
RO201804090001201LZ 228KB PDF download
  文献评价指标  
  下载次数:13次 浏览次数:24次