IEEE Access
Scalable and Multifaceted Search and Its Application for Binary Malware Files
Junnyung Hur [1], Myungkeun Yoon [1], Donghoon Kim [2]
[1] Department of Computer Science, Kookmin University, Seoul, Seongbuk-gu, Republic of Korea; [2] Kia Corporation, Seoul, Seocho-gu, Republic of Korea
Keywords: Elasticsearch; inverted index; Jaccard index; malware; MinHash
DOI: 10.1109/ACCESS.2021.3102157
Source: DOAJ
【 Abstract 】
Malicious binary files are a serious threat to industrial information systems. Because of their sheer number, automated analysis tools have become essential, and the ability to find similar files is a great help. In this paper, we present a fast, scalable, and multifaceted search scheme for finding similar binary malware files. For the first time, we use a content-defined chunking algorithm to convert a file into a feature set. The proposed scheme uses MinHash to reduce the feature set of any file to a fixed size, which significantly improves search accuracy, processing speed, and space utilization. We prove theoretically that the new scheme returns similar files in order of Jaccard index. Through implementation and experiments with 12 million malicious files, we confirm that, compared with the state of the art in Elasticsearch, search speed is increased by 600%, space is reduced by 90%, and accuracy is increased by at least 400%.
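The pipeline named in the abstract (content-defined chunking, then MinHash, then Jaccard comparison) can be illustrated with a minimal Python sketch. This is not the authors' implementation: the rolling-sum boundary rule, the parameters `window`, `mask`, `min_size`, and `k`, and all function names are assumptions chosen for illustration only, and the Elasticsearch indexing step is omitted.

```python
import hashlib
import random

_P = (1 << 61) - 1  # Mersenne prime used as modulus for the illustrative hash family


def cdc_chunks(data, window=16, mask=0x3F, min_size=32):
    """Split bytes at content-defined boundaries (toy rolling-sum rule, assumed)."""
    chunks, start, rolling = [], 0, 0
    for i, byte in enumerate(data):
        rolling += byte
        if i >= window:
            rolling -= data[i - window]  # keep the sum over the last `window` bytes
        # Cut when the chunk is long enough and the low bits of the sum are zero.
        if i + 1 - start >= min_size and (rolling & mask) == 0:
            chunks.append(data[start:i + 1])
            start = i + 1
    if start < len(data):
        chunks.append(data[start:])  # trailing remainder
    return chunks


def feature_set(data):
    """Map a file's bytes to a set of 64-bit chunk fingerprints."""
    return {
        int.from_bytes(hashlib.sha1(chunk).digest()[:8], "big")
        for chunk in cdc_chunks(data)
    }


def minhash(features, k=128, seed=1):
    """Reduce a feature set of any size to a fixed-length signature of k minima."""
    if not features:
        raise ValueError("feature set must be non-empty")
    rng = random.Random(seed)  # same seed -> same hash functions for every file
    params = [(rng.randrange(1, _P), rng.randrange(0, _P)) for _ in range(k)]
    return [min((a * x + b) % _P for x in features) for a, b in params]


def jaccard(a, b):
    """Exact Jaccard index |A ∩ B| / |A ∪ B| of two sets."""
    return len(a & b) / len(a | b)


def estimate_jaccard(sig_a, sig_b):
    """Estimate the Jaccard index as the fraction of matching signature slots."""
    return sum(x == y for x, y in zip(sig_a, sig_b)) / len(sig_a)


if __name__ == "__main__":
    rng = random.Random(0)
    original = bytes(rng.randrange(256) for _ in range(20000))
    variant = original[:10000] + b"PATCH" + original[10000:]  # small insertion
    fa, fb = feature_set(original), feature_set(variant)
    print("exact:", jaccard(fa, fb))
    print("estimate:", estimate_jaccard(minhash(fa), minhash(fb)))
```

Because chunk boundaries depend on local content rather than on fixed offsets, a small insertion should disturb only nearby chunks, so most features of the two files are expected to stay shared. The fixed-length signature is what makes storage per file constant regardless of file size.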
【 License 】
Unknown