Applied Sciences
Comparative Evaluation of NLP-Based Approaches for Linking CAPEC Attack Patterns from CVE Vulnerability Information
Hironori Washizaki¹, Kenta Kanakogi¹, Yoshiaki Fukazawa¹, Atsuo Hazeyama², Takehisa Kato³, Hideyuki Kanuka³, Shinpei Ogata⁴, Takao Okubo⁵, Nobukazu Yoshioka⁶
¹ Department of Computer Science and Engineering, Waseda University, Shinjuku-ku, Tokyo 169-8555, Japan
² Department of Information Science, Tokyo Gakugei University, Koganei-shi 184-8501, Japan
³ Hitachi, Ltd., Chiyoda-ku, Tokyo 100-8280, Japan
⁴ Institute of Engineering, Academic Assembly, Shinshu University, Nagano 380-8553, Japan
⁵ Institute of Information Security, Yokohama 221-0835, Japan
⁶ Research Institute for Science and Engineering, Waseda University, Shinjuku-ku, Tokyo 169-8555, Japan
Keywords: cybersecurity database; CVE; CAPEC; natural language processing; sentence embeddings; TF-IDF
DOI: 10.3390/app12073400
Source: DOAJ
Abstract
Vulnerability and attack information must be collected to assess the severity of vulnerabilities and to prioritize countermeasures against cyberattacks quickly and accurately. Common Vulnerabilities and Exposures (CVE) is a dictionary that lists vulnerabilities and incidents, while Common Attack Pattern Enumeration and Classification (CAPEC) is a dictionary of attack patterns. Identifying the CAPEC entries relevant to a given CVE entry is difficult because the two dictionaries are not always directly linked. Here, an approach is proposed that finds links between these dictionaries directly from their text. Several patterns, each combining a similarity measure with a popular text representation algorithm such as term frequency–inverse document frequency (TF-IDF), Universal Sentence Encoder (USE), or Sentence-BERT (SBERT), are then evaluated experimentally within the proposed approach. Two metrics, recall and mean reciprocal rank (MRR), are used to assess how well each pattern traces the CAPEC identifiers associated with 61 CVE identifiers. The experiments show that TF-IDF provides the best overall performance.
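To make the TF-IDF pattern and the two metrics concrete, here is a minimal sketch in Python, not the authors' implementation. It rests on several assumptions not stated in the abstract: a simple lowercase regex tokenizer; the smoothed IDF ln((1 + N) / (1 + df)) + 1 (the default smoothing in scikit-learn's `TfidfVectorizer`); cosine similarity as the similarity measure; and recall taken as recall@k, i.e. the fraction of ground-truth CAPEC IDs found in the top k results. The function names (`rank_capec`, `recall_at_k`, `mean_reciprocal_rank`, etc.) and the toy descriptions are hypothetical.

```python
import math
import re
from collections import Counter


def tokenize(text: str) -> list[str]:
    """Lowercase the text and split on runs of non-alphanumeric characters."""
    return [tok for tok in re.split(r"[^a-z0-9]+", text.lower()) if tok]


def tfidf_vectors(docs: list[str]) -> list[dict[str, float]]:
    """Return one sparse TF-IDF vector (term -> weight) per document.

    Assumed weighting: TF is the raw term count, IDF is ln((1 + N) / (1 + df)) + 1.
    """
    tokenized = [tokenize(d) for d in docs]
    n_docs = len(tokenized)
    df = Counter(term for toks in tokenized for term in set(toks))
    idf = {term: math.log((1 + n_docs) / (1 + count)) + 1 for term, count in df.items()}
    return [{term: tf * idf[term] for term, tf in Counter(toks).items()} for toks in tokenized]


def cosine(a: dict[str, float], b: dict[str, float]) -> float:
    """Cosine similarity between two sparse vectors; 0.0 if either is empty."""
    dot = sum(w * b.get(term, 0.0) for term, w in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def rank_capec(cve_text: str, capec: dict[str, str]) -> list[str]:
    """Hypothetical helper: rank CAPEC IDs by decreasing cosine similarity to a CVE description."""
    ids = list(capec)
    # Fit IDF over the CAPEC descriptions plus the query, so every query term gets a weight.
    vectors = tfidf_vectors([capec[cid] for cid in ids] + [cve_text])
    query = vectors[-1]
    # zip stops at len(ids), so the query vector is not scored against itself.
    scores = {cid: cosine(query, vec) for cid, vec in zip(ids, vectors)}
    return sorted(ids, key=lambda cid: scores[cid], reverse=True)


def recall_at_k(ranked: list[str], relevant: set[str], k: int) -> float:
    """Fraction of ground-truth CAPEC IDs that appear in the top k results."""
    return len(set(ranked[:k]) & relevant) / len(relevant)


def mean_reciprocal_rank(rankings: list[list[str]], relevants: list[set[str]]) -> float:
    """Average over queries of 1 / (rank of the first relevant ID), counting 0 if none appears."""
    total = 0.0
    for ranked, relevant in zip(rankings, relevants):
        for pos, cid in enumerate(ranked, start=1):
            if cid in relevant:
                total += 1.0 / pos
                break
    return total / len(rankings)


if __name__ == "__main__":
    # Toy, shortened descriptions written for illustration; not the official CAPEC text.
    capec = {
        "CAPEC-66": "An attacker injects SQL commands into user input that is used to build a database query.",
        "CAPEC-63": "An attacker injects malicious script into a web page that is viewed by other users.",
        "CAPEC-100": "An attacker writes data past the end of a buffer to corrupt memory.",
    }
    # Invented CVE-style description; no real CVE entry is implied.
    cve = "SQL injection in the login form allows attackers to execute arbitrary SQL commands via the username parameter."
    ranked = rank_capec(cve, capec)
    print(ranked)  # ['CAPEC-66', 'CAPEC-100', 'CAPEC-63']
    print(recall_at_k(ranked, {"CAPEC-66"}, k=1))  # 1.0
    print(mean_reciprocal_rank([ranked], [{"CAPEC-66"}]))  # 1.0
```

Under these assumptions, the embedding-based patterns (USE, SBERT) would differ only in the vector step: each description would be mapped to a dense embedding and compared with an equivalent cosine over dense vectors, while the ranking and both metrics stay the same.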
License: Unknown