Electronics
A Method for Class-Imbalance Learning in Android Malware Detection
Baolei Mao [1], Xu Jiang [2], Jun Guan [2]
[1] Cooperative Innovation Center of Internet Healthcare, Zhengzhou University, Zhengzhou 450000, China; School of Automation, Northwestern Polytechnical University, Xi’an 710072, China
Keywords: random forest; SMOTE; Android malware; imbalanced data; clustering; under-sampling
DOI: 10.3390/electronics10243124
Source: DOAJ
【Abstract】
Android application developers increasingly adopt measures against reverse engineering, such as adding a shell (packing), so that certain features cannot be obtained through decompilation. This causes a serious sample imbalance in machine-learning-based Android malware detection. Researchers have therefore focused on solving the class-imbalance problem to improve detection performance. However, existing class-imbalance learning methods suffer mainly from the loss of valuable samples and from computational cost. In this paper, we propose a Class-Imbalance Learning (CIL) method that first selects representative features, then uses K-Means clustering together with under-sampling to reduce the number of majority-class samples while retaining the important ones. Next, the Synthetic Minority Over-Sampling Technique (SMOTE) generates minority-class samples to balance the data, and finally the Random Forest (RF) algorithm builds the malware detection model. Experimental results indicate that CIL effectively improves the performance of machine-learning-based Android malware detection, especially under class imbalance. Compared with existing class-imbalance learning methods, CIL is also effective on data sets from the University of California, Irvine (UCI) Machine Learning Repository and performs better on some of them.
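The four stages named in the abstract (feature selection, K-Means-guided under-sampling, SMOTE, Random Forest) can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: the abstract does not say how features are scored or which majority samples are kept per cluster, so the ANOVA F-score selector, the "keep the sample nearest each centroid" rule, the simplified SMOTE-style interpolation, the synthetic data, and all names and parameter values (`cil_sketch`, `n_keep`, `n_features`, and so on) are assumptions. Label 0 stands for the majority class and label 1 for the minority class.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_selection import SelectKBest, f_classif
from sklearn.neighbors import NearestNeighbors


def kmeans_undersample(X_maj, n_keep, seed=0):
    """Assumed rule: keep the majority sample nearest to each K-Means centroid."""
    km = KMeans(n_clusters=n_keep, n_init=3, random_state=seed).fit(X_maj)
    dist = km.transform(X_maj)              # shape (n_samples, n_keep)
    idx = np.unique(dist.argmin(axis=0))    # at most n_keep distinct samples
    return X_maj[idx]


def smote_like(X_min, n_new, k=5, seed=0):
    """Simplified SMOTE-style step: interpolate towards a random near neighbour."""
    rng = np.random.default_rng(seed)
    k = min(k, len(X_min) - 1)
    nn = NearestNeighbors(n_neighbors=k + 1).fit(X_min)
    neigh = nn.kneighbors(X_min, return_distance=False)[:, 1:]  # drop self
    base = rng.integers(0, len(X_min), size=n_new)
    pick = neigh[base, rng.integers(0, k, size=n_new)]
    gap = rng.random((n_new, 1))            # interpolation factor in [0, 1)
    return X_min[base] + gap * (X_min[pick] - X_min[base])


def cil_sketch(X, y, n_features=10, n_keep=200, seed=0):
    # 1) Select representative features (assumed scorer: ANOVA F-score).
    selector = SelectKBest(f_classif, k=n_features).fit(X, y)
    Xs = selector.transform(X)
    X_maj, X_min = Xs[y == 0], Xs[y == 1]
    # 2) Shrink the majority class using cluster structure.
    X_maj = kmeans_undersample(X_maj, n_keep, seed)
    # 3) Grow the minority class until both classes have equal size.
    n_new = max(len(X_maj) - len(X_min), 0)
    X_min = np.vstack([X_min, smote_like(X_min, n_new, seed=seed)])
    # 4) Train a Random Forest on the balanced data.
    X_bal = np.vstack([X_maj, X_min])
    y_bal = np.r_[np.zeros(len(X_maj), dtype=int), np.ones(len(X_min), dtype=int)]
    model = RandomForestClassifier(n_estimators=100, random_state=seed)
    model.fit(X_bal, y_bal)
    return selector, model, y_bal


# Synthetic, roughly 9:1 imbalanced data standing in for real app features.
X, y = make_classification(n_samples=1100, n_features=20, n_informative=8,
                           weights=[0.9, 0.1], random_state=0)
selector, model, y_bal = cil_sketch(X, y)
pred = model.predict(selector.transform(X[:5]))
```

In practice, the `SMOTE` class from the imbalanced-learn package would normally replace the hand-written `smote_like` helper, and evaluation should use a held-out test set that is left unbalanced.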
【License】
Unknown