Journal Article Details
Journal of Big Data
Determining threshold value on information gain feature selection to increase speed and prediction accuracy of random forest
Kridanto Surendro [1], Maria Irmina Prasetiyowati [1], Nur Ulfa Maulidevi [1]
[1] School of Electrical Engineering and Informatics, Institut Teknologi Bandung, Bandung, Indonesia
Keywords: Threshold; Standard deviation; Accuracy; Time; Random forest
DOI: 10.1186/s40537-021-00472-4
Source: Springer
【 Abstract 】

Feature selection is a pre-processing technique that removes unnecessary features and speeds up an algorithm's execution. One approach calculates the information gain of each feature in the dataset and selects features by comparing that value against a threshold. However, the threshold is usually chosen arbitrarily or set to 0.05. This study therefore proposes determining the threshold from the standard deviation of the information gain values of the features in the dataset. The proposed threshold was tested on 10 original datasets and their FFT- and IFFT-transformed versions, classified using Random Forest. On the transformed datasets, the proposed threshold resulted in lower accuracy and longer execution time than Correlation-Based Feature Selection (CBF) and the standard 0.05 threshold. Similarly, the accuracy obtained was lower when using transformed features. On the original datasets, feature selection with the standard deviation threshold resulted in better Random Forest classification accuracy. Furthermore, using the transformed features with the proposed threshold, excluding the imaginary numbers, led to a faster average time than the three methods compared.
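
The thresholding idea described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not say whether the population or sample standard deviation is used, nor whether the comparison is strict, so the sketch assumes the population standard deviation and a "greater than or equal" rule. The function names and the toy data are hypothetical, and features are assumed to be discrete.

```python
import math
from collections import Counter
from statistics import pstdev


def entropy(labels):
    """Shannon entropy (in bits) of a sequence of class labels."""
    total = len(labels)
    return -sum((n / total) * math.log2(n / total) for n in Counter(labels).values())


def information_gain(feature, labels):
    """Label entropy minus the entropy remaining after splitting on a discrete feature."""
    total = len(labels)
    groups = {}
    for value, label in zip(feature, labels):
        groups.setdefault(value, []).append(label)
    remainder = sum(len(g) / total * entropy(g) for g in groups.values())
    return entropy(labels) - remainder


def select_by_std_threshold(columns, labels):
    """Keep features whose information gain reaches the standard deviation of all gains.

    Assumptions (not stated in the abstract): population standard deviation
    and a ">=" comparison.
    """
    gains = {name: information_gain(col, labels) for name, col in columns.items()}
    threshold = pstdev(list(gains.values()))
    selected = [name for name, gain in gains.items() if gain >= threshold]
    return gains, threshold, selected


# Hypothetical toy data: one informative, one weakly informative, one constant feature.
labels = [0, 0, 1, 1, 0, 1, 0, 1]
columns = {
    "informative": [0, 0, 1, 1, 0, 1, 0, 1],
    "weak": [0, 1, 0, 1, 0, 1, 0, 1],
    "constant": [0, 0, 0, 0, 0, 0, 0, 0],
}

gains, threshold, selected = select_by_std_threshold(columns, labels)
print(gains, threshold, selected)
```

On this toy data the weak feature would pass a fixed 0.05 threshold but falls below the standard deviation of the gains, which shows how a data-dependent threshold can select fewer features than the fixed value.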

【 License 】

CC BY   
