Journal Article Details
Journal of Big Data
Smoothing target encoding and class center-based firefly algorithm for handling missing values in categorical variable
Research
Heru Nugroho [1], Nugraha Priya Utama [1], Kridanto Surendro [1]
[1] School of Electrical Engineering and Informatics, Institut Teknologi Bandung, Jl. Ganesha 10, 40132, Bandung, Jawa Barat, Indonesia
Keywords: Missing data; Encoding; Smoothing; Firefly algorithm; Class center
DOI: 10.1186/s40537-022-00679-z
Received 2022-05-18, accepted 2022-12-25, publication year 2022
Source: Springer
【 Abstract 】

Missing data, which occurs when no value is stored for a variable in an observation, is one of the most common causes of incomplete datasets. The class center-based Firefly algorithm (C3FA) is an adaptive imputation model that incorporates attribute correlations into the imputation process, and it has outperformed other methods on numerical data in classification problems. However, this model has not been tested on categorical data, which must be handled in the preprocessing stage. Encoding converts text or Boolean values in categorical data into numeric parameters, and target encoding is a commonly used method. Because it uses target variable information to encode categorical data, it carries a risk of overfitting and of inaccurate estimates for infrequent categories. This study uses smoothing target encoding (STE) to perform imputation by combining C3FA with the standard deviation (STD), and compares the result with several other imputation methods. On the tic-tac-toe dataset, evaluated with a kNN classifier, the proposed method (C3FA-STD) achieved AUC, CA, F1-score, precision, and recall values of 0.939, 0.882, 0.881, 0.881, and 0.882, respectively.
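To make the idea of smoothing in target encoding concrete, here is a minimal sketch. It assumes the widely used m-estimate form, in which each category's target mean is pulled toward the global mean in proportion to a smoothing weight `m`. The abstract does not state the exact formula used in the paper, so the function name `smoothed_target_encode`, the parameter `m`, and the toy data are illustrative assumptions only.

```python
from collections import defaultdict


def smoothed_target_encode(categories, targets, m=2.0):
    """Return a dict mapping each category to a smoothed target mean.

    Assumed formula (generic m-estimate smoothing, not taken from the paper):
        encoding = (n * category_mean + m * global_mean) / (n + m)
    where n is the number of rows in the category.
    """
    if len(categories) != len(targets):
        raise ValueError("categories and targets must have the same length")
    if not targets:
        raise ValueError("at least one observation is required")

    global_mean = sum(targets) / len(targets)

    sums = defaultdict(float)   # per-category sum of targets (n * category_mean)
    counts = defaultdict(int)   # per-category row count n
    for cat, tgt in zip(categories, targets):
        sums[cat] += tgt
        counts[cat] += 1

    return {
        cat: (sums[cat] + m * global_mean) / (counts[cat] + m)
        for cat in counts
    }


# Hypothetical toy data: one board cell ("x", "o", or "b" for blank)
# and a binary class label.
cells = ["x", "x", "x", "o", "o", "b"]
labels = [1, 1, 0, 0, 1, 1]
encoding = smoothed_target_encode(cells, labels, m=2.0)
```

In this toy data the category "b" appears only once, so its raw target mean would be 1.0. With smoothing, its encoding moves toward the global mean of 2/3, which illustrates how smoothing reduces the overfitting risk for infrequent categories that the abstract mentions.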

【 License 】

CC BY   
© The Author(s) 2023
