| Journal of Big Data | |
| Smoothing target encoding and class center-based firefly algorithm for handling missing values in categorical variable | |
| Research | |
| Heru Nugroho [1], Nugraha Priya Utama [1], Kridanto Surendro [1] | |
| [1] School of Electrical Engineering and Informatics, Institut Teknologi Bandung, Jl. Ganesha, 10, 40132, Bandung, Jawa Barat, Indonesia; | |
| Keywords: Missing data; Encoding; Smoothing; Firefly algorithm; Class center | |
| DOI : 10.1186/s40537-022-00679-z | |
| Received 2022-05-18, accepted 2022-12-25, published 2022 | |
| Source: Springer | |
【 Abstract 】
Missing data, which occurs when no value is stored for a variable in an observation, is one of the most common causes of incomplete datasets. An adaptive model that outperforms other methods on numerical data in classification problems was previously developed using the class center-based Firefly algorithm with attribute correlations incorporated into the imputation process (C3FA). However, this model has not been tested on categorical data, whose handling is essential in the preprocessing stage. Encoding converts text or Boolean values in categorical data into numeric values, and target encoding is a commonly used method. It uses target variable information to encode categorical data, which carries the risk of overfitting and of inaccurate estimates for infrequent categories. This study uses smoothing target encoding (STE) in an imputation process that combines C3FA with standard deviation (STD), and compares the result with several other imputation methods. On the tic-tac-toe dataset, the proposed method (C3FA-STD) achieved AUC, CA, F1-score, precision, and recall values of 0.939, 0.882, 0.881, 0.881, and 0.882, respectively, when evaluated with the kNN classifier.
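To make the encoding step concrete, the following is a minimal sketch of smoothed target encoding, not the authors' implementation. The abstract does not give the exact STE formula, so this assumes the widely used additive form, in which each category's target mean is blended with the global mean. The function name `smoothing_target_encode` and the smoothing weight `m` are hypothetical names introduced only for illustration.

```python
from collections import defaultdict


def smoothing_target_encode(categories, targets, m=10.0):
    """Map each category to a smoothed mean of a binary (0/1) target.

    Assumed formula (not taken from the paper):
        encoded = (n_c * mean_c + m * global_mean) / (n_c + m)
    where n_c is the category count, mean_c its target mean, and m a
    hypothetical smoothing weight. Larger m pulls rare categories
    more strongly toward the global mean.
    """
    if len(categories) != len(targets):
        raise ValueError("categories and targets must have equal length")
    if not categories:
        raise ValueError("at least one observation is required")
    if m < 0:
        raise ValueError("m must be non-negative")

    global_mean = sum(targets) / len(targets)

    # Accumulate per-category target sums and counts in one pass.
    sums = defaultdict(float)
    counts = defaultdict(int)
    for category, target in zip(categories, targets):
        sums[category] += target
        counts[category] += 1

    # n_c * mean_c equals the per-category sum, so use the sum directly.
    return {
        category: (sums[category] + m * global_mean) / (counts[category] + m)
        for category in counts
    }


if __name__ == "__main__":
    # Toy data shaped like one tic-tac-toe cell: "x", "o", or "b" (blank).
    cells = ["x", "x", "x", "o", "o", "b"]
    wins = [1, 1, 0, 0, 1, 1]
    for category, value in sorted(smoothing_target_encode(cells, wins, m=2.0).items()):
        print(category, round(value, 3))
```

With `m=0` the function reduces to plain target encoding, so the single `"b"` observation would be encoded as exactly 1.0. A positive `m` shrinks that estimate toward the global mean, which is the overfitting risk for infrequent categories that the abstract mentions.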
【 License 】
CC BY
© The Author(s) 2023