3rd International Seminar On Sciences "Sciences On Precision And Sustainable Agriculture" | |
Synthetic Over Sampling Methods for Handling Class Imbalanced Problems : A Review | |
Santoso, B.^1 ; Wijayanto, H.^1 ; Notodiputro, K.A.^1 ; Sartono, B.^1 | |
Department of Statistics, Faculty of Mathematics and Natural Sciences, Bogor Agricultural University, Indonesia^1 | |
Keywords: Class imbalance problems; Evaluation measures; Evaluation methods; Imbalanced data; Imbalanced data problems; Misclassifications; Oversampling methods; Solution methods; | |
Others : https://iopscience.iop.org/article/10.1088/1755-1315/58/1/012031/pdf DOI : 10.1088/1755-1315/58/1/012031 |
Source: IOP | |
【 Abstract 】
Class imbalance is commonly found in real-world cases. It occurs when one class (the minority class) has fewer observations than another (the majority class). Imbalanced data is usually associated with misclassification, where the minority class tends to be misclassified more often than the majority class. There are two approaches to solving imbalanced data problems: solutions at the data level and solutions at the algorithm level. Oversampling is used more frequently than the other data-level methods. This study reviews synthetic oversampling methods for handling the imbalanced data problem. Different methods produce synthetic data with different characteristics, so the choice of method must be adapted to the problem at hand, such as the level and pattern of imbalance in the available data. The review shows that no single method is absolutely more efficient than the others in dealing with class imbalance. Rather, the class imbalance problem depends on the complexity of the data, the level of class imbalance, the size of the data, and the classifier involved. The choice of oversampling strategy affects the outcome of the oversampling, so there is still room to develop better oversampling methods for handling class imbalance. The selection of the classifier and of the evaluation measures is important for obtaining the best results. In addition to the evaluation methods already in use, a statistical testing approach is needed to assess the theoretical properties of synthetic data and to evaluate misclassification.
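As an illustration of what "synthetic oversampling" can mean in practice, the sketch below generates new minority-class points by interpolating between a minority observation and one of its nearest minority neighbours, in the style of SMOTE. The abstract does not name the specific methods reviewed, so this is an assumed, representative example rather than the authors' procedure. The function name `synthetic_oversample` and its parameters `n_new`, `k`, and `seed` are hypothetical.

```python
import numpy as np

def synthetic_oversample(X_min, n_new, k=5, seed=0):
    """Return n_new synthetic minority points (SMOTE-style interpolation sketch).

    X_min : array of shape (n, d) holding the minority-class observations (n >= 2).
    k     : number of nearest minority neighbours considered for each point.
    """
    rng = np.random.default_rng(seed)
    X_min = np.asarray(X_min, dtype=float)
    n = len(X_min)
    k = min(k, n - 1)  # cannot have more neighbours than other points

    # Pairwise Euclidean distances; a point is never its own neighbour.
    dist = np.linalg.norm(X_min[:, None, :] - X_min[None, :, :], axis=2)
    np.fill_diagonal(dist, np.inf)
    neighbours = np.argsort(dist, axis=1)[:, :k]

    # Pick a random base point and one of its k neighbours for each new sample.
    base = rng.integers(0, n, size=n_new)
    chosen = neighbours[base, rng.integers(0, k, size=n_new)]

    # Place the synthetic point at a random position on the connecting segment.
    gap = rng.random((n_new, 1))
    return X_min[base] + gap * (X_min[chosen] - X_min[base])
```

Because every synthetic point lies on a segment between two existing minority observations, the generated data stays inside the region already occupied by the minority class. Methods that differ in how base points and neighbours are chosen produce synthetic data with different characteristics, which is the kind of variation the abstract refers to.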