Journal Article Details
IEEE Access
PSU: Particle Stacking Undersampling Method for Highly Imbalanced Big Data
Dong-Joon Lim [1], Yong-Seok Jeon [1]
[1] Department of Industrial Engineering, Sungkyunkwan University, Suwon, South Korea
Keywords: Data mining; imbalanced data; undersampling; big data; support vector machines
DOI: 10.1109/ACCESS.2020.3009753
Source: DOAJ
【 Abstract 】

Imbalanced classes are a common problem in machine learning, and the computational cost of proper resampling increases with data size. This study proposes a simple and effective undersampling method named particle stacking undersampling (PSU). Compared with competing undersampling methods, PSU significantly reduces computational cost while minimizing information loss to prevent prediction bias. A performance benchmark on 55 binary classification problems indicated that the proposed method not only achieved better classification performance than other well-known undersampling methods (random undersampling, NearMiss-1, NearMiss-2, cluster centroid, edited nearest neighbor, condensed nearest neighbor, and Tomek links) but also offered a computational simplicity that scales to large data. Moreover, an experiment verified that the two propositions forming the basis of the PSU algorithm can also be applied to other undersampling methods to improve them.

【 License 】

Unknown   
