Conference Paper Details
3rd International Conference on Automation, Control and Robotics Engineering
Bagging of Xgboost Classifiers with Random Under-sampling and Tomek Link for Noisy Label-imbalanced Data
Industrial Technology; Computer Science; Radio Electronics
Ruisen, Luo^1 ; Songyi, Dian^1 ; Chen, Wang^1,2 ; Peng, Cheng^1 ; Zuodong, Tang^1 ; Yanmei, Yu^1 ; Shixiong, Wang^3
College of Electrical Engineering and Information Technology, Sichuan University, 24 South Section 1, One Ring Road, Chengdu 610065, China^1
Department of Computer Science, University College London, Gower Street, London WC1E 6BT, United Kingdom^2
School of Electronics and Information, Northwestern Polytechnical University, 1 Dongxiang Road, Chang'an District, Xi'an 710129, China^3
Keywords: Gradient boosting; Imbalanced data; Misleading informations; Model misspecification; Parameter-tuning; Random sampling; Random under samplings; Under-sampling
Others  :  https://iopscience.iop.org/article/10.1088/1757-899X/428/1/012004/pdf
DOI  :  10.1088/1757-899X/428/1/012004
Source: IOP
【 Abstract 】

Fitting label-imbalanced data with a high level of noise is one of the major challenges in the design of learning-based intelligent systems. In this paper, we propose a bagging-based algorithm for the two-class problem that combines the Xgboost classifier (a Gradient Boosting Machine) with under-sampling approaches to address this challenge. To avoid the model misspecification caused by imbalanced data, random sampling with replacement is employed to obtain several balanced training sets. To mitigate the misleading information produced by noise, the Tomek Link method is introduced to eliminate cross-class overlapping instances, which are the primary sources of noise. To obtain robust individual learners, we use Xgboost, a novel Gradient Boosting Machine-based classifier with a convenient parameter-tuning interface, to fit each component of the bagging ensemble. The proposed method is tested on a keyword recognition task using Mandarin radio recordings (MFCC features), and the experimental results show that it can outperform a single Xgboost classifier, which verifies the rationality and effectiveness of the bagging scheme. The proposed method could offer a novel solution to the challenge of classifying noisy imbalanced data, and the application of Xgboost in this area could also serve as an innovative contribution.
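The pipeline described in the abstract can be sketched roughly as follows. This is a minimal illustration under several assumptions, not the authors' implementation: the function names are hypothetical; Tomek-link cleaning is applied once before sampling and removes only the majority-class member of each link; each bag keeps all minority points and draws an equal number of majority points with replacement; and the ensemble is combined by majority vote. The abstract does not specify any of these details. A small nearest-centroid learner stands in for the base classifier so the sketch runs with the standard library alone; a classifier with `fit`/`predict` methods, such as `xgboost.XGBClassifier`, could presumably be substituted once the inputs are converted to arrays.

```python
import math
import random
from collections import Counter


def nearest_neighbor(i, X):
    """Index of the point closest to X[i] (Euclidean), excluding i itself."""
    return min((j for j in range(len(X)) if j != i),
               key=lambda j: math.dist(X[i], X[j]))


def remove_tomek_links(X, y, majority_label):
    """Drop majority-class points that belong to a Tomek link.

    A Tomek link is a pair of opposite-class points that are each other's
    nearest neighbour. Assumption: only the majority member is removed.
    """
    nn = [nearest_neighbor(i, X) for i in range(len(X))]
    drop = {i for i, j in enumerate(nn)
            if nn[j] == i and y[i] != y[j] and y[i] == majority_label}
    keep = [i for i in range(len(X)) if i not in drop]
    return [X[i] for i in keep], [y[i] for i in keep]


def balanced_sample(X, y, majority_label, rng):
    """Keep every minority point and draw as many majority points at random
    with replacement (assumed reading of 'balanced training sets')."""
    maj = [i for i, lab in enumerate(y) if lab == majority_label]
    mino = [i for i, lab in enumerate(y) if lab != majority_label]
    idx = mino + rng.choices(maj, k=len(mino))
    return [X[i] for i in idx], [y[i] for i in idx]


class NearestCentroid:
    """Stand-in base learner (NOT Xgboost) so the sketch has no dependencies."""

    def fit(self, X, y):
        self.centroids = {}
        for lab in set(y):
            pts = [x for x, l2 in zip(X, y) if l2 == lab]
            self.centroids[lab] = [sum(c) / len(pts) for c in zip(*pts)]
        return self

    def predict(self, X):
        return [min(self.centroids,
                    key=lambda lab: math.dist(x, self.centroids[lab]))
                for x in X]


def fit_bagging(X, y, make_learner, n_bags=5, seed=0):
    """Clean the data once, then fit one learner per balanced bag."""
    rng = random.Random(seed)
    majority_label = Counter(y).most_common(1)[0][0]
    X, y = remove_tomek_links(X, y, majority_label)
    learners = []
    for _ in range(n_bags):
        Xb, yb = balanced_sample(X, y, majority_label, rng)
        model = make_learner()
        model.fit(Xb, yb)
        learners.append(model)
    return learners


def predict_bagging(learners, X):
    """Majority vote over the ensemble members (assumed combination rule)."""
    votes = zip(*(m.predict(X) for m in learners))
    return [Counter(v).most_common(1)[0][0] for v in votes]


# Toy data: seven majority points (label 0), three minority points (label 1).
# The point (1.9, 2.0) is a majority point sitting next to the minority cluster.
X_toy = [(0.0, 0.0), (0.1, 0.2), (0.2, 0.1), (0.3, 0.3), (0.2, 0.4),
         (0.4, 0.2), (1.9, 2.0), (2.0, 2.0), (2.2, 2.1), (2.1, 2.3)]
y_toy = [0, 0, 0, 0, 0, 0, 0, 1, 1, 1]

ensemble = fit_bagging(X_toy, y_toy, NearestCentroid, n_bags=5)
predictions = predict_bagging(ensemble, [(0.1, 0.1), (2.1, 2.1)])
```

The pairwise nearest-neighbour search here is quadratic in the number of points, which is fine for a toy example but would need an indexed search for a real MFCC feature set.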
