Journal Article Details
NEUROCOMPUTING, Vol. 150
Neighbourhood sampling in bagging for imbalanced data
Article
Blaszczynski, Jerzy [1]; Stefanowski, Jerzy [1]
[1] Poznan Univ Tech, Inst Comp Sci, PL-60965 Poznan, Poland
Keywords: Class imbalance; Ensemble classifiers; Bagging
DOI: 10.1016/j.neucom.2014.07.064
Source: Elsevier
Abstract

Various approaches to extending bagging ensembles to class-imbalanced data are considered. First, we review known extensions and compare them in a comprehensive experimental study. The results show that integrating bagging with under-sampling is more powerful than integrating it with over-sampling, and they single out Roughly Balanced Bagging as the most accurate extension. Then, we point out that a complex and difficult distribution of the minority class can be handled by analyzing the content of the neighbourhood of examples. We show that taking such local characteristics of the minority class distribution into account can be useful both for analyzing the performance of ensembles with respect to data difficulty factors and for proposing new generalizations of bagging. We demonstrate this by proposing Neighbourhood Balanced Bagging, in which the sampling probabilities of examples are modified according to the class distribution in their neighbourhood. Two versions are considered: the first keeps a larger bootstrap sample size through hybrid over-sampling, and the second reduces this size with stronger under-sampling. Experiments show that the first version is significantly better than existing over-sampling bagging extensions, while the second is competitive with Roughly Balanced Bagging. Finally, we demonstrate that detecting types of minority examples depending on their neighbourhood may help explain why some ensembles work better for imbalanced data than others. (C) 2014 Elsevier B.V. All rights reserved.
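The abstract states only that Neighbourhood Balanced Bagging modifies the sampling probabilities of examples according to the class distribution in their neighbourhood, in an over-sampling and an under-sampling version. The Python sketch below illustrates that idea under stated assumptions: the weighting formula (with k = 5 and an exponent `psi`), the four minority-type labels and their thresholds, the sample sizes of the two variants, and every function name are hypothetical choices for illustration, not the paper's definitions. Training base classifiers and voting are omitted; each returned index list would train one ensemble member.

```python
import random


def k_nearest(X, i, k):
    """Indices of the k nearest neighbours of X[i] (squared Euclidean distance, i excluded)."""
    dists = sorted(
        (sum((a - b) ** 2 for a, b in zip(X[i], x)), j)
        for j, x in enumerate(X)
        if j != i
    )
    return [j for _, j in dists[:k]]


def minority_type(n_same):
    """Hypothetical labelling of a minority example from the number of
    same-class examples among its k=5 nearest neighbours (assumed thresholds)."""
    if n_same >= 4:
        return "safe"
    if n_same >= 2:
        return "borderline"
    if n_same == 1:
        return "rare"
    return "outlier"


def neighbourhood_weights(X, y, minority, k=5, psi=2.0):
    """Assumed weighting: a minority example gets a larger weight the more
    majority-class neighbours it has; every majority example gets the same
    weight 0.5 * n_min / n_maj. Assumes both classes are present."""
    n_min = sum(1 for c in y if c == minority)
    n_maj = len(y) - n_min
    weights = []
    for i, c in enumerate(y):
        if c == minority:
            n_maj_nbrs = sum(1 for j in k_nearest(X, i, k) if y[j] != minority)
            weights.append(0.5 * (n_maj_nbrs ** psi / k + 1))
        else:
            weights.append(0.5 * n_min / n_maj)
    return weights


def nb_bootstrap_samples(X, y, minority, n_estimators=10, variant="over", seed=0):
    """Draw n_estimators bootstrap samples (index lists, with replacement) using
    the neighbourhood weights as sampling probabilities.
    variant="over": sample size equals the data size (hybrid over-sampling).
    variant="under": sample size is 2 * n_min (stronger under-sampling).
    Both sizes are assumptions for this sketch."""
    rng = random.Random(seed)
    weights = neighbourhood_weights(X, y, minority)
    n_min = sum(1 for c in y if c == minority)
    size = len(y) if variant == "over" else 2 * n_min
    return [rng.choices(range(len(y)), weights=weights, k=size)
            for _ in range(n_estimators)]


if __name__ == "__main__":
    # Toy data (hypothetical): a 3x3 majority grid, a distant minority cluster,
    # and one minority point placed inside the majority grid.
    X = [(a, b) for a in range(3) for b in range(3)]
    X += [(10, 10), (10, 11), (11, 10), (11, 11), (10.5, 10.5), (11.5, 10.5), (1.5, 1.5)]
    y = [0] * 9 + [1] * 7
    for i in range(9, 16):
        n_same = sum(1 for j in k_nearest(X, i, 5) if y[j] == 1)
        print(i, minority_type(n_same))  # 9..14 -> safe, 15 -> outlier
    print([round(w, 3) for w in neighbourhood_weights(X, y, minority=1)])
    over = nb_bootstrap_samples(X, y, minority=1, variant="over")
    under = nb_bootstrap_samples(X, y, minority=1, variant="under")
    print(len(over[0]), len(under[0]))  # 16 14
```

Under this assumed weighting, the majority class as a whole receives a total weight of 0.5 · n_min, while each minority example receives at least 0.5. In expectation, therefore, at least half of every bootstrap sample consists of minority examples, and the extra probability mass goes to minority examples surrounded by the majority class (such as the "outlier" at index 15 above).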

License: Free