Journal Article Details
BMC Bioinformatics
CURE-SMOTE algorithm and hybrid algorithm for feature selection and parameter optimization based on random forests
Research Article
Li Ma [1], Suohai Fan [1]
[1] School of Information Science and Technology, Jinan University, 510632, Guangzhou, China
Keywords: Random forests; Imbalance data; Intelligence algorithm; Feature selection; Parameter optimization
DOI: 10.1186/s12859-017-1578-z
Received 2016-08-25; accepted 2017-03-03; published 2017
Source: Springer
【 Abstract 】

Background: The random forests algorithm is a widely applicable classifier that is robust against overfitting, but it still has some drawbacks. To improve the performance of random forests, this paper seeks to improve imbalanced data processing, feature selection and parameter optimization.

Results: We propose the CURE-SMOTE algorithm for the imbalanced data classification problem. Experiments on imbalanced UCI data reveal that combining Clustering Using Representatives (CURE) with the original synthetic minority oversampling technique (SMOTE) enhances SMOTE effectively, compared with the classification results on the original data using random sampling, Borderline-SMOTE1, safe-level SMOTE, C-SMOTE, and k-means-SMOTE. Additionally, a hybrid RF (random forests) algorithm is proposed for feature selection and parameter optimization, which uses the minimum out-of-bag (OOB) data error as its objective function. Simulation results on binary and higher-dimensional data indicate that the proposed hybrid RF algorithms (the hybrid genetic-random forests algorithm, the hybrid particle swarm-random forests algorithm and the hybrid fish swarm-random forests algorithm) can achieve the minimum OOB error and show the best generalization ability.

Conclusion: The training set produced by the proposed CURE-SMOTE algorithm is closer to the original data distribution because it contains minimal noise, so this feasible and effective algorithm produces better classification results. Moreover, the F-value, G-mean, AUC and OOB scores of the hybrid algorithm surpass those of the original RF algorithm. Hence, the hybrid algorithm provides a new way to perform feature selection and parameter optimization.
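
For readers unfamiliar with SMOTE, the interpolation step that the oversampling variants named in the abstract build on can be sketched as follows. This is only a minimal illustration of plain SMOTE: a synthetic point is placed at a random position on the segment between a minority sample and one of its k nearest minority neighbours. The abstract does not describe how the CURE clustering step is combined with it, so that part is not shown. The function name `smote_oversample`, its parameters and the toy data are assumptions made for this example, not taken from the paper.

```python
import math
import random


def smote_oversample(minority, n_new, k=5, seed=0):
    """Return n_new synthetic points interpolated between minority samples.

    Hypothetical helper for illustration only (plain SMOTE, no CURE step).
    """
    if len(minority) < 2:
        raise ValueError("need at least two minority samples")
    rng = random.Random(seed)
    k = min(k, len(minority) - 1)  # cannot have more neighbours than other points
    synthetic = []
    for _ in range(n_new):
        i = rng.randrange(len(minority))
        x = minority[i]
        # k nearest minority neighbours of x (excluding x itself), Euclidean distance
        others = [p for j, p in enumerate(minority) if j != i]
        neighbours = sorted(others, key=lambda p: math.dist(x, p))[:k]
        nb = rng.choice(neighbours)
        u = rng.random()  # interpolation weight in [0, 1)
        synthetic.append([a + u * (b - a) for a, b in zip(x, nb)])
    return synthetic


# Toy minority class with two features (made-up values)
minority = [[1.0, 1.0], [1.2, 0.9], [0.8, 1.1], [1.1, 1.3]]
new_points = smote_oversample(minority, n_new=6, k=2)
```

Because every synthetic point is a convex combination of two existing minority samples, it always falls inside the bounding box of the minority class. Noisy or outlying minority samples therefore propagate into the synthetic data, which is the kind of problem the abstract says CURE-SMOTE reduces.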

【 License 】

CC BY   
© The Author(s). 2017
