Journal Article Details
Chem-Bio Informatics Journal
A Novel Over-Sampling Method and its Application to Cancer Classification from Gene Expression Data
Lan Anh T. Nguyen [2], Xuan Tho Dang [2], Kenji Satou [3], Thammakorn Saethang [2], Mamoru Kubo [3], Yoichi Yamada [3], Duong Hung Bui [1], Tu Kien T. Le [2], Osamu Hirose [3], Vu Anh Tran [2]
[1] Faculty of Information Technology, Vietnam Trade Union University
[2] Graduate School of Natural Science and Technology, Kanazawa University
[3] Institute of Science and Engineering, Kanazawa University
Keywords: Imbalanced dataset; Class imbalance; SMOTE; Over-sampling; Cancer classification
DOI: 10.1273/cbij.13.19
Subject classification: Biochemistry/Biophysics
Source: Chem-Bio Informatics Society
【 Abstract 】

One of the most critical and frequent problems in biomedical data classification is imbalanced class distribution, where samples from the majority class significantly outnumber those from the minority class. SMOTE is a well-known general over-sampling method used to address this problem; however, in some cases it fails to improve classification performance or even reduces it. To address these issues, we have developed a novel minority over-sampling method named safe-SMOTE. Experimental results from two gene expression datasets for cancer classification (i.e., colon-cancer and leukemia) and six imbalanced benchmark datasets from the UCI Machine Learning Repository showed that our method achieved better sensitivity and G-mean values than both the control method (i.e., no over-sampling) and SMOTE. For example, in the colon-cancer dataset, the sensitivity and specificity achieved by SMOTE (81.36% and 88.63%) were lower than those of the control method (81.59% and 89.50%), whereas safe-SMOTE increased both values (81.82% and 90.50%). Similarly, the G-mean value of the control (85.45%) decreased to 84.91% when SMOTE was employed, but increased to 86.04% when safe-SMOTE was used. In the leukemia dataset, SMOTE improved the sensitivity and G-mean values with respect to the control; however, safe-SMOTE achieved even greater improvements on both criteria.
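
The G-mean figures quoted above can be roughly cross-checked against the quoted sensitivity and specificity. The sketch below assumes the usual definition of G-mean as the geometric mean of sensitivity and specificity; the abstract itself does not spell out the formula, and the function name `g_mean` and the dictionary layout are illustrative only. Because the paper's values are presumably averaged over repeated runs, the recomputed numbers may differ from the reported ones in the second decimal place.

```python
import math


def g_mean(sensitivity, specificity):
    """Geometric mean of sensitivity and specificity (both given in percent).

    Assumed definition: sqrt(sensitivity * specificity).
    """
    return math.sqrt(sensitivity * specificity)


# Colon-cancer figures quoted in the abstract: (sensitivity %, specificity %).
reported = {
    "control": (81.59, 89.50),
    "SMOTE": (81.36, 88.63),
    "safe-SMOTE": (81.82, 90.50),
}

# Recompute G-mean for each method from the quoted pair of values.
g_means = {name: g_mean(se, sp) for name, (se, sp) in reported.items()}

for name, value in g_means.items():
    print(f"{name}: G-mean ~ {value:.2f}%")
```

Under this assumed definition, the recomputed values preserve the ordering stated in the abstract: SMOTE falls below the control, and safe-SMOTE rises above it.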

【 License 】

Unknown
