Journal Article Details
BMC Medical Informatics and Decision Making
An improved survivability prognosis of breast cancer by using sampling and feature selection technique to solve imbalanced patient classification data
Research Article
Kung-Jeng Wang [1], Bunjira Makond [2], Kung-Min Wang [3]
[1] Department of Industrial Management, National Taiwan University of Science and Technology, 106, Taipei, Taiwan;Department of Industrial Management, National Taiwan University of Science and Technology, 106, Taipei, Taiwan;Faculty of Commerce and Management, Prince of Songkla University, Trang, Thailand;Department of Surgery, Shin-Kong Wu Ho-Su Memorial Hospital, Taipei, Taiwan;
Keywords: Breast cancer; Decision tree; Logistic regression; Imbalanced data; Synthetic minority over-sampling; Cost-sensitive classifier technique
DOI: 10.1186/1472-6947-13-124
Received 2013-06-01, accepted 2013-10-28, published 2013
Source: Springer
【 Abstract 】

Background: Breast cancer is one of the most critical cancers and a major cause of cancer death among women. Knowing a patient's survivability is essential for easing decisions about medical treatment and financial preparation. Breast cancer data sets are imbalanced (i.e., surviving patients outnumber non-surviving patients), and standard classifiers are not well suited to imbalanced data. Methods for improving the survivability prognosis of breast cancer therefore need to be studied.

Methods: Two well-known five-year prognosis models/classifiers, logistic regression (LR) and decision tree (DT), are constructed in combination with the synthetic minority over-sampling technique (SMOTE), the cost-sensitive classifier technique (CSC), under-sampling, bagging, and boosting. A feature selection method is used to select relevant variables, and a pruning technique is applied to obtain models with a low information burden. These methods are applied to data obtained from the Surveillance, Epidemiology, and End Results database. The improvement in survivability prognosis of breast cancer is investigated on the basis of the experimental results.

Results: The experimental results confirm that the DT and LR models combined with SMOTE, CSC, and under-sampling each achieve higher predictive performance than the original models. In most cases, the DT and LR models combined with SMOTE and CSC require a lower information burden (fewer features) when a feature selection method and a pruning technique are applied.

Conclusions: LR is found to have better statistical power than DT in predicting five-year survivability. CSC is superior to SMOTE, under-sampling, bagging, and boosting in improving the prognostic performance of DT and LR.
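
The three rebalancing ideas named in the abstract (SMOTE, under-sampling, and cost-sensitive classification) can be illustrated with a minimal sketch. This is not the authors' implementation: the function names, the toy data, and the cost values (5 for a missed non-survivor, 1 for a false alarm) are assumptions chosen only for illustration, and the cost-sensitive rule shown is the generic minimum-expected-cost decision, which may differ from the CSC variant used in the paper.

```python
import math
import random


def smote(minority, n_new, k=3, seed=0):
    """Create n_new synthetic points, each interpolated between a minority
    sample and one of its k nearest minority neighbours (the SMOTE idea)."""
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        # k nearest minority neighbours of `base`, excluding `base` itself
        neighbours = sorted(
            (p for p in minority if p is not base),
            key=lambda p: math.dist(base, p),
        )[:k]
        other = rng.choice(neighbours)
        gap = rng.random()  # position along the segment base -> other
        synthetic.append(tuple(b + gap * (o - b) for b, o in zip(base, other)))
    return synthetic


def under_sample(majority, n_keep, seed=0):
    """Randomly keep n_keep majority samples, without replacement."""
    return random.Random(seed).sample(majority, n_keep)


def cost_sensitive_label(p_not_survived, cost_fn=5.0, cost_fp=1.0):
    """Pick the label with the lower expected misclassification cost.

    cost_fn: assumed cost of predicting 'survived' for a non-survivor.
    cost_fp: assumed cost of predicting 'not survived' for a survivor.
    """
    cost_if_predict_survived = p_not_survived * cost_fn
    cost_if_predict_not_survived = (1.0 - p_not_survived) * cost_fp
    if cost_if_predict_not_survived < cost_if_predict_survived:
        return "not survived"
    return "survived"


# Toy data (hypothetical, not from SEER): 8 survivors vs. 3 non-survivors.
survived = [(float(i), float(i % 3)) for i in range(8)]
not_survived = [(10.0, 10.0), (11.0, 10.0), (10.0, 12.0)]

# Over-sample the minority class up to the majority size ...
balanced_up = not_survived + smote(not_survived, n_new=5, k=2)
# ... or shrink the majority class down to the minority size.
balanced_down = under_sample(survived, n_keep=len(not_survived))
print(len(balanced_up), len(balanced_down))  # prints "8 3"

# With unequal costs, a 30% non-survival probability is already flagged.
print(cost_sensitive_label(0.3))  # prints "not survived"
```

SMOTE and under-sampling change the training data before a classifier such as LR or DT is fitted, whereas the cost-sensitive rule leaves the data untouched and shifts the decision toward the minority class instead.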

【 License 】

CC BY   
© Wang et al.; licensee BioMed Central Ltd. 2013
