Journal Article Details
Jurnal RESTI: Rekayasa Sistem dan Teknologi Informasi
The Exploring feature selection techniques on Classification Algorithms for Predicting Type 2 Diabetes at Early Stage
article
Mila Desi Anasanti [1], Khairunisa Hilyati [2], Annisa Novtariany [2]
[1] University College London;Universitas Nusa Mandiri
Keywords: Type 2 diabetes; machine learning; feature selection; feature importance
DOI: 10.29207/resti.v6i5.4419
Source: Ikatan Ahli Informatika Indonesia
【 Abstract 】

Predicting Type 2 diabetes (T2D) early is critical for improved care and better outcomes. Accurate and efficient T2D prediction relies on unbiased, relevant features. In this study, we searched for important features to predict T2D by integrating machine learning (ML) models for feature selection and classification, using data from 520 individuals who were newly diagnosed with diabetes or who will develop it. We used standard ML classifiers: logistic regression (LR), Gaussian naive Bayes (NB), decision tree (DT), random forest (RF), support vector machine (SVM) with linear basis function, and k-nearest neighbors (KNN). We systematically explored one representative feature selection method from each family of techniques: a statistical filter method (F-score), an entropy-based filter method (mutual information), an ensemble-based filter method (random forest importance), and a stochastic optimization method (simultaneous perturbation feature selection and ranking, SpFSR). We used stratified 10-fold cross-validation and assessed discrimination, calibration, and clinical utility. RF with the full set of 16 features attained the highest accuracy, 98%, so we then used RF as the wrapper classifier to select the important features. The combination of SpFSR and RF was the best model: its accuracy did not differ significantly from that of the full feature set (P-value = 0.26, above the 0.05 threshold). The findings support the efficiency and usefulness of the suggested method for choosing the most important features of diabetic data: polyuria, gender, polydipsia, age, itching, sudden weight loss, delayed healing, and alopecia.
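Two of the building blocks named in the abstract, the F-score filter and stratified fold assignment, can be illustrated with a minimal standard-library sketch. This is not the authors' pipeline: the function names (`anova_f_score`, `stratified_folds`, `rank_features`) are hypothetical, the F-score is assumed to be the one-way ANOVA F statistic, and the toy data are invented for illustration and do not come from the study.

```python
from collections import defaultdict
from statistics import mean


def anova_f_score(values, labels):
    """One-way ANOVA F statistic of a single feature across class labels."""
    groups = defaultdict(list)
    for v, y in zip(values, labels):
        groups[y].append(v)
    n, k = len(values), len(groups)
    grand = mean(values)
    means = {y: mean(g) for y, g in groups.items()}
    # Between-class and within-class sums of squares.
    ss_between = sum(len(g) * (means[y] - grand) ** 2 for y, g in groups.items())
    ss_within = sum((v - means[y]) ** 2 for y, g in groups.items() for v in g)
    if ss_within == 0:
        # Perfectly separated (or constant) feature: avoid division by zero.
        return float("inf") if ss_between > 0 else 0.0
    return (ss_between / (k - 1)) / (ss_within / (n - k))


def stratified_folds(labels, n_folds):
    """Assign sample indices to folds so class proportions stay similar."""
    by_class = defaultdict(list)
    for i, y in enumerate(labels):
        by_class[y].append(i)
    folds = [[] for _ in range(n_folds)]
    for indices in by_class.values():
        for pos, i in enumerate(indices):
            folds[pos % n_folds].append(i)  # round-robin within each class
    return folds


def rank_features(rows, labels, names):
    """Return feature names sorted by descending F-score."""
    scores = {
        name: anova_f_score([row[j] for row in rows], labels)
        for j, name in enumerate(names)
    }
    return sorted(names, key=lambda name: scores[name], reverse=True)


# Toy data, invented for illustration only (NOT from the study).
names = ["polyuria", "itching"]
rows = [[1, 0], [1, 1], [1, 0], [0, 1], [0, 0], [0, 1], [0, 1], [0, 0]]
labels = [1, 1, 1, 0, 0, 0, 1, 0]

ranking = rank_features(rows, labels, names)
folds = stratified_folds(labels, n_folds=2)
```

In a fuller experiment, each fold would serve once as the held-out test set, and the ranking would be computed on the training folds only, so that feature selection does not see the test data.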

【 License 】

Unknown
