Journal Article Details
BMC Medical Informatics and Decision Making
Exploratory study on classification of diabetes mellitus through a combined Random Forest Classifier
Hao Ren [1], Lixia Qiu [1], Meichen Li [1], Dichen Quan [1], Mengmeng Zhai [1], Xuchun Wang [1], Zeping Ren [2], Limin Chen [3]
[1] Department of Health Statistics, School of Public Health, Shanxi Medical University; [2] Shanxi Centre for Disease Control and Prevention; [3] Shanxi Provincial People’s Hospital
Keywords: Diabetes mellitus; Classification; Random Forest Classifier; Imbalanced data; Indicators
DOI: 10.1186/s12911-021-01471-4
Source: DOAJ
【 Abstract 】

Background: Diabetes mellitus (DM) has become the third major chronic non-communicable disease affecting patients, after tumors and cardiovascular and cerebrovascular diseases, and is one of the major public health problems in the world. It is therefore important to identify individuals at high risk of DM in order to establish prevention strategies.

Methods: Medical data often have a high-dimensional feature space, high feature redundancy, and imbalanced classes. To address these problems, this study explored different supervised classifiers combined with SVM-SMOTE resampling and two feature dimensionality reduction methods (logistic stepwise regression and LASSO) to classify diabetes survey sample data with unbalanced categories and complex related factors. The classification results of 4 supervised classifiers under 4 data processing methods were analyzed and discussed. Five indicators (Accuracy, Precision, Recall, F1-Score and AUC) were selected as the key indicators for evaluating the performance of the classification models.

Results: The Random Forest Classifier combined with SVM-SMOTE resampling and LASSO feature screening (Accuracy = 0.890, Precision = 0.869, Recall = 0.919, F1-Score = 0.893, AUC = 0.948) performed best at identifying those at high risk of DM. The combined algorithm also improved classification performance for predicting people at high risk of DM. Age, region, heart rate, hypertension, hyperlipidemia and BMI were the six most important characteristic variables related to diabetes.

Conclusions: The Random Forest Classifier combined with SVM-SMOTE and LASSO feature reduction performs best in identifying people at high risk of DM. The combined method proposed in the study would be a good tool for early screening of DM.
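The five evaluation indicators named in the abstract can be illustrated with a minimal pure-Python sketch. This is not the authors' code: the function names (`classification_metrics`, `f1_from`, `auc`) and the toy labels and scores are hypothetical and chosen only for illustration. AUC is computed here as the fraction of positive/negative pairs that the scores rank correctly, which is one standard way to define it.

```python
def f1_from(precision, recall):
    """Harmonic mean of precision and recall (0.0 if both are zero)."""
    total = precision + recall
    return 2 * precision * recall / total if total else 0.0


def classification_metrics(y_true, y_pred):
    """Accuracy, Precision, Recall and F1-Score for binary labels (1 = DM)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # true positives
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # false positives
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # false negatives
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)  # true negatives
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return {
        "accuracy": (tp + tn) / len(pairs),
        "precision": precision,
        "recall": recall,
        "f1": f1_from(precision, recall),
    }


def auc(y_true, scores):
    """AUC as the share of (positive, negative) pairs ranked correctly.

    Ties between a positive and a negative score count as half a win.
    """
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


if __name__ == "__main__":
    # Toy data, made up for illustration only (not from the study).
    y_true = [1, 1, 1, 0, 0, 0]
    y_pred = [1, 1, 0, 1, 0, 0]
    scores = [0.9, 0.8, 0.4, 0.6, 0.3, 0.2]
    print(classification_metrics(y_true, y_pred))
    print("AUC:", auc(y_true, scores))
    # Consistency check on the reported best model: Precision and Recall
    # from the abstract should reproduce the reported F1-Score.
    print("F1 from reported values:", round(f1_from(0.869, 0.919), 3))
```

The last line is a consistency check rather than a reproduction of the study: plugging the reported Precision (0.869) and Recall (0.919) into the F1 formula gives a value that rounds to the reported F1-Score of 0.893.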

【 License 】

Unknown   
