BMC Research Notes
Disease prediction via Bayesian hyperparameter optimization and ensemble learning
Liyuan Gao [1], Yongmei Ding [1]
[1] College of Science, Wuhan University of Science and Technology
Keywords: Hyperparameter optimization; Feature selection; Ensemble learning; Gain
DOI: 10.1186/s13104-020-05050-0
Source: DOAJ
【 Abstract 】
Objective: Early disease screening and diagnosis are important for improving patient survival, so identifying early predictive features of disease is necessary. This paper presents a comprehensive comparative analysis of different machine learning (ML) systems and reports the standard deviation of the results obtained through sampling with replacement. The research has two aims: (a) to analyze and compare ML strategies used to predict breast cancer (BC) and cardiovascular disease (CVD), and (b) to use feature importance ranking to identify early high-risk features.
Results: The Bayesian hyperparameter optimization method was more stable than the grid search and random search methods. On a BC diagnosis dataset, the Extreme Gradient Boosting (XGBoost) model had an accuracy of 94.74% and a sensitivity of 93.69%. The mean value of the cell nucleus in the fine needle aspiration (FNA) digital image of the breast lump was identified as the most important predictive feature for BC. On a CVD dataset, the XGBoost model had an accuracy of 73.50% and a sensitivity of 69.54%. Systolic blood pressure was identified as the most important feature for CVD prediction.
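The abstract reports accuracy and sensitivity together with a standard deviation obtained through sampling with replacement. The following is a minimal sketch of how such a spread could be estimated, not the authors' procedure. It assumes that label 1 marks the disease class and that resampling is applied to a fixed set of predictions; the paper may instead resample the data and retrain the model. The function names and the toy labels are hypothetical and serve only as illustration.

```python
import random
import statistics


def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true labels."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def sensitivity(y_true, y_pred):
    """True-positive rate TP / (TP + FN); label 1 is assumed to be the disease class."""
    predictions_for_positives = [p for t, p in zip(y_true, y_pred) if t == 1]
    if not predictions_for_positives:
        return float("nan")  # undefined when the sample has no positive cases
    return sum(p == 1 for p in predictions_for_positives) / len(predictions_for_positives)


def bootstrap_mean_std(y_true, y_pred, metric, n_resamples=1000, seed=0):
    """Mean and standard deviation of a metric over samples drawn with replacement."""
    rng = random.Random(seed)
    n = len(y_true)
    scores = []
    for _ in range(n_resamples):
        # Draw n indices with replacement, then score the resampled pairs.
        idx = [rng.randrange(n) for _ in range(n)]
        score = metric([y_true[i] for i in idx], [y_pred[i] for i in idx])
        if score == score:  # NaN is not equal to itself, so undefined scores are skipped
            scores.append(score)
    return statistics.mean(scores), statistics.stdev(scores)


# Hypothetical toy labels, not data from the paper.
y_true = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
y_pred = [1, 1, 1, 0, 0, 0, 0, 0, 0, 1]

acc_mean, acc_std = bootstrap_mean_std(y_true, y_pred, accuracy)
sen_mean, sen_std = bootstrap_mean_std(y_true, y_pred, sensitivity)
print(f"accuracy:    {acc_mean:.3f} +/- {acc_std:.3f}")
print(f"sensitivity: {sen_mean:.3f} +/- {sen_std:.3f}")
```

A smaller standard deviation across resamples is one way to read the claim that a tuning method is "more stable", though the abstract does not state exactly how stability was measured.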
【 License 】
Unknown