期刊论文详细信息
BMC Medical Informatics and Decision Making 卷:23
Cardiovascular disease incidence prediction by machine learning and statistical techniques: a 16-year cohort study from eastern Mediterranean region
Research
Awat Feizi1  Masoumeh Sadeghi2  Kamran Mehrabani-Zeinabad3  Hamidreza Roohafza3  Nizal Sarrafzadegan4  Mohammad Talaei5 
[1] Biostatistics and Epidemiology Department, School of Health, Isfahan University of Medical Sciences, Isfahan, Iran;Cardiac Rehabilitation Research Center, Cardiovascular Research Institute, Isfahan University of Medical Sciences, Isfahan, Iran;
[2] Cardiac Rehabilitation Research Center, Cardiovascular Research Institute, Isfahan University of Medical Sciences, Isfahan, Iran;
[3] Cardiovascular Research Center, Cardiovascular Research Institute, Isfahan University of Medical Sciences, Isfahan, Iran;
[4] Cardiovascular Research Center, Cardiovascular Research Institute, Isfahan University of Medical Sciences, Isfahan, Iran;School of Population and Public Health, Faculty of Medicine, University of British Columbia, Vancouver, Canada;
[5] Wolfson Institute of Population Health, Barts and The London School of Medicine and Dentistry, Queen Mary University of London, London, UK;
关键词: Cardiovascular;    Machine learning;    Statistical models;    Cohort study;    Eastern Mediterranean region;    Feature selection;    Missing values;   
DOI  :  10.1186/s12911-023-02169-5
 received in 2022-09-19, accepted in 2023-04-04,  发布年份 2023
来源: Springer
PDF
【 摘 要 】

BackgroundCardiovascular diseases (CVD) are the predominant cause of early death worldwide. Identification of people with a high risk of being affected by CVD is consequential in CVD prevention. This study adopts Machine Learning (ML) and statistical techniques to develop classification models for predicting the future occurrence of CVD events in a large sample of Iranians.MethodsWe used multiple prediction models and ML techniques with different abilities to analyze the large dataset of 5432 healthy people at the beginning of entrance into the Isfahan Cohort Study (ICS) (1990–2017). Bayesian additive regression trees enhanced with “missingness incorporated in attributes” (BARTm) was run on the dataset with 515 variables (336 variables without and the remaining with up to 90% missing values). In the other used classification algorithms, variables with more than 10% missing values were excluded, and MissForest imputes the missing values of the remaining 49 variables. We used Recursive Feature Elimination (RFE) to select the most contributing variables. Random oversampling technique, recommended cut-point by precision-recall curve, and relevant evaluation metrics were used for handling unbalancing in the binary response variable.ResultsThis study revealed that age, systolic blood pressure, fasting blood sugar, two-hour postprandial glucose, diabetes mellitus, history of heart disease, history of high blood pressure, and history of diabetes are the most contributing factors for predicting CVD incidence in the future. The main differences between the results of classification algorithms are due to the trade-off between sensitivity and specificity. Quadratic Discriminant Analysis (QDA) algorithm presents the highest accuracy (75.50 ± 0.08) but the minimum sensitivity (49.84 ± 0.25); In contrast, decision trees provide the lowest accuracy (51.95 ± 0.69) but the top sensitivity (82.52 ± 1.22). BARTm.90% resulted in 69.48 ± 0.28 accuracy and 54.00 ± 1.66 sensitivity without any preprocessing step.ConclusionsThis study confirmed that building a prediction model for CVD in each region is valuable for screening and primary prevention strategies in that specific region. Also, results showed that using conventional statistical models alongside ML algorithms makes it possible to take advantage of both techniques. Generally, QDA can accurately predict the future occurrence of CVD events with a fast (inference speed) and stable (confidence values) procedure. The combined ML and statistical algorithm of BARTm provide a flexible approach without any need for technical knowledge about assumptions and preprocessing steps of the prediction procedure.

【 授权许可】

CC BY   
© The Author(s) 2023

【 预 览 】
附件列表
Files Size Format View
RO202304227249399ZK.pdf 2755KB PDF download
40507_2023_167_Article_IEq460.gif 1KB Image download
40507_2023_167_Article_IEq464.gif 1KB Image download
MediaObjects/42004_2022_764_MOESM3_ESM.pdf 379KB PDF download
MediaObjects/42004_2022_764_MOESM4_ESM.pdf 4462KB PDF download
【 图 表 】

40507_2023_167_Article_IEq464.gif

40507_2023_167_Article_IEq460.gif

【 参考文献 】
  • [1]
  • [2]
  • [3]
  • [4]
  • [5]
  • [6]
  • [7]
  • [8]
  • [9]
  • [10]
  • [11]
  • [12]
  • [13]
  • [14]
  • [15]
  • [16]
  • [17]
  • [18]
  • [19]
  • [20]
  • [21]
  • [22]
  • [23]
  • [24]
  • [25]
  • [26]
  • [27]
  • [28]
  • [29]
  • [30]
  • [31]
  • [32]
  • [33]
  • [34]
  • [35]
  • [36]
  • [37]
  • [38]
  • [39]
  • [40]
  • [41]
  • [42]
  • [43]
  • [44]
  • [45]
  • [46]
  • [47]
  • [48]
  • [49]
  • [50]
  • [51]
  • [52]
  • [53]
  • [54]
  • [55]
  • [56]
  • [57]
  • [58]
  • [59]
  • [60]
  • [61]
  • [62]
  • [63]
  • [64]
  文献评价指标  
  下载次数:2次 浏览次数:3次