Journal article

【Abstract】

Background
There is increasing interest in clinical prediction models for rare outcomes such as suicide, psychiatric hospitalizations, and opioid overdose. Accurate model validation is needed to guide model selection and to inform decisions about whether and how prediction models should be used. Split-sample estimation and validation of clinical prediction models, in which the data are divided into a training set and a testing set, may reduce both the predictive accuracy of the model and the precision of its validation. Using all data for estimation and validation increases the sample size available for both procedures, but validation must then account for overfitting, or optimism. Our study compared split-sample and entire-sample methods for estimating and validating a suicide prediction model.

Methods
We compared the performance of random forest models estimated in a sample of 9,610,318 mental health visits ("entire-sample") and in a 50% subset ("split-sample"), as evaluated in a prospective validation sample of 3,754,137 visits. We assessed the optimism of three internal validation approaches: for the split-sample prediction model, validation in the held-out testing set; for the entire-sample model, cross-validation and bootstrap optimism correction.

Results
The split-sample and entire-sample prediction models showed similar prospective performance: the area under the curve (AUC) was 0.81 (95% confidence interval 0.77–0.85) for both. Performance estimates from the testing set for the split-sample model (AUC = 0.85 [0.82–0.87]) and from cross-validation for the entire-sample model (AUC = 0.83 [0.81–0.85]) accurately reflected prospective performance. Validation of the entire-sample model with bootstrap optimism correction overestimated prospective performance (AUC = 0.88 [0.86–0.89]). Measures of classification accuracy, including sensitivity and positive predictive value at the 99th, 95th, 90th, and 75th percentiles of the risk score distribution, led to similar conclusions: bootstrap optimism correction overestimated classification accuracy relative to that observed in the prospective validation set.

Conclusions
While previous literature demonstrated the validity of bootstrap optimism correction for parametric models in small samples, this approach did not accurately validate the performance of a rare-event prediction model estimated with random forests in a large clinical dataset. Cross-validation of prediction models estimated with all available data provides accurate independent validation while maximizing sample size.
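The three internal validation approaches compared in this abstract (held-out testing set, cross-validation, and bootstrap optimism correction) can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the authors' code: `fit_rate_table` is a hypothetical stand-in for a random forest (a smoothed per-category event-rate lookup), `simulate` generates hypothetical rare-event data, the cross-validated AUC pools out-of-fold predictions (one common variant), and the bootstrap correction follows the usual Harrell-style recipe of subtracting the average bootstrap optimism from the apparent AUC.

```python
import random
from statistics import mean


def auc(scores, labels):
    """AUC via the Mann-Whitney rank statistic; tied scores receive average ranks."""
    order = sorted(range(len(scores)), key=lambda i: scores[i])
    ranks = [0.0] * len(scores)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and scores[order[j + 1]] == scores[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1  # 1-based average rank of the tie block
        i = j + 1
    n_pos = sum(labels)
    n_neg = len(labels) - n_pos
    rank_sum = sum(r for r, y in zip(ranks, labels) if y == 1)
    return (rank_sum - n_pos * (n_pos + 1) / 2) / (n_pos * n_neg)


def fit_rate_table(xs, ys):
    """Hypothetical stand-in for a random forest: per-category event rate,
    shrunk toward the overall rate with one pseudo-observation."""
    base = mean(ys)
    counts, events = {}, {}
    for x, y in zip(xs, ys):
        counts[x] = counts.get(x, 0) + 1
        events[x] = events.get(x, 0) + y
    return lambda x: (events[x] + base) / (counts[x] + 1) if x in counts else base


def split_sample_auc(xs, ys, fit, rng):
    """Fit on a random 50% training set; evaluate on the held-out 50% testing set."""
    idx = list(range(len(xs)))
    rng.shuffle(idx)
    train, test = idx[: len(idx) // 2], idx[len(idx) // 2 :]
    model = fit([xs[i] for i in train], [ys[i] for i in train])
    return auc([model(xs[i]) for i in test], [ys[i] for i in test])


def cross_validated_auc(xs, ys, fit, k, rng):
    """K-fold CV: score each row with a model that never saw it, then pool the scores."""
    idx = list(range(len(xs)))
    rng.shuffle(idx)
    scores = [0.0] * len(xs)
    for fold in range(k):
        held_out = set(idx[fold::k])
        train = [i for i in idx if i not in held_out]
        model = fit([xs[i] for i in train], [ys[i] for i in train])
        for i in held_out:
            scores[i] = model(xs[i])
    return auc(scores, ys)


def bootstrap_corrected_auc(xs, ys, fit, n_boot, rng):
    """Apparent AUC of the full-data model minus the mean bootstrap optimism."""
    n = len(xs)
    full_model = fit(xs, ys)
    apparent = auc([full_model(x) for x in xs], ys)
    optimism = []
    for _ in range(n_boot):
        boot = [rng.randrange(n) for _ in range(n)]
        bx, by = [xs[i] for i in boot], [ys[i] for i in boot]
        if 0 < sum(by) < n:  # AUC needs both events and non-events
            m = fit(bx, by)
            # optimism = AUC on the bootstrap sample minus AUC on the original data
            optimism.append(auc([m(x) for x in bx], by) - auc([m(x) for x in xs], ys))
    return apparent - mean(optimism)


def simulate(n, n_categories, rng):
    """Hypothetical rare-event data: risk rises linearly from 1% to 10% across categories."""
    xs = [rng.randrange(n_categories) for _ in range(n)]
    ys = [int(rng.random() < 0.01 + 0.09 * x / (n_categories - 1)) for x in xs]
    return xs, ys


if __name__ == "__main__":
    rng = random.Random(2023)
    xs, ys = simulate(2000, 40, rng)
    print("split-sample AUC:", round(split_sample_auc(xs, ys, fit_rate_table, rng), 3))
    print("5-fold CV AUC:", round(cross_validated_auc(xs, ys, fit_rate_table, 5, rng), 3))
    print("bootstrap-corrected AUC:", round(bootstrap_corrected_auc(xs, ys, fit_rate_table, 50, rng), 3))
```

The toy data only demonstrate the mechanics; they are not expected to reproduce the paper's finding about bootstrap optimism correction. Because each validation function takes the learner as a `fit` argument, replacing `fit_rate_table` with a wrapper around a real random forest learner would leave the three validation routines unchanged.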

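The classification-accuracy measures reported in the Results (sensitivity and positive predictive value when visits above a given percentile of the risk score distribution are flagged) can be computed as in the following sketch. The function name `classification_at_percentiles` is hypothetical, and two conventions are assumptions rather than details taken from the paper: the percentile uses the nearest-rank definition, and only scores strictly above it are flagged.

```python
import math


def classification_at_percentiles(scores, labels, percentiles=(99, 95, 90, 75)):
    """Return {p: (sensitivity, ppv)} when visits scoring above the p-th percentile are flagged.

    Assumptions (not taken from the paper): the percentile uses the nearest-rank
    convention, and only scores strictly above it are flagged, so ties at the
    threshold can make the flagged group smaller than (100 - p)% of visits.
    """
    n = len(scores)
    ranked = sorted(scores)
    n_events = sum(labels)
    results = {}
    for p in percentiles:
        threshold = ranked[max(0, math.ceil(p * n / 100) - 1)]  # nearest-rank percentile
        flagged = [y for s, y in zip(scores, labels) if s > threshold]
        true_pos = sum(flagged)
        sensitivity = true_pos / n_events if n_events else math.nan  # share of events flagged
        ppv = true_pos / len(flagged) if flagged else math.nan  # share of flagged that are events
        results[p] = (sensitivity, ppv)
    return results


if __name__ == "__main__":
    # Toy example: 100 visits with distinct risk scores 1..100 and 5 hypothetical events.
    scores = list(range(1, 101))
    labels = [int(s in {100, 98, 96, 80, 50}) for s in scores]
    for p, (sens, ppv) in classification_at_percentiles(scores, labels).items():
        print(f"{p}th percentile: sensitivity={sens:.2f}, PPV={ppv:.2f}")
```

For the toy example, flagging the top 1%, 5%, 10%, and 25% of visits gives sensitivity/PPV of 0.20/1.00, 0.60/0.60, 0.60/0.30, and 0.80/0.16, which illustrates the usual trade-off: lower thresholds catch more events but dilute the positive predictive value.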
【License】

CC BY
© The Author(s) 2023

BMC Medical Research Methodology
Empirical evaluation of internal validation methods for prediction in large-scale clinical data with rare-event outcomes: a case study in suicide risk prediction
Research Article
Qinqing Liao¹, Noah Simon², R Yates Coley², Susan M. Shortreed²
Affiliations: Department of Biostatistics, University of Washington, Seattle, WA, USA; Kaiser Permanente Washington Health Research Institute, 1730 Minor Ave. #1600, 98101, Seattle, WA, USA
Keywords: Bootstrap; Clinical prediction; Cross-validation; Machine learning; Optimism; Random forest; Risk stratification; Split-sample
DOI: 10.1186/s12874-023-01844-5
Received 2022-04-14; accepted 2023-01-17; published 2023
Source: Springer