Journal article

【Abstract】

Introduction
Methods that automatically flag poorly performing predictions are urgently needed, both to implement machine learning workflows safely in clinical practice and to identify difficult cases during model training.

Methods
Disagreement between the sub-models from fivefold cross-validation was quantified using Dice scores between folds and summarized as a surrogate for model confidence. The summarized Interfold Dice was compared with thresholds informed by human interobserver values to determine whether the final ensemble model's prediction should be manually reviewed.

Results
On all tasks, the method efficiently flagged poorly segmented images without consulting a reference standard. Using the median Interfold Dice for comparison, excluding flagged images substantially improved Dice scores for the in-domain CT task (0.85 ± 0.20 to 0.91 ± 0.08; 8/50 images flagged) and the in-domain MR task (0.76 ± 0.27 to 0.85 ± 0.09; 8/50 images flagged). The largest improvement occurred in the simulated out-of-distribution task, in which a model trained on a radical nephrectomy dataset with different contrast phases was applied to a partial nephrectomy dataset acquired entirely in the corticomedullary phase (0.67 ± 0.36 to 0.89 ± 0.10; 122/300 images flagged).

Discussion
Comparing interfold sub-model disagreement against human interobserver values is an effective and efficient way to assess automated predictions when a reference standard is not available. This functionality provides a safeguard for patient care that is necessary for safely implementing automated medical image segmentation workflows.
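The Methods summary describes the flagging step only at a high level. The sketch below shows one possible Python/NumPy implementation, under these assumptions: each of the five fold sub-models produces a binary mask for the same image, Interfold Dice is computed over all 10 unordered fold pairs and then summarized (median by default), and the review threshold is a placeholder that would be set from human interobserver Dice for the task. The helper names `dice`, `interfold_dice`, and `flag_for_review` are hypothetical and are not taken from the paper.

```python
from itertools import combinations
from typing import Callable, Sequence

import numpy as np


def dice(a: np.ndarray, b: np.ndarray) -> float:
    """Dice similarity coefficient between two binary masks of the same shape."""
    a = np.asarray(a, dtype=bool)
    b = np.asarray(b, dtype=bool)
    if a.shape != b.shape:
        raise ValueError("masks must have the same shape")
    total = a.sum() + b.sum()
    if total == 0:
        # Assumption: two empty masks are treated as perfect agreement.
        return 1.0
    return float(2.0 * np.logical_and(a, b).sum() / total)


def interfold_dice(
    fold_masks: Sequence[np.ndarray],
    summary: Callable[[Sequence[float]], float] = np.median,
) -> float:
    """Summarize pairwise Dice between fold sub-model predictions for one image.

    With five folds this compares all 10 unordered pairs of sub-models.
    """
    if len(fold_masks) < 2:
        raise ValueError("need predictions from at least two folds")
    scores = [dice(p, q) for p, q in combinations(fold_masks, 2)]
    return float(summary(scores))


def flag_for_review(
    fold_masks: Sequence[np.ndarray],
    threshold: float,
    summary: Callable[[Sequence[float]], float] = np.median,
) -> bool:
    """Return True if sub-model agreement falls below the threshold.

    Hypothetical helper: `threshold` should be informed by human
    interobserver Dice for the same task; no specific value is implied here.
    """
    return interfold_dice(fold_masks, summary) < threshold


if __name__ == "__main__":
    # Five folds that agree exactly on a 4x4 square.
    square = np.zeros((8, 8), dtype=bool)
    square[2:6, 2:6] = True
    agreeing = [square] * 5

    # Five folds that each predict a different single pixel (no overlap).
    disjoint = []
    for i in range(5):
        m = np.zeros((8, 8), dtype=bool)
        m[i, i] = True
        disjoint.append(m)

    print(flag_for_review(agreeing, threshold=0.9))  # False
    print(flag_for_review(disjoint, threshold=0.9))  # True
```

With the illustrative `threshold=0.9`, five identical fold masks give an Interfold Dice of 1.0 and are not flagged, while five non-overlapping masks give 0.0 and are sent for manual review. Passing `summary=np.mean` or `summary=np.min` swaps in a different summary statistic; the abstract reports its headline results using the median.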

Frontiers in Radiology
AI in the Loop: functionalizing fold performance disagreement to monitor automated medical image segmentation workflows
Radiology
Adriana V. Gregory¹ Panagiotis Korfiatis¹ Timothy L. Kline² Harrison C. Gottlich³
[1] Department of Radiology, Mayo Clinic, Rochester, MN, United States; [2] Division of Nephrology and Hypertension, Mayo Clinic, Rochester, MN, United States; [3] Mayo Clinic Alix School of Medicine, Mayo Clinic, Rochester, MN, United States
Keywords: deep learning; semantic segmentation; machine learning model performance; similarity metrics; epistemic uncertainty; convolutional neural networks; AI in the loop
DOI : 10.3389/fradi.2023.1223294
Received 2023-05-15; accepted 2023-08-28; published 2023
Source: Frontiers