BMC Medical Research Methodology
Unsupervised anomaly detection of implausible electronic health records: a real-world evaluation in cancer registries
Research Article
Philipp Röchner [1], Franz Rothlauf [1]
[1] Information Systems and Business Administration, Johannes Gutenberg University, Jakob-Welder-Weg 9, 55128, Mainz, Germany
Keywords: Anomaly detection; Outlier detection; Data quality; Quality control; Electronic health records; Medical records; Cancer registration; Neural network; Machine learning; Artificial intelligence
DOI: 10.1186/s12874-023-01946-0
Received 2022-07-11, accepted 2023-05-09, published 2023
Source: Springer
【 Abstract 】
Background: Cancer registries collect patient-specific information about cancer diseases. The collected information is verified and made available to clinical researchers, physicians, and patients. When processing information, cancer registries verify that the patient-specific records they collect are plausible, that is, that the collected information about a particular patient makes medical sense.

Methods: Unsupervised machine learning approaches can detect implausible electronic health records without human guidance. This article therefore investigates two unsupervised anomaly detection approaches, a pattern-based approach (FindFPOF) and a compression-based approach (autoencoder), to identify implausible electronic health records in cancer registries. Unlike most existing work, which analyzes synthetic anomalies, we compare the performance of both approaches and a baseline (random selection of records) on a real-world dataset. The dataset contains 21,104 electronic health records of patients with breast, colorectal, and prostate tumors. Each record consists of 16 categorical variables describing the disease, the patient, and the diagnostic procedure. The samples identified by FindFPOF, the autoencoder, and a random selection (a total of 785 different records) are evaluated in a real-world scenario by medical domain experts.

Results: Both anomaly detection methods found implausible electronic health records at a markedly higher rate than random selection. First, domain experts identified 8% of 300 randomly selected records as implausible. With FindFPOF and the autoencoder, 28% of the 300 records proposed in each sample were implausible. This corresponds to a precision of 28% for both FindFPOF and the autoencoder. Second, for 300 randomly selected records that were labeled by domain experts, the sensitivity of the autoencoder was 22% and the sensitivity of FindFPOF was 26%. Both anomaly detection methods had a specificity of 94%. Third, FindFPOF and the autoencoder suggested samples with a different distribution of values than the overall dataset. For example, both anomaly detection methods suggested a higher proportion of colorectal records, the tumor localization with the highest percentage of implausible records in a randomly selected sample.

Conclusions: Unsupervised anomaly detection can significantly reduce the manual effort of domain experts to find implausible electronic health records in cancer registries. In our experiments, the manual effort was reduced by a factor of approximately 3.5 compared to evaluating a random sample.

【 License 】
CC BY
© The Author(s) 2023
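
The evaluation measures named in the abstract (precision, sensitivity, specificity) and the reported effort reduction of approximately 3.5 can be illustrated with a short calculation. The following is a minimal sketch, not the authors' evaluation code. The function names and the confusion-matrix counts are hypothetical and chosen only for illustration. The only figures taken from the abstract are the 28% precision of the detectors and the 8% share of implausible records in the random sample. The sketch assumes that review effort is proportional to the number of records an expert must inspect per implausible record found.

```python
def precision(tp: int, fp: int) -> float:
    """Share of flagged records that are truly implausible."""
    return tp / (tp + fp)


def sensitivity(tp: int, fn: int) -> float:
    """Share of truly implausible records that were flagged."""
    return tp / (tp + fn)


def specificity(tn: int, fp: int) -> float:
    """Share of plausible records that were correctly not flagged."""
    return tn / (tn + fp)


def effort_reduction(detector_precision_pct: float, base_rate_pct: float) -> float:
    """Ratio of records reviewed per implausible record found.

    Random review needs 100 / base_rate_pct records per hit,
    detector-guided review needs 100 / detector_precision_pct,
    so the ratio simplifies to detector_precision_pct / base_rate_pct.
    """
    return detector_precision_pct / base_rate_pct


# Figures reported in the abstract (percentages).
REPORTED_PRECISION_PCT = 28
REPORTED_BASE_RATE_PCT = 8
factor = effort_reduction(REPORTED_PRECISION_PCT, REPORTED_BASE_RATE_PCT)
print(f"effort reduction factor: {factor:.1f}")

# Hypothetical confusion-matrix counts for a labeled sample of 300 records.
# These are NOT from the article; they only exercise the formulas.
tp, fp, fn, tn = 5, 15, 20, 260
print(f"precision:   {precision(tp, fp):.2f}")
print(f"sensitivity: {sensitivity(tp, fn):.2f}")
print(f"specificity: {specificity(tn, fp):.2f}")
```

Under this assumption, the ratio of the two rates (28% to 8%) reproduces the factor of approximately 3.5 stated in the Conclusions.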