BMC Cardiovascular Disorders
Unlocking echocardiogram measurements for heart disease research through natural language processing
Software
Samah J. Fodeh¹, Melissa Skanderson², Cynthia A. Brandt³, Scott L. DuVall⁴, Olga V. Patterson⁴, Matthew S. Freiberg⁵
Affiliations: Center for Medical Informatics, School of Medicine, Yale University, West Haven, CT, USA; Connecticut VA Healthcare System, West Haven, CT, USA; Department of Veterans Affairs Salt Lake City Health Care System, 500 Foothill Drive Bldg. Mail Code 182, 84148, Salt Lake City, UT, USA; School of Medicine, University of Utah, 295 Chipeta Way, 84132, Salt Lake City, UT, USA; VA Tennessee Valley Health Care System, Nashville, TN, USA; Vanderbilt University Medical Center, Cardiovascular Medicine Division, Nashville, TN, USA
Keywords: Natural language processing; Text mining; Information extraction; Echocardiography; Heart function; Left ventricular ejection fraction
DOI: 10.1186/s12872-017-0580-8
Received 2016-11-28, accepted 2017-05-25, published 2017
Source: Springer
【 Abstract 】
Background: To investigate the mechanisms of cardiovascular disease in HIV-infected and uninfected patients, a large longitudinal multi-center study requires analysis of echocardiogram reports.

Implementation: A natural language processing system based on dictionary lookup, rules, and patterns was developed to extract heart function measurements, which echocardiogram reports typically record as measurement-value pairs. Curated semantic bootstrapping was used to build a custom dictionary that extends existing terminologies with terms that actually appear in the medical record. A novel disambiguation method based on semantic constraints was created to identify and discard erroneous alternative definitions of the measurement terms. The system was built on a scalable framework, making it suitable for processing large datasets.

Results: The system was developed for and validated on notes from three sources: general clinic notes, echocardiogram reports, and radiology reports. Averaged across all extracted values, it achieved F-scores of 0.872, 0.844, and 0.877 and precision of 0.936, 0.982, and 0.969 on the three datasets, respectively. Left ventricular ejection fraction (LVEF) was the most frequently extracted measurement, and its extraction precision ranged from 0.968 to 1.0 across document types.

Conclusions: This system demonstrates the feasibility and effectiveness of large-scale information extraction from clinical data. Because key heart function measurements can be extracted successfully with natural language processing, new clinical questions in heart failure can be addressed through retrospective analysis of clinical data.
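The abstract combines three ideas: dictionary lookup of measurement terms, patterns that pair a term with a value, and disambiguation that discards term definitions violating semantic constraints. The Python sketch below shows one way these pieces could fit together. It is an illustration under stated assumptions, not the authors' system: the dictionary entries, expected units, plausible value ranges, the regular expression, and the names `Concept`, `Measurement`, `satisfies`, and `extract` are all hypothetical. Dictionary construction by curated semantic bootstrapping is not modeled.

```python
import re
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class Concept:
    """One candidate definition of a dictionary term (illustrative values only)."""
    name: str
    is_measurement: bool          # semantic type: can this sense take a numeric value?
    unit: Optional[str] = None    # expected unit, normalized (lower case, no spaces)
    low: float = 0.0              # hypothetical plausible value range
    high: float = 0.0


@dataclass(frozen=True)
class Measurement:
    concept: str
    low: float
    high: float
    unit: Optional[str]


LVEF = Concept("left ventricular ejection fraction", True, "%", 5, 90)

# Hypothetical dictionary: surface term -> candidate definitions.
DICTIONARY = {
    "lvef": [LVEF],
    "ef": [LVEF],
    "ejection fraction": [LVEF],
    "lvidd": [Concept("left ventricular internal diameter, diastole", True, "cm", 2.0, 9.0)],
    "pa": [
        Concept("pulmonary artery systolic pressure", True, "mmhg", 10, 120),
        Concept("posteroanterior view", False),  # radiology sense, never a measurement
    ],
}

# Longest terms first so "ejection fraction" wins over shorter alternatives.
_TERMS = "|".join(sorted(map(re.escape, DICTIONARY), key=len, reverse=True))

# Pattern: term, optional connector, a value or value range, optional unit.
PAIR_RE = re.compile(
    rf"\b(?P<term>{_TERMS})\b\s*(?:is|of|=|:)?\s*(?:estimated\s+at\s+)?"
    r"(?P<value>\d+(?:\.\d+)?)"
    r"(?:\s*(?:-|to)\s*(?P<upper>\d+(?:\.\d+)?))?"
    r"\s*(?P<unit>%|mm\s?hg|cm)?",
    re.IGNORECASE,
)


def satisfies(concept: Concept, values: List[float], unit: Optional[str]) -> bool:
    """Semantic constraints: measurement type, unit agreement, plausible range."""
    if not concept.is_measurement:
        return False
    if unit is not None and unit != concept.unit:
        return False
    return all(concept.low <= v <= concept.high for v in values)


def extract(text: str) -> List[Measurement]:
    """Return measurement-value pairs whose term has exactly one consistent sense."""
    found = []
    for m in PAIR_RE.finditer(text):
        values = [float(m["value"])]
        if m["upper"]:
            values.append(float(m["upper"]))
        unit = m["unit"].lower().replace(" ", "") if m["unit"] else None
        senses = [c for c in DICTIONARY[m["term"].lower()] if satisfies(c, values, unit)]
        # Discard the pair unless exactly one definition survives the constraints.
        if len(senses) == 1:
            found.append(
                Measurement(senses[0].name, min(values), max(values), unit or senses[0].unit)
            )
    return found


if __name__ == "__main__":
    note = ("LVEF is estimated at 55-60%. LVIDd 5.2 cm. "
            "PA 45 mmHg. Chest film: PA 2 views.")
    for measurement in extract(note):
        print(measurement)
```

In this sketch, "PA 45 mmHg" keeps only the pulmonary pressure sense because the posteroanterior sense cannot carry a value, while "PA 2 views" is dropped because no definition is consistent with a unitless value of 2. Requiring exactly one surviving definition is a conservative choice that favors precision over recall; the abstract does not say how the authors resolve ties.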
【 License 】
CC BY
© The Author(s) 2017