Journal article details
BMC Medical Informatics and Decision Making
Text data extraction for a prospective, research-focused data mart: implementation and validation
Correspondence
Sofia Podlusky1  John Varga1  Rowland W Chang2  Monique Hinchcliff3  Eric Just4  Warren A Kibbe5 
[1] Department of Medicine, Division of Rheumatology, Northwestern University Feinberg School of Medicine, Chicago, USA;Department of Medicine, Division of Rheumatology, Northwestern University Feinberg School of Medicine, Chicago, USA;Department of Preventive Medicine, Northwestern University Feinberg School of Medicine, Chicago, USA;Physical Medicine and Rehabilitation, Northwestern University Feinberg School of Medicine, Chicago, USA;Department of Medicine, Division of Rheumatology, Northwestern University Feinberg School of Medicine, Chicago, USA;Northwestern University Feinberg School of Medicine, McGaw Pavilion, Suite M300, 240 E Huron Street, 60611, Chicago, IL, USA;Northwestern Medical Enterprise Data Warehouse, Chicago, USA;Robert H. Lurie Comprehensive Cancer Center, Northwestern University Biomedical Informatics Center, Chicago, USA;
Keywords: Medical informatics; Information storage and retrieval; Information systems; Electronic health records; Automatic data processing
DOI: 10.1186/1472-6947-12-106
Received 2012-03-12, accepted 2012-09-03, published 2012
Source: Springer
【 Abstract 】

Background: Translational research typically requires data abstracted from medical records as well as data collected specifically for research. Unfortunately, many data within electronic health records are represented as text that is not amenable to aggregation for analyses. We present a scalable open source SQL Server Integration Services package, called Regextractor, for including regular expression parsers into a classic extract, transform, and load workflow. We have used Regextractor to abstract discrete data from textual reports from a number of ‘machine generated’ sources. To validate this package, we created a pulmonary function test data mart and analyzed the quality of the data mart versus manual chart review.

Methods: Eleven variables from pulmonary function tests performed closest to the initial clinical evaluation date were studied for 100 randomly selected subjects with scleroderma. One research assistant manually reviewed, abstracted, and entered relevant data into a database. Correlation with data obtained from the automated pulmonary function test data mart within the Northwestern Medical Enterprise Data Warehouse was determined.

Results: There was a near perfect (99.5%) agreement between results generated from the Regextractor package and those obtained via manual chart abstraction. The pulmonary function test data mart has been used subsequently to monitor disease progression of patients in the Northwestern Scleroderma Registry. In addition to the pulmonary function test example presented in this manuscript, the Regextractor package has been used to create cardiac catheterization and echocardiography data marts. The Regextractor package was released as open source software in October 2009 and has been downloaded 552 times as of 6/1/2012.

Conclusions: Collaboration between clinical researchers and biomedical informatics experts enabled the development and validation of a tool (Regextractor) to parse, abstract and assemble structured data from text data contained in the electronic health record. Regextractor has been successfully used to create additional data marts in other medical domains and is available to the public.
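As a rough illustration of the approach the abstract describes, the sketch below pulls discrete values out of a machine-generated text report with regular expressions and then compares them with manually abstracted values. It is written in Python rather than as an SQL Server Integration Services package, and it is not the Regextractor implementation: the report layout, the field names, the patterns, and the `extract_fields` and `percent_agreement` helpers are all assumptions made for illustration. The abstract also does not say how the 99.5% agreement was calculated, so the simple per-value match rate used here is only one plausible reading.

```python
import re

# Hypothetical report layout and field names. The actual Regextractor
# patterns and the real pulmonary function test report format are not
# given in the abstract.
PATTERNS = {
    "fvc_liters": re.compile(r"FVC\s*\(L\)\s*:?\s*(\d+(?:\.\d+)?)"),
    "fev1_liters": re.compile(r"FEV1\s*\(L\)\s*:?\s*(\d+(?:\.\d+)?)"),
    "dlco_percent_predicted": re.compile(
        r"DLCO\s*%\s*pred\s*:?\s*(\d+(?:\.\d+)?)", re.IGNORECASE
    ),
}


def extract_fields(report_text):
    """Return one numeric value per pattern, or None when nothing matches."""
    row = {}
    for name, pattern in PATTERNS.items():
        match = pattern.search(report_text)
        row[name] = float(match.group(1)) if match else None
    return row


def percent_agreement(extracted_rows, manual_rows):
    """Percentage of field values that are identical in both sources.

    Assumption: agreement is counted per individual value; the abstract
    does not state the exact method used in the study.
    """
    total = 0
    same = 0
    for extracted, manual in zip(extracted_rows, manual_rows):
        for name in PATTERNS:
            total += 1
            if extracted[name] == manual[name]:
                same += 1
    return 100.0 * same / total


# Made-up example report and made-up manual abstraction for one subject.
sample_report = "FVC (L): 3.21\nFEV1 (L): 2.54\nDLCO % pred: 68\n"
extracted = [extract_fields(sample_report)]
manual = [
    {"fvc_liters": 3.21, "fev1_liters": 2.54, "dlco_percent_predicted": 68.0}
]

print(extracted[0])
print(percent_agreement(extracted, manual))
```

In a real extract, transform, and load workflow the extracted rows would be loaded into a data mart table instead of being printed, and unmatched fields (`None` here) would typically be flagged for review.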

【 License 】

CC BY   
© Hinchcliff et al.; licensee BioMed Central Ltd. 2012
