Journal article details
Frontiers in Pharmacology
Approach to machine learning for extraction of real-world data variables from electronic health records
Pharmacology
Erin Fidyk1  Tarun Bansal1  Konstantin Krismer1  Corey M. Benedum1  Evan Estola1  Jonathan Kelly1  Will Shapiro1  Sheila Nemeth1  James Gippetti1  Robin Linzmayer1  John Ritten1  Melissa Estévez1  Katherine Harrison1  Michael Waskom1  George Ho1  Samuel Wilkinson1  Guy Amster1  Auriane Blarre1  Aaron B. Cohen2  Blythe Adamson3 
[1] Flatiron Health, Inc., New York, NY, United States; Flatiron Health, Inc., New York, NY, United States; Department of Medicine, NYU Grossman School of Medicine, New York, NY, United States; Flatiron Health, Inc., New York, NY, United States; The Comparative Health Outcomes, Policy, and Economics (CHOICE) Institute, Department of Pharmacy, University of Washington, Seattle, WA, United States
Keywords: electronic health records; cancer; oncology; real-world data; machine learning; natural language processing; artificial intelligence
DOI: 10.3389/fphar.2023.1180962
Received 2023-03-06; accepted 2023-08-25; published 2023
Source: Frontiers
【 Abstract 】

Background: As artificial intelligence (AI) continues to advance with breakthroughs in natural language processing (NLP) and machine learning (ML), such as the development of models like OpenAI’s ChatGPT, new opportunities are emerging for efficient curation of electronic health records (EHR) into real-world data (RWD) for evidence generation in oncology. Our objective is to describe the research and development of industry methods to promote transparency and explainability.

Methods: We applied NLP with ML techniques to train, validate, and test the extraction of information from unstructured documents (e.g., clinician notes, radiology reports, and lab reports) to output a set of structured variables required for RWD analysis. This research used a nationwide EHR-derived database. Models were selected based on performance. Variables curated with an approach using ML extraction are those where the value is determined solely by an ML model (i.e., not confirmed by abstraction), which identifies key information from visit notes and documents. These models do not predict future events or infer missing information.

Results: We developed an approach using NLP and ML for extraction of clinically meaningful information from unstructured EHR documents and found high performance of output variables compared with variables curated by manual abstraction. These extraction methods resulted in research-ready variables including initial cancer diagnosis with date, advanced/metastatic diagnosis with date, disease stage, histology, smoking status, surgery status with date, biomarker test results with dates, and oral treatments with dates.

Conclusion: NLP and ML enable the extraction of retrospective clinical data in EHR with speed and scalability to help researchers learn from the experience of every person with cancer.

【 License 】

Unknown   
Copyright © 2023 Adamson, Waskom, Blarre, Kelly, Krismer, Nemeth, Gippetti, Ritten, Harrison, Ho, Linzmayer, Bansal, Wilkinson, Amster, Estola, Benedum, Fidyk, Estévez, Shapiro and Cohen.
