Thesis Details
emrQA: A large corpus for question answering on electronic medical records
Pampari, Anusri; Peng, Jian
Keywords: Electronic Medical Records, Question Answering, Logical Forms, Semantic Parsing, Dataset Generation, Closed Domain, i2b2
Full text (PDF): https://www.ideals.illinois.edu/bitstream/handle/2142/102500/PAMPARI-THESIS-2018.pdf?sequence=1&isAllowed=y
United States | English
Source: The Illinois Digital Environment for Access to Learning and Scholarship (IDEALS)
【 Abstract 】

We propose a novel methodology for generating large-scale, domain-specific question answering (QA) datasets by re-purposing existing annotations created for other NLP tasks. We demonstrate one instance of this methodology by generating a large-scale QA dataset for electronic medical records, leveraging existing expert annotations of clinical notes made for various NLP tasks in the community-shared i2b2 datasets. The resulting corpus, emrQA, contains 1 million question-logical form pairs and over 400,000 question-answer evidence pairs. We characterize the dataset and explore its learning potential by training baseline models for mapping questions to logical forms and questions to answers.
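To make the idea of "re-purposing existing annotations" concrete, here is a minimal sketch assuming a template-based approach: a paraphrased question template is paired with a logical-form template, and both are filled from an already-annotated clinical entity, with the annotated attribute serving as the answer and the source sentence as evidence. Everything here is hypothetical and for illustration only: the `Annotation` record, the `|medication|` placeholder, the `MedicationEvent(...)` notation (not the thesis's actual logical-form grammar), the `generate` function, and the sample note text.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Annotation:
    """Hypothetical expert annotation drawn from a clinical note."""
    note_id: str
    medication: str
    dosage: str
    evidence: str  # the note sentence the annotation came from


@dataclass(frozen=True)
class QAExample:
    """One generated (question, logical form, answer, evidence) record."""
    question: str
    logical_form: str
    answer: str
    evidence: str


# Hypothetical templates: each question paraphrase is paired with a logical
# form, and both share the placeholder |medication|.
TEMPLATES = [
    ("What is the dosage of |medication|?",
     "MedicationEvent(|medication|) [dosage=x]"),
    ("How much |medication| does the patient take?",
     "MedicationEvent(|medication|) [dosage=x]"),
]


def generate(annotations, templates=TEMPLATES):
    """Fill every template with every annotation's entity (illustrative only)."""
    examples = []
    for ann in annotations:
        for q_tmpl, lf_tmpl in templates:
            examples.append(QAExample(
                question=q_tmpl.replace("|medication|", ann.medication),
                logical_form=lf_tmpl.replace("|medication|", ann.medication),
                answer=ann.dosage,          # annotated attribute becomes the answer
                evidence=ann.evidence,      # source sentence becomes the evidence
            ))
    return examples


if __name__ == "__main__":
    notes = [Annotation("note-1", "lisinopril", "10 mg daily",
                        "Continue lisinopril 10 mg daily.")]
    for ex in generate(notes):
        print(ex.question, "->", ex.logical_form, "|", ex.answer)
```

In this sketch each annotation yields one example per template, so adding paraphrase templates that share a logical form multiplies the number of questions without requiring any new expert annotation.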

Attachments
emrQA: A large corpus for question answering on electronic medical records (PDF, 631 KB)