Dissertation details
Automatically identifying facet roles from comparative structures to support biomedical text summarization
Lucic, Ana
Keywords: Comparison sentences; Natural language processing; Text mining; Text summarization; Information extraction
Others: https://www.ideals.illinois.edu/bitstream/handle/2142/98087/LUCIC-DISSERTATION-2017.pdf?sequence=1&isAllowed=y
United States | English
Source: The Illinois Digital Environment for Access to Learning and Scholarship
【 Abstract 】

Within the context of biomedical scholarly articles, comparison sentences represent a rhetorical structure commonly used to communicate findings. More generally, comparison sentences are rich with information about how the properties of one or more entities relate to one another. So far, in the biomedical domain, the emphasis has been on recognizing comparative sentences in text. This dissertation goes beyond sentence-level recognition and aims to automate the identification of the integral parts of a comparison sentence, called comparative facets: the two compared entities, the basis or endpoint of the comparison, and the result or relationship that binds the entities and the basis. Only sentences that contain all four facets are of interest in this thesis. For the first compared entity, the system achieves an average F1 on a random sample of sentences of 0.65 on short (11 to 21 words), 0.70 on medium (22 to 28 words), 0.60 on long (29 to 36 words), and 0.60 on very long (more than 36 words) sentences. For the basis of comparison (the endpoint), the average F1 was 0.66 on short, 0.57 on medium, 0.56 on long, and 0.50 on very long sentences. For the second compared entity, the average F1 was 0.91 on short, 0.85 on medium, 0.81 on long, and 0.72 on very long sentences. In semantic relation identification, performance was also sensitive to sentence length: the average F1 was 0.80 on short sentences and 0.71, 0.56, and 0.51 on medium, long, and very long sentences, respectively. Thus, the methods developed in this dissertation work better on shorter sentences (28 words or fewer) and on those that do not contain multiple claims or disjunctive conjunctions.
When applied to a previously unseen collection of breast cancer articles, the performance achieved in identifying the compared entities and the endpoint was comparable to the results achieved on the collection used for building and testing the models. This result is promising for applying the model to other collections of scholarly articles in the biomedical sciences.
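
The sentence-length bins and the F1 measure reported in the abstract can be illustrated with a minimal Python sketch. The bin boundaries are taken from the abstract. Everything else is an assumption made for illustration: the function names `length_bin` and `f1_score` are hypothetical, words are counted by whitespace splitting, and F1 is computed from raw true positive, false positive, and false negative counts. The record does not say how the dissertation tokenized sentences or at what level (token, span, or sentence) the facets were scored.

```python
from typing import Optional


def length_bin(sentence: str) -> Optional[str]:
    """Assign a sentence to one of the length bins named in the abstract.

    Assumption: words are counted by splitting on whitespace.
    Sentences shorter than 11 words fall outside the reported bins,
    so None is returned for them.
    """
    n_words = len(sentence.split())
    if 11 <= n_words <= 21:
        return "short"
    if 22 <= n_words <= 28:
        return "medium"
    if 29 <= n_words <= 36:
        return "long"
    if n_words > 36:
        return "very long"
    return None


def f1_score(tp: int, fp: int, fn: int) -> float:
    """Standard F1: the harmonic mean of precision and recall.

    Assumption: tp, fp, and fn are counts of facet predictions;
    the unit being counted is not specified in the abstract.
    """
    if tp == 0:
        return 0.0  # precision and recall are both zero (or undefined)
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)
```

For example, `length_bin` applied to a 25-word sentence returns `"medium"`, and per-bin averages like those in the abstract would be obtained by grouping scored sentences by bin before averaging their F1 values.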
