Journal Article Details
BMC Bioinformatics
Sortal anaphora resolution to enhance relation extraction from biomedical literature
Research Article
Marcelo Fiszman [1], Thomas C. Rindflesch [1], Halil Kilicoglu [1], Graciela Rosemblat [1]
[1] Lister Hill National Center for Biomedical Communications, U.S. National Library of Medicine, 8600 Rockville Pike, Bethesda, MD 20894, USA
Keywords: Natural language processing; Sortal anaphora resolution; Biomedical literature; Semantic relation extraction
DOI: 10.1186/s12859-016-1009-6
Received 2015-11-04, accepted 2016-04-01, published 2016
Source: Springer
【 Abstract 】

Background: Entity coreference is common in biomedical literature, and it can affect text understanding systems that rely on accurate identification of named entities, such as relation extraction and automatic summarization. Coreference resolution is a foundational yet challenging natural language processing task which, if performed successfully, is likely to enhance such systems significantly. In this paper, we propose a semantically oriented, rule-based method to resolve sortal anaphora, a specific type of coreference that forms the majority of coreference instances in biomedical literature. The method addresses all entity types and relies on linguistic components of SemRep, a broad-coverage biomedical relation extraction system. It has been incorporated into SemRep, extending its core semantic interpretation capability from sentence level to discourse level.

Results: We evaluated our sortal anaphora resolution method in several ways. The first evaluation focused specifically on sortal anaphora relations. Our methodology achieved an F1 score of 59.6 on the test portion of a manually annotated corpus of 320 Medline abstracts, a 4-fold improvement over the baseline method. Investigating the impact of sortal anaphora resolution on relation extraction, we found that the overall effect was positive: in 50% of the changes, uninformative relations were replaced by more specific and informative ones, while 35% of the changes had no effect and only 15% were negative. We estimate that anaphora resolution results in changes to about 1.5% of the approximately 82 million semantic relations extracted from the entire PubMed.

Conclusions: Our results demonstrate that a heavily semantic approach to sortal anaphora resolution is largely effective for biomedical literature. Our evaluation and error analysis highlight some areas for further improvement, such as coordination processing and intra-sentential antecedent selection.

【 License 】

CC BY   
© Kilicoglu et al. 2016
