Thesis details
Concept and entity grounding using indirect supervision
Tsai, Chen-Tse
Keywords: Wikification; Entity linking; Cross-lingual wikification; Named entity recognition; Indirect supervision; Incidental supervision; Entity disambiguation; Concept disambiguation
Others  :  https://www.ideals.illinois.edu/bitstream/handle/2142/98336/TSAI-DISSERTATION-2017.pdf?sequence=1&isAllowed=y
United States | English
Source: The Illinois Digital Environment for Access to Learning and Scholarship
【 Abstract 】

Extracting and disambiguating entities and concepts is a crucial step toward understanding natural language text. In this thesis, we consider the problem of grounding concepts and entities mentioned in text to one or more knowledge bases (KBs). A well-studied scenario of this problem is the one in which documents are given in English and the goal is to identify concept and entity mentions and find the corresponding entries the mentions refer to in Wikipedia. We extend this problem in two directions: First, we study identifying and grounding entities written in any language to the English Wikipedia. Second, we investigate using multiple KBs that do not contain the rich textual and structural information that Wikipedia does.

These more involved settings pose a few additional challenges beyond those addressed in the standard English Wikification problem. Key among them is that no supervision is available to facilitate training machine learning models. The first extension, cross-lingual Wikification, introduces problems such as recognizing multilingual named entities mentioned in text, translating non-English names into English, and computing word similarity across languages. Since it is impossible to acquire manually annotated examples for all languages, building models for all languages in Wikipedia requires exploring indirect or incidental supervision signals that already exist in Wikipedia. For the second setting, we need to deal with the fact that most KBs do not contain the rich information Wikipedia has; consequently, the main supervision signal used to train Wikification rankers no longer exists. In this thesis, we show that supervision signals can be obtained by carefully examining the redundancy and relations between multiple KBs. By developing algorithms and models that harvest these incidental signals, we can achieve better performance on these tasks.

【 Preview 】
Attachments
Files | Size | Format | View
Concept and entity grounding using indirect supervision | 3277KB | PDF | download