Technical report details
Dictionary and pattern-based recognition of organization names in Russian news texts
Solovyev, Valery ; Gareev, Rinat ; Ivanov, Vladimir ; Serebryakov, Sergey ; Vassilieva, Natalia
HP Development Company
Keywords: Named entity recognition; knowledge-based event extraction
RP-ID: HPL-2013-14
Subject classification: Computer science (general)
United States | English
Source: HP Labs
[Abstract]
This paper describes part of an event extraction system developed in collaboration with HP Labs Russia. The input texts are business news feeds. One of the most important types of event participant is 'Organization', and the paper focuses on recognizing organization names in Russian news texts. Two approaches have been implemented. The first is dictionary-based: we propose an algorithm that builds a dictionary from a set of full legal-entity names gathered from a government registry. The main problems with dictionary matching are incorrect stemming and a significant fraction of ambiguous names among the dictionary entries. The second approach relies on local context clues and on words internal to the name. These words form patterns that are intrinsic to organization names and enable recognition of names absent from the dictionary. We propose an algorithm to derive such patterns from the original dictionary.
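
The two approaches named in the abstract can be illustrated with a minimal sketch. This is not the report's algorithm: the legal-form list, the function names, and the single "legal form followed by a quoted name" pattern are all assumptions made for illustration, and no stemming is applied, so inflected forms are deliberately missed.

```python
import re

# Hypothetical list of Russian legal-form designators (an assumption,
# not taken from the report).
LEGAL_FORMS = ("ООО", "ОАО", "ЗАО")


def build_dictionary(full_names):
    """Reduce full registry names to short lowercase dictionary entries
    by dropping a leading legal-form designator and surrounding quotes."""
    entries = set()
    for name in full_names:
        words = name.split()
        if words and words[0] in LEGAL_FORMS:
            words = words[1:]
        short = " ".join(words).strip('"«»')
        if short:
            entries.add(short.lower())
    return entries


def match_dictionary(text, entries):
    """Return entries that occur in the text as whole words.
    Matching is exact (no stemming), so inflected forms are not found."""
    lowered = text.lower()
    found = []
    for entry in sorted(entries):
        if re.search(r"(?<!\w)" + re.escape(entry) + r"(?!\w)", lowered):
            found.append(entry)
    return found


# One illustrative pattern: a legal-form designator followed by a quoted name.
QUOTED_NAME = re.compile(r'\b(?:%s)\s+[«"]([^»"]+)[»"]' % "|".join(LEGAL_FORMS))


def match_pattern(text):
    """Return quoted names introduced by a legal-form designator,
    whether or not they are in the dictionary."""
    return QUOTED_NAME.findall(text)


registry = ['ООО «Ромашка»', 'ОАО "Газпром"']
entries = build_dictionary(registry)
news = "Вчера ОАО «Лукойл» и Газпром подписали договор."
dictionary_hits = match_dictionary(news, entries)
pattern_hits = match_pattern(news)
```

In this sketch the dictionary finds the registry name that appears in the text, while the pattern picks up a name that is absent from the registry. A genitive form such as "Газпрома" would not match the dictionary entry, which hints at why stemming quality matters for the dictionary approach.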