Technical Report Details
Impact of imperfect OCR on part-of-speech tagging
Lin, Xiaofan
HP Development Company
Keywords: part-of-speech tagging; optical character recognition; natural language processing; system combination; majority voting; sensitivity analysis
RP-ID: HPL-2002-7R1
Subject classification: Computer Science (General)
United States | English
Source: HP Labs
【 Abstract 】

Part-of-speech (POS) tagging is the foundation of natural language processing (NLP) systems, and thus has been an active area of research for many years. However, one question remains unanswered: how will a POS tagger behave when the input text is not error-free? This issue can be of great importance when the text comes from imperfect sources such as Optical Character Recognition (OCR). This paper analyzes the performance of both individual POS taggers and combination systems on imperfect text. Experimental results show that a POS tagger's accuracy decreases linearly with the character error rate, and that the slope of this line indicates the tagger's sensitivity to input text errors.

6 pages
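
The linear relationship described in the abstract can be illustrated with a small sketch: fit a straight line to (character error rate, tagging accuracy) pairs and read the slope as the tagger's sensitivity. This is only an illustration under stated assumptions, not the report's own procedure. The function name `fit_sensitivity` and every number below are hypothetical, invented for the example, and are not results from the report.

```python
from typing import Sequence, Tuple


def fit_sensitivity(cer: Sequence[float], acc: Sequence[float]) -> Tuple[float, float]:
    """Ordinary least-squares fit of acc ~ intercept + slope * cer.

    Returns (slope, intercept). A more negative slope means accuracy
    drops faster as the character error rate grows.
    """
    n = len(cer)
    if n != len(acc) or n < 2:
        raise ValueError("need at least two paired measurements")
    mean_x = sum(cer) / n
    mean_y = sum(acc) / n
    sxx = sum((x - mean_x) ** 2 for x in cer)
    if sxx == 0:
        raise ValueError("character error rates must not all be equal")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(cer, acc))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Hypothetical measurements, made up for illustration only.
cer_points = [0.00, 0.02, 0.04, 0.06]   # character error rate of the input text
tagger_a = [0.96, 0.93, 0.90, 0.87]     # tagging accuracy of an imaginary tagger A
tagger_b = [0.95, 0.93, 0.91, 0.89]     # tagging accuracy of an imaginary tagger B

slope_a, intercept_a = fit_sensitivity(cer_points, tagger_a)
slope_b, intercept_b = fit_sensitivity(cer_points, tagger_b)

# The tagger with the more negative slope is the more sensitive one.
more_sensitive = "A" if slope_a < slope_b else "B"
print(f"A: slope={slope_a:.2f}, B: slope={slope_b:.2f}, more sensitive: {more_sensitive}")
```

In this made-up data, tagger A starts with higher accuracy on clean text but loses accuracy faster, so a comparison on clean text alone would not reveal which tagger copes better with OCR errors.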
