Technical Report Details
Header and Footer Extraction by Page-Association
Lin, Xiaofan
HP Development Company
Keywords: document structure analysis; optical character recognition; header/footer extraction; digital content re-mastering
RP-ID: HPL-2002-129
Subject classification: Computer Science (General)
United States | English
Source: HP Labs
【 Abstract 】

This paper introduces a robust algorithm for extracting headers and footers from a variety of electronic documents, such as image files, Adobe PDF files, and files generated by Optical Character Recognition (OCR). Whereas conventional methods rely on page-level layout and format, the proposed strategy considers each page in the context of its neighboring pages. Through such page-association, headers and footers can be detected automatically across a variety of documents without human intervention. In addition, the use of fuzzy string matching makes the method resistant to OCR errors.
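
The abstract gives no implementation details, so the following is only a minimal sketch of the general idea of page-association with fuzzy string matching, not the report's actual algorithm. It assumes each page is already available as a list of text lines, compares the line at a fixed position (top or bottom) against the same position on nearby pages, and uses Python's standard `difflib.SequenceMatcher` as a stand-in similarity measure. The function names, the `window` and `threshold` parameters, and the digit-masking step are all illustrative assumptions.

```python
import re
from difflib import SequenceMatcher


def normalize(line):
    """Lowercase the line and mask digit runs so page numbers compare equal.

    Assumption for this sketch: digits are the only part that varies
    between pages (e.g. "Page 1" vs "Page 2").
    """
    return re.sub(r"\d+", "#", line.strip().lower())


def similar(a, b, threshold=0.8):
    """Fuzzy match: tolerate a few character errors such as OCR mistakes."""
    return SequenceMatcher(None, normalize(a), normalize(b)).ratio() >= threshold


def find_repeated_lines(pages, position, window=2, threshold=0.8):
    """Return, for each page, the line at `position` if a neighboring page
    has a similar line at the same position, else None.

    pages:    list of pages, each a list of text lines
    position: 0 for the top line (header candidate),
              -1 for the bottom line (footer candidate)
    window:   how many pages before and after to compare against
    """
    result = []
    for i, page in enumerate(pages):
        found = None
        if page:
            candidate = page[position]
            start = max(0, i - window)
            stop = min(len(pages), i + window + 1)
            for j in range(start, stop):
                if j == i or not pages[j]:
                    continue  # skip the page itself and empty pages
                if similar(candidate, pages[j][position], threshold):
                    found = candidate
                    break
        result.append(found)
    return result


# Hypothetical three-page document; page 2 carries simulated OCR errors.
pages = [
    ["HP Labs Technical Report", "body one about scanning", "Page 1"],
    ["HP Lahs Technical Rcport", "xyz", "Page 2"],
    ["HP Labs Technical Report", "entirely unrelated words", "Page 3"],
]
headers = find_repeated_lines(pages, 0)
footers = find_repeated_lines(pages, -1)
```

In this sketch the misrecognized header on the second page is still associated with its neighbors because the similarity threshold tolerates a few wrong characters, and the footers match because the page numbers are masked before comparison. A practical system would also need to handle multi-line headers, alternating left/right page layouts, and positional information, none of which is attempted here.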
