Technical Report Details
Quality Assurance for Document Understanding Systems
Yacoub, Sherif
HP Development Company
Keywords: quality assurance; document understanding; content remastering
RP-ID: HPL-2002-116
Subject classification: Computer Science (General)
Country: United States | Language: English
Source: HP Labs
【 Abstract 】
Document understanding is a field concerned with the semantic analysis of documents to extract human-understandable information and encode it in machine-readable form. Document understanding systems automatically extract meaningful information from a raster image of a document, producing information-rich content that many end-user applications, such as search and retrieval, can use. To process a large volume of data, such as the collection of books and journals produced by a publisher, these systems should run non-stop, in an automated fashion and in an unattended operation mode. Ensuring the quality of their output is challenging for several reasons, including the unattended nature of the system and the sheer volume of data (terabytes), which can give rise to a considerable number of exceptions. Automated quality assurance (QA) techniques are therefore essential to the successful operation of a large-scale document understanding system. In this paper, we propose the QA techniques that a document understanding system needs, together with their automation.
19 pages