Technical Report Details
Creating Digital Libraries: Content Generation and Re-Mastering
Simske, Steven J. ; Lin, Xiaofan
HP Development Company
Keywords: zoning analysis; quality assurance; TIFF; OCR; PDF; meta-algorithmics
RP-ID: HPL-2003-259
Subject classification: Computer Science (General)
United States | English
Source: HP Labs
【 Abstract 】

This paper has two main goals: to describe the automatic creation of a digital library and to provide an overview of the meta-algorithmic patterns that can be applied to increase the accuracy of its creation. Automating the creation of useful digital libraries (that is, digital libraries affording searchable text and reusable, or "re-purposable", output) is a complicated process, whether the original library is paper-based or already available in electronic form. In this paper, we outline the steps involved in the creation of a deployable digital library (> 1.2 × 10^6 pages) for MIT Press, as well as its implications for other aspects of digital library creation, management, use and repurposing. Input, transformation, information extraction, and output processes are considered in light of their utility in creating layers of content. Interestingly, in some respects, scanning directly from paper offers extra opportunities for error-checking through feedback-feedforward combination. Strategies for quality assurance (QA) at the document, chapter and book level are also discussed. We emphasize the use of meta-algorithmic design patterns to improve content generation, extraction and re-mastering. This approach also makes it easier to add modules and algorithms to the system and to deprecate them from it.

Notes: Copyright IEEE. To be published in and presented at the International Workshop on Document Image Analysis for Libraries (DIAL'04), 23-24 January 2004, Palo Alto, California. 16 pages.
