Technical Report

Biblio: Automatic meta-data extraction

Staelin, Carl ; Elad, Michael ; Greig, Darryl ; Shmueli, Oded ; Vans, Marie
HP Development Company
Keywords: document understanding; learning; support vector machines; neural networks
RP-ID: HPL-2004-190
Subject classification: Computer Science (General)
United States | English
Source: HP Labs

【Abstract】

Biblio is an adaptive system that automatically extracts meta-data from semi-structured and structured scanned documents. Instead of using hand-coded templates or other methods manually customized for each document format, it uses example-based machine learning to adapt to customer-defined document and meta-data types. We provide results from two document corpora: a set of scanned journal articles and a set of scanned legal documents. The first set is semi-structured, as the different journals use a variety of flexible layouts. The second set is largely free-form text based on poor-quality scans of fax-quality legal documents. We demonstrate accuracy on the semi-structured document set roughly comparable to hand-coded systems, and much worse performance on the legal documents.

Pages: 26

