Technical Report Details
Biblio: Automatic meta-data extraction
Staelin, Carl ; Elad, Michael ; Greig, Darryl ; Shmueli, Oded ; Vans, Marie
HP Development Company
Keywords: document understanding; learning; support vector machines; neural networks
RP-ID: HPL-2004-190
Subject classification: Computer Science (General)
United States | English
Source: HP Labs
[Abstract]
Biblio is an adaptive system that automatically extracts meta-data from semi-structured and structured scanned documents. Instead of using hand-coded templates or other methods manually customized for each document format, it uses example-based machine learning to adapt to customer-defined document and meta-data types. We provide results from two document corpora: a set of scanned journal articles and a set of scanned legal documents. The first set is semi-structured, as the different journals use a variety of flexible layouts. The second set is largely free-form text based on poor-quality scans of FAX-quality legal documents. We demonstrate accuracy on the semi-structured document set roughly comparable to that of hand-coded systems, and much worse performance on the legal documents.
26 pages