Journal Article Details
NEUROCOMPUTING, Vol. 150
Structure detection and segmentation of documents using 2D stochastic context-free grammars
Article; Proceedings Paper
Alvaro, Francisco [1]; Cruz, Francisco [2]; Sanchez, Joan-Andreu [1]; Terrades, Oriol Ramos [2]; Benedi, Jose-Miguel [1]
[1] Univ Politecn Valencia, Valencia, Spain
[2] Univ Autonoma Barcelona, Ctr Visio Computador, E-08193 Barcelona, Spain
Keywords: Document image analysis; Stochastic context-free grammars; Text classification features
DOI: 10.1016/j.neucom.2014.08.076
Source: Elsevier
【 Abstract 】

In this paper we define a bidimensional extension of stochastic context-free grammars for structure detection and segmentation of document images. Two sets of text classification features are used to perform an initial classification of each zone of the page. The document segmentation is then obtained as the most likely hypothesis according to a stochastic grammar. We validated this approach on a dataset of historical marriage license books. We also tested several inference algorithms for probabilistic graphical models, and the results showed that the proposed grammatical model outperformed the other methods. Furthermore, the grammar provides the document structure along with its segmentation. (C) 2014 Elsevier B.V. All rights reserved.
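The abstract describes two stages: a classifier scores each page zone, and a stochastic grammar then selects the most likely overall segmentation. The sketch below illustrates that idea under simplifying assumptions and is not the authors' model. It assumes the page is already divided into a grid of zones, that each binary rule composes two adjacent rectangular regions either side by side (`H`) or stacked (`V`), and that the best parse is found with a CYK-style dynamic program over sub-rectangles using log-probabilities. The labels `Name`, `Body`, `Tax`, `Record` and `Page`, the rule probabilities, and the cell scores are all hypothetical.

```python
import math

# Hypothetical binary rules: (lhs, first child, second child, direction, probability).
# "H": children sit side by side (the region is split between two columns).
# "V": children are stacked (the region is split between two rows).
RULES = [
    ("Record", "Name", "Rest", "H", 1.0),
    ("Rest", "Body", "Tax", "H", 1.0),
    ("Page", "Record", "Page", "V", 0.6),
    ("Page", "Record", "Record", "V", 0.4),
]


def parse(cells, rules, start):
    """Return (log-probability, tree) of the best parse of the whole grid, or None.

    cells[r][c] maps a zone label to the classifier's probability for that cell.
    """
    rows, cols = len(cells), len(cells[0])
    memo = {}

    def best(r0, r1, c0, c1):
        # Best (log-prob, tree) per nonterminal over rows [r0, r1) and cols [c0, c1).
        key = (r0, r1, c0, c1)
        if key in memo:
            return memo[key]
        table = {}
        if r1 - r0 == 1 and c1 - c0 == 1:
            # A single cell: each zone label is a terminal scored by the classifier.
            for label, p in cells[r0][c0].items():
                if p > 0:
                    table[label] = (math.log(p), label)
        for lhs, a, b, direction, p in rules:
            if direction == "H":
                splits = [((r0, r1, c0, k), (r0, r1, k, c1)) for k in range(c0 + 1, c1)]
            else:
                splits = [((r0, k, c0, c1), (k, r1, c0, c1)) for k in range(r0 + 1, r1)]
            for first, second in splits:
                t1, t2 = best(*first), best(*second)
                if a in t1 and b in t2:
                    score = math.log(p) + t1[a][0] + t2[b][0]
                    if lhs not in table or score > table[lhs][0]:
                        table[lhs] = (score, (lhs, t1[a][1], t2[b][1]))
        memo[key] = table
        return table

    return best(0, rows, 0, cols).get(start)


# Hypothetical classifier output for a 2 x 3 grid of zones.
cells = [
    [{"Name": 0.8, "Body": 0.2}, {"Body": 0.6, "Name": 0.4}, {"Tax": 0.7, "Body": 0.3}],
    [{"Name": 0.6, "Body": 0.4}, {"Name": 0.6, "Body": 0.4}, {"Tax": 0.9, "Body": 0.1}],
]
score, tree = parse(cells, RULES, "Page")
print(round(math.exp(score), 7))  # 0.0290304
print(tree)
```

In this example the classifier alone would label the middle cell of the second row `Name` (0.6 versus 0.4), but the hypothetical grammar only admits records of the form `Name Body Tax`, so the most likely parse relabels that cell `Body`. Memoizing over rectangles keeps the search polynomial in the grid size, at the cost of considering only partitions made of straight cuts across each region.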

【 License 】

Free   
