Technical Report Details
Web Page Layout Via Visual Segmentation
Pnueli, Ayelet ; Bergman, Ruth ; Schein, Sagi ; Barkol, Omer
HP Development Company
Keywords: Layout understanding; Layout analysis; Web page segmentation; HTML; DOM
RP-ID: HPL-2009-160
Subject classification: Computer science (general)
United States | English
Source: HP Labs
【 Abstract 】

Web page segmentation is required by any application that observes, manipulates, interacts with, or summarizes web content or web services. Although segmentation is a non-trivial task, until recently it could be performed reasonably well by analyzing the HTML structure. Today, the dynamic content of web pages no longer fits the assumptions made by those algorithms, and the HTML structure does not contain enough information to extract the important regions. Yet, visually, the page itself remains understandable to the human user. Thus, we believe it contains all the information needed to understand its content. We propose adding computer vision methods to the analysis of the page. When the HTML does not contain the needed object hierarchy information, one may use the visual information instead. Moreover, visual segmentation allows us to correct the HTML structure or to simplify its hierarchy, which in many cases is not semantic. We perform top-down segmentation, yielding first the large-scale layout of the page and then refining it down to the required degree of detail.
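
The abstract does not say which algorithm the report uses. As an illustration only, the top-down idea can be sketched with a recursive XY-cut, a classic layout-analysis technique swapped in here as an assumption: split a region along its widest empty band, then recurse into each part. The binary grid input, the function names, and the `min_gap` and `max_depth` parameters are all hypothetical and are not taken from the report.

```python
from typing import List, Optional, Tuple

Grid = List[List[int]]            # 1 = content pixel, 0 = background
Box = Tuple[int, int, int, int]   # (top, left, bottom, right), bottom/right exclusive


def _trim(grid: Grid, box: Box) -> Optional[Box]:
    """Shrink box to the bounding box of its content; None if it is empty."""
    top, left, bottom, right = box
    rows = [r for r in range(top, bottom) if any(grid[r][left:right])]
    cols = [c for c in range(left, right)
            if any(grid[r][c] for r in range(top, bottom))]
    if not rows or not cols:
        return None
    return (rows[0], cols[0], rows[-1] + 1, cols[-1] + 1)


def _widest_gap(profile: List[int], min_gap: int) -> Tuple[int, int]:
    """Return (start, end) of the longest interior run of zeros, or (-1, -1)."""
    best = (-1, -1)
    start = None
    for i, value in enumerate(profile):
        if value == 0:
            if start is None:
                start = i
        else:
            # A run that begins at index 0 is a margin, not a separator.
            if start is not None and start > 0:
                length = i - start
                if length >= min_gap and length > best[1] - best[0]:
                    best = (start, i)
            start = None
    return best


def xy_cut(grid: Grid, box: Optional[Box] = None, min_gap: int = 2,
           depth: int = 0, max_depth: int = 8) -> List[Box]:
    """Top-down segmentation: coarse regions first, finer ones as depth grows."""
    if box is None:
        box = (0, 0, len(grid), len(grid[0]) if grid else 0)
    trimmed = _trim(grid, box)
    if trimmed is None:
        return []
    top, left, bottom, right = trimmed
    if depth >= max_depth:
        return [trimmed]

    # Projection profiles: amount of content in each row and each column.
    row_profile = [sum(grid[r][left:right]) for r in range(top, bottom)]
    col_profile = [sum(grid[r][c] for r in range(top, bottom))
                   for c in range(left, right)]
    rs, re = _widest_gap(row_profile, min_gap)
    cs, ce = _widest_gap(col_profile, min_gap)
    if rs < 0 and cs < 0:
        return [trimmed]  # no separator wide enough: this region is a leaf

    if (re - rs) >= (ce - cs):   # horizontal band is at least as wide
        parts = [(top, left, top + rs, right), (top + re, left, bottom, right)]
    else:                        # vertical band is wider
        parts = [(top, left, bottom, left + cs), (top, left + ce, bottom, right)]

    regions: List[Box] = []
    for part in parts:
        regions.extend(xy_cut(grid, part, min_gap, depth + 1, max_depth))
    return regions
```

In this sketch `max_depth` plays the role of the "required degree of detail": a small value returns only the large-scale layout, while larger values keep subdividing until no sufficiently wide whitespace band remains.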
