Technical Report

【Abstract】

Web page segmentation is required by any application that observes, manipulates, interacts with, or summarizes web content or web services. Although segmentation is a non-trivial task, until recently it could be performed reasonably well by analyzing the HTML structure. Today, the dynamic content of web pages no longer fits the assumptions made by those algorithms, and the HTML structure does not contain enough information to extract the important regions. Yet, visually, the page itself remains understandable to the human user, so we believe it contains all the information needed to understand its content. We propose adding computer vision methods to the analysis of the page: when the HTML does not contain the needed object hierarchy information, one may use the visual information instead. Moreover, visual segmentation allows us to correct the HTML structure or to simplify its hierarchy, which in many cases is not semantic. We perform top-down segmentation, yielding first the large-scale layout of the page and then refining down to the required degree of detail.

【Preview】

Attachments
Files	Size	Format	View
RO201804100002658LZ	1435KB	PDF	download


Web Page Layout Via Visual Segmentation

Pnueli, Ayelet ; Bergman, Ruth ; Schein, Sagi ; Barkol, Omer
HP Development Company
Keywords: Layout understanding; Layout analysis; Web page segmentation; HTML; DOM
RP-ID: HPL-2009-160
Subject classification: Computer Science (General)
United States | English
Source: HP Labs


