Journal Article Details
Journal of Computer Science
Proposing the new Algorithm and Technique Development for Integrating Web Table Extraction and Building a Mashup
Bagio Budiardjo, Rudy A.G. Gultom, Riri F. Sari
Keywords: Web table extraction; mashup stages; recursive algorithm; Document Object Model (DOM); HTML format; Integrated Development Environment (IDE); data integration
DOI: 10.3844/jcssp.2011.129.142
Subject classification: Computer Science (General)
Source: Science Publications
【 Abstract 】

Problem statement: Many kinds of data held in web tables can now be extracted easily from the Internet, although not every web table is relevant to a given user. Most web pages are written in unstructured HTML, which makes web table extraction time-consuming and costly. HTML describes presentation only and is not backed by a database system, so users need a tool to support the extraction process. Approach: This research proposes an approach for extracting web tables and building a Mashup from HTML web pages using the Xtractorz application. It also discusses how to integrate web table extraction into the stages of building a Mashup, i.e., Data Retrieval, Data Source Modeling, Data Cleaning/Filtering, Data Integration and Data Visualization. The main issue lies in the data modeling stage, in which Xtractorz must automatically render the Document Object Model (DOM) tree in accordance with the HTML tags of the web page from which the table is extracted. To overcome this, Xtractorz is equipped with an algorithm and rules that let it analyze the HTML tags and extract the data into a new table format. The algorithm uses a recursive technique and runs within the user-friendly GUI of Xtractorz. Results: The approach was evaluated in an experiment using Xtractorz and similar applications, such as RoboMaker and Karma. The experiment showed that Xtractorz is more efficient at completing the experiment tasks, since it requires fewer steps to complete them. Conclusion: Xtractorz makes a positive contribution in terms of algorithm technique and a new approach to web table extraction and Mashup building, in which the core algorithm extracts web data tables using a recursive technique while rendering the DOM tree model automatically.
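
The abstract does not reproduce the Xtractorz algorithm itself, so the following is only a minimal sketch of the general idea it names: build a DOM-like tree from the HTML tags, then walk that tree recursively and collect every table as a list of rows. All names here (`Node`, `TreeBuilder`, `extract_tables`, `parse_tables`) and the sample page are hypothetical illustrations, not part of Xtractorz. The sketch uses only Python's standard `html.parser` module and assumes reasonably well-formed HTML with explicit closing tags.

```python
from html.parser import HTMLParser

# Elements that never have children, so the builder must not descend into them.
VOID_TAGS = {"area", "base", "br", "col", "embed", "hr", "img",
             "input", "link", "meta", "source", "track", "wbr"}


class Node:
    """One element of a simplified DOM tree (hypothetical helper)."""
    def __init__(self, tag, parent=None):
        self.tag = tag
        self.parent = parent
        self.children = []  # child Node objects or plain text strings


class TreeBuilder(HTMLParser):
    """Builds the tree from start tags, end tags and text."""
    def __init__(self):
        super().__init__()
        self.root = Node("#document")
        self.current = self.root

    def handle_starttag(self, tag, attrs):
        node = Node(tag, self.current)
        self.current.children.append(node)
        if tag not in VOID_TAGS:
            self.current = node

    def handle_endtag(self, tag):
        # Climb to the nearest open element with this tag; ignore stray end tags.
        node = self.current
        while node is not None and node.tag != tag:
            node = node.parent
        if node is not None and node.parent is not None:
            self.current = node.parent

    def handle_data(self, data):
        if data.strip():
            self.current.children.append(data.strip())


def text_of(node):
    """Recursively join all text found beneath a node."""
    if isinstance(node, str):
        return node
    return " ".join(t for t in (text_of(c) for c in node.children) if t)


def find_rows(node, rows):
    """Recursively collect the rows of one table, skipping nested tables."""
    for child in node.children:
        if isinstance(child, str):
            continue
        if child.tag == "tr":
            rows.append([text_of(c) for c in child.children
                         if not isinstance(c, str) and c.tag in ("td", "th")])
        elif child.tag != "table":
            find_rows(child, rows)


def extract_tables(node, tables=None):
    """Depth-first walk of the tree; each <table> becomes a list of rows."""
    if tables is None:
        tables = []
    for child in node.children:
        if isinstance(child, str):
            continue
        if child.tag == "table":
            rows = []
            find_rows(child, rows)
            tables.append(rows)
        extract_tables(child, tables)  # also reaches nested tables
    return tables


def parse_tables(html):
    builder = TreeBuilder()
    builder.feed(html)
    builder.close()
    return extract_tables(builder.root)


# Made-up sample page, for illustration only.
sample = ("<html><body><p>Intro</p><table>"
          "<tr><th>City</th><th>Temp</th></tr>"
          "<tr><td>Jakarta</td><td>31</td></tr>"
          "</table></body></html>")
print(parse_tables(sample))  # [[['City', 'Temp'], ['Jakarta', '31']]]
```

In this sketch a nested table is returned as a separate entry, and its text also appears inside the enclosing cell. A real extractor would additionally need to handle omitted closing tags and `rowspan`/`colspan` attributes, which are outside the scope of this illustration.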

【 License 】

Unknown   
