Computer Science and Information Systems
Research on Discovering Deep Web Entries
Ying Wang1, Wanli Zuo2, Fengling He3, Huilai Li4
[1] College of Computer Science and Technology, Jilin University; College of Mathematics, Jilin University; College of Software, Changchun Institute of Technology; Key Laboratory of Computation and Knowledge Engineering
Keywords: Deep Web; ontology; WPC; FSC; FCC
DOI : 10.2298/CSIS100322028W | |
Subject classification: Social Sciences, Humanities and Arts (General)
Source: Computer Science and Information Systems
【 Abstract 】
Ontology plays an important role in locating domain-specific Deep Web content. This paper therefore presents WFF, a novel framework for efficiently locating domain-specific Deep Web databases based on focused crawling and ontology. WFF arranges three classifiers hierarchically: a Web Page Classifier (WPC), a Form Structure Classifier (FSC), and a Form Content Classifier (FCC). First, WPC discovers potentially interesting pages using an ontology-assisted focused crawler. Then, FSC analyzes these pages and determines, from structural characteristics, whether they contain searchable forms. Finally, FCC identifies the searchable forms that belong to a given domain at the semantic level and stores the URLs of these domain-specific searchable forms in a database. A detailed experimental evaluation shows that WFF not only simplifies the discovery process but also effectively identifies domain-specific databases.
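The three-stage cascade described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the flat term set standing in for an ontology, the keyword-count relevance test, the "text field and no password field" structural rule, the field-name matching, and all function names are assumptions made here for illustration only.

```python
import re
from html.parser import HTMLParser

# Hypothetical stand-in for a domain ontology: a flat set of concept terms.
BOOK_TERMS = {"book", "author", "title", "isbn", "publisher"}


class FormCollector(HTMLParser):
    """Collects visible text and, for each <form>, its field types and names."""

    def __init__(self):
        super().__init__()
        self.text = []
        self.forms = []
        self._current = None

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "form":
            self._current = {"types": [], "names": []}
        elif self._current is not None and tag in ("input", "select", "textarea"):
            kind = (attrs.get("type") or "text").lower() if tag == "input" else tag
            self._current["types"].append(kind)
            self._current["names"].append((attrs.get("name") or "").lower())

    def handle_endtag(self, tag):
        if tag == "form" and self._current is not None:
            self.forms.append(self._current)
            self._current = None

    def handle_data(self, data):
        self.text.append(data)


def wpc_is_relevant(page_text, terms, min_hits=2):
    """WPC stand-in: a page is 'interesting' if enough domain terms occur."""
    words = set(re.findall(r"[a-z]+", page_text.lower()))
    return len(words & terms) >= min_hits


def fsc_is_searchable(form):
    """FSC stand-in: requires a free-text field and no password field."""
    types = form["types"]
    return "password" not in types and any(t in ("text", "search") for t in types)


def fcc_in_domain(form, terms):
    """FCC stand-in: at least one field name matches a domain term."""
    return any(name in terms for name in form["names"])


def discover_entries(pages, terms):
    """Run the WPC -> FSC -> FCC cascade and return the URLs that pass all three."""
    entries = []
    for url, html in pages.items():
        parser = FormCollector()
        parser.feed(html)
        if not wpc_is_relevant(" ".join(parser.text), terms):
            continue  # rejected by the page-level stage
        if any(fsc_is_searchable(f) and fcc_in_domain(f, terms) for f in parser.forms):
            entries.append(url)
    return entries
```

Calling `discover_entries` with a dict that maps URLs to HTML strings returns the URLs whose pages pass all three stages. Ordering the stages from cheapest (page text) to most specific (form semantics) lets most pages be discarded before any form analysis runs, which mirrors the hierarchical arrangement the abstract describes.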
【 License 】
CC BY-NC-ND
【 Preview 】
Files | Size | Format | View |
---|---|---|---|
RO201904021556470ZK.pdf | 460KB | PDF | download |