Technical Report Details
OfCourse: Web Content Discovery, Classification and Information Extraction for Online
Xiong, Yuhong; Luo, Ping; Zhao, Yong; Lin, Fen; Feng, Shicong; Zhou, Baoyao; Zheng, Liwei
HP Development Company
Keywords: Vertical search; online courses; Web classification; Web information extraction
Report ID: HPL-2010-159
Subject: Computer Science (General)
Country: United States | Language: English
Source: HP Labs
【 Abstract 】

In this paper we present OfCourse, a vertical search engine for online course materials. These materials have two characteristics: they are scattered very sparsely across university Web sites, and they are created by teachers using entirely different HTML templates and layouts. These characteristics pose challenges for Web classification (identifying the course materials) and for Web information extraction (extracting course metadata, such as course title, time and ID, from the identified course homepages). Here, we describe our proposed methods for tackling these challenges, along with the features of the system. OfCourse, which contains over 60,000 courses from the top 50 universities in the US, is currently available for public access.
