Conference paper details
IEEE ICDM Workshop on Frequent Itemset Mining Implementations
WebDocs: a real-life huge transactional dataset
Claudio Lucchese ; Salvatore Orlando ; Raffaele Perego ; Fabrizio Silvestri
Others  :  http://CEUR-WS.org/Vol-126/webdocs.pdf
PID  :  1767
Source: CEUR
【 Abstract 】

This short note describes the main characteristics of WebDocs, a huge real-life transactional dataset we made publicly available to the Data Mining community through the FIMI repository. We built WebDocs from a spidered collection of web HTML documents. The whole collection contains about 1.7 million documents, mainly written in English, and its size is about 5 GB. [first paragraph]

【 Preview 】
Attachment list
Files Size Format View
WebDocs: a real-life huge transactional dataset 858KB PDF download