Conference Paper Details
IEEE ICDM Workshop on Frequent Itemset Mining Implementations
WebDocs: a real-life huge transactional dataset
Claudio Lucchese; Salvatore Orlando; Raffaele Perego; Fabrizio Silvestri
Others: http://CEUR-WS.org/Vol-126/webdocs.pdf
PID: 1767
Source: CEUR
【 Abstract 】
This short note describes the main characteristics of WebDocs, a huge real-life transactional dataset we made publicly available to the Data Mining community through the FIMI repository. We built WebDocs from a spidered collection of web HTML documents. The whole collection contains about 1.7 million documents, mainly written in English, and its size is about 5 GB. [first paragraph]
【 Preview 】
Files | Size | Format | View |
---|---|---|---|
WebDocs: a real-life huge transactional dataset | 858KB | | download |