Technical Report Details
Adventures in Feature Selection on an Industrial Dataset... and Ensuing
Forman, George
HP Development Company
Keywords: text feature selection; text classification; document categorization; lessons learned
RP-ID: HPL-2012-161R1
Subject classification: Computer Science (General)
United States | English
Source: HP Labs
【 Abstract 】
We relate the story of an interesting failure of text feature selection methods on an industrial dataset of technical documents. Our detailed dissection and ultimate understanding of the failure led to general solutions that not only solved the robustness problem we faced, but also improved classification accuracy on simpler, public datasets, which was crucial to making the work publishable.
【 Preview 】
Files | Size | Format | View |
---|---|---|---|
RO201804100000164LZ | 628KB | | |