Technical Report Details
Learning from Little: Comparison of Classifiers Given Little Training
Forman, George ; Cohen, Ira
HP Development Company
Keywords: benchmark comparison; text classification; information retrieval; F-measure; precision in the top 10; small training sets; skewed/unbalanced class distribution
RP-ID: HPL-2004-19R1
Subject classification: Computer Science (General)
United States | English
Source: HP Labs
【 Abstract 】

Many real-world machine learning tasks are faced with the problem of small training sets. Additionally, the class distribution of the training set often does not match the target distribution. In this paper we compare the performance of many learning models on a substantial benchmark of binary text classification tasks having small training sets. We vary the training size and class distribution to examine the learning surface, as opposed to the traditional learning curve. The models tested include various feature selection methods each coupled with four learning algorithms: Support Vector Machines (SVM), Logistic Regression, Naive Bayes, and Multinomial Naive Bayes. Different models excel in different regions of the learning surface, leading to meta-knowledge about which to apply in different situations. This helps guide the researcher and practitioner when facing choices of model and feature selection methods in, for example, information retrieval settings and others.

Notes: Copyright Springer-Verlag. To be published in and presented at the 15th European Conference on Machine Learning and the 8th European Conference on Principles and Practice of Knowledge Discovery in Databases, 20-24 September 2004, Pisa, Italy. 14 pages.
