Technical Report

【Abstract】

Many real-world machine learning tasks must cope with small training sets. In addition, the class distribution of the training set often does not match the target distribution. In this paper we compare the performance of many learning models on a substantial benchmark of binary text classification tasks with small training sets. We vary both the training set size and the class distribution to examine the learning surface, as opposed to the traditional learning curve. The models tested combine various feature selection methods with each of four learning algorithms: Support Vector Machines (SVM), Logistic Regression, Naive Bayes, and Multinomial Naive Bayes. Different models excel in different regions of the learning surface, which yields meta-knowledge about which model to apply in which situation. This helps guide researchers and practitioners who must choose a model and a feature selection method in settings such as information retrieval.

Notes: Copyright Springer-Verlag. To be published in and presented at the 15th European Conference on Machine Learning and the 8th European Conference on Principles and Practice of Knowledge Discovery in Databases, 20-24 September 2004, Pisa, Italy. 14 pages.

【Preview】

Attachments
Files	Size	Format	View
RO201804100001023LZ	288KB	PDF	download


Learning from Little: Comparison of Classifiers Given Little Training

Forman, George ; Cohen, Ira
HP Development Company
Keywords: benchmark comparison; text classification; information retrieval; F-measure; precision in the top 10; small training sets; skewed/unbalanced class distribution
RP-ID : HPL-2004-19R1
Subject classification: Computer Science (General)
United States | English
Source: HP Labs


	Document metrics
	Downloads: 15	Views: 37
