Technical Report

【Abstract】

Most supervised machine learning research assumes that the training set is a random sample from the target population, so that the class distribution is invariant. In real-world situations, however, the class distribution changes, and such shifts are known to erode the effectiveness of classifiers and calibrated probability estimators. This paper focuses on the problem of accurately estimating the number of positives in the test set (quantification), as opposed to accurately classifying individual cases. It compares three methods: classify & count, an adjusted variant, and a mixture model. An empirical evaluation on a text classification benchmark reveals that the simple classify & count method is consistently biased, and that the mixture model is surprisingly effective even when positives are very scarce in the training set, a common case in information retrieval.

Notes: Copyright 2005 Springer-Verlag. Published in and presented at the 16th European Conference on Machine Learning (ECML'05), 3-7 October 2005, Porto, Portugal. http://ecmlpkdd05.liacc.up.pt/ 12 pages.
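The three estimators named in the abstract can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the paper's implementation: the function names are hypothetical; the adjusted variant is assumed to invert the relation observed = tpr·p + fpr·(1 − p), with tpr and fpr estimated beforehand on training data (for example by cross-validation); and the mixture model is approximated by a grid search that fits the test-score distribution as a mixture of positive and negative training-score distributions under a Kolmogorov-Smirnov distance. The paper's own mixture model may use a different fitting criterion.

```python
from bisect import bisect_right
from typing import Callable, Sequence


def classify_and_count(predictions: Sequence[int]) -> float:
    """Classify & count: fraction of test cases the classifier labels positive (1)."""
    return sum(predictions) / len(predictions)


def adjusted_count(predictions: Sequence[int], tpr: float, fpr: float) -> float:
    """Correct the raw positive rate using the classifier's tpr and fpr.

    Assumption: observed = tpr * p + fpr * (1 - p); solve for p, then clip to [0, 1]
    because noisy tpr/fpr estimates can push the raw solution out of range.
    """
    if tpr <= fpr:
        raise ValueError("adjustment is undefined when tpr <= fpr")
    observed = classify_and_count(predictions)
    p = (observed - fpr) / (tpr - fpr)
    return min(1.0, max(0.0, p))


def _ecdf(sample: Sequence[float]) -> Callable[[float], float]:
    """Empirical CDF: fraction of sample values <= t."""
    xs = sorted(sample)
    n = len(xs)
    return lambda t: bisect_right(xs, t) / n


def mixture_estimate(
    test_scores: Sequence[float],
    pos_scores: Sequence[float],
    neg_scores: Sequence[float],
    steps: int = 100,
) -> float:
    """Hypothetical mixture-model sketch: grid-search the prior p whose mixture
    p * F_pos + (1 - p) * F_neg best matches the test-score CDF (KS distance)."""
    f_test, f_pos, f_neg = _ecdf(test_scores), _ecdf(pos_scores), _ecdf(neg_scores)
    # The KS distance between step functions is attained at one of the sample points.
    points = sorted(set(test_scores) | set(pos_scores) | set(neg_scores))
    best_p, best_d = 0.0, float("inf")
    for i in range(steps + 1):
        p = i / steps
        d = max(abs(f_test(t) - (p * f_pos(t) + (1 - p) * f_neg(t))) for t in points)
        if d < best_d:
            best_p, best_d = p, d
    return best_p


if __name__ == "__main__":
    preds = [1, 1, 0, 0, 0, 0, 0, 0, 0, 0]
    print(classify_and_count(preds))                 # raw positive rate
    print(adjusted_count(preds, tpr=0.8, fpr=0.1))   # bias-corrected rate
    print(mixture_estimate([0.1, 0.2, 0.8, 0.9], [0.8, 0.9], [0.1, 0.2]))
```

In this toy usage, classify & count reports 0.2, while the adjustment with the assumed tpr = 0.8 and fpr = 0.1 lowers the estimate to about 0.143, because some of the observed positives are attributed to false positives. The mixture sketch does not threshold scores at all, which is one plausible reason such an approach can remain usable when training positives are scarce.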

Attachments
File	Size	Format
RO201804100001287LZ	319 KB	PDF


Counting Positives Accurately Despite Inaccurate Classification

Forman, George
HP Development Company
Keywords: supervised machine learning; estimation; mixture models; shifting class prior; non-stationary class distribution
RP-ID: HPL-2005-96R1
Subject classification: Computer Science (General)
Country | Language: United States | English
Source: HP Labs

