Technical Report Details
Quantifying Trends Accurately Despite Classifier Error and Class Imbalance
Forman, George
HP Development Company
Keywords: classification; quantification; cost quantification; text mining
RP-ID: HPL-2006-48R1
Subject classification: Computer Science (General)
United States | English
Source: HP Labs
【 Abstract 】
This paper promotes a new task for supervised machine learning research: quantification--the pursuit of learning methods for accurately estimating the class distribution of a test set, with no concern for predictions on individual cases. A variant for cost quantification addresses the need to total up costs according to categories predicted by imperfect classifiers. These tasks cover a large and important family of applications that measure trends over time. The paper establishes a research methodology, and uses it to evaluate several proposed methods that involve selecting the classification threshold in a way that would spoil the accuracy of individual classifications. In empirical tests, Median Sweep methods show outstanding ability to estimate the class distribution, despite wide disparity in testing and training conditions. The paper addresses shifting class priors and costs, but not concept drift in general.
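The abstract names the Median Sweep methods but does not define them. As a rough illustration only, the sketch below assumes the usual reading: correct the observed positive rate at each threshold using true-positive and false-positive rates measured on labeled validation data, then take the median of those corrected estimates across thresholds. The function names, the `min_gap` filter, and its default value are assumptions made for this example, not details taken from this record.

```python
import numpy as np


def adjusted_count(observed_pos_rate, tpr, fpr):
    """Correct an observed positive rate for known classifier error rates.

    Assumes tpr > fpr; the result is clipped to the valid range [0, 1].
    """
    if tpr - fpr <= 0:
        raise ValueError("tpr must exceed fpr for the correction to be defined")
    estimate = (observed_pos_rate - fpr) / (tpr - fpr)
    return float(min(1.0, max(0.0, estimate)))


def median_sweep(val_scores, val_labels, test_scores, min_gap=0.25):
    """Estimate the positive-class prevalence of an unlabeled test set.

    val_scores / val_labels: classifier scores and 0/1 labels on held-out
    labeled data (both classes must be present).
    test_scores: classifier scores on the unlabeled test set.
    min_gap: hypothetical filter that skips thresholds where tpr - fpr is
    small, since the correction is unstable there.
    """
    val_scores = np.asarray(val_scores, dtype=float)
    val_labels = np.asarray(val_labels)
    test_scores = np.asarray(test_scores, dtype=float)

    estimates = []
    for threshold in np.unique(val_scores):
        tpr = np.mean(val_scores[val_labels == 1] >= threshold)
        fpr = np.mean(val_scores[val_labels == 0] >= threshold)
        if tpr - fpr < min_gap:
            continue
        observed = np.mean(test_scores >= threshold)
        estimates.append(adjusted_count(observed, tpr, fpr))

    if not estimates:
        raise ValueError("no threshold satisfied the min_gap requirement")
    return float(np.median(estimates))
```

Note that no individual threshold in this sketch is tuned for classification accuracy. Only the aggregate estimate matters, which matches the abstract's point that quantification has no concern for predictions on individual cases.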