Technical Report Details
Tech Report: HPL-2000-5: Model-Independent Measure
Zhang, Bin ; Elkan, Charles ; Dayal, Umeshwar ; Hsu, Meichun
HP Development Company
Keywords: data mining; machine learning; model fitting; regression; exploratory data analysis
RP-ID: HPL-2000-5
Subject classification: Computer Science (General)
United States | English
Source: HP Labs
【 Abstract 】

We prove an inequality bound for the variance of the error of a regression function plus its non-smoothness as quantified by the Uniform Lipschitz condition. The coefficients in the inequality are calculated based on training data with no assumptions about how the regression function is learned. This inequality, called the Unpredictability Inequality, allows us to evaluate the difficulty of the regression problem for a given dataset, before applying any regression method. The Inequality gives information on the tradeoff between prediction error and how sensitive predictions must be to predictor values. The Unpredictability Inequality can be applied to any convex subregion of the space X of predictors. We improve the effectiveness of the Inequality by partitioning X into multiple convex subregions via clustering, and then applying the Inequality on each subregion. Experimental results on genuine data from a manufacturing line show that, combined with clustering, the Unpredictability Inequality provides considerable insight and help in selecting a regression method. 19 Pages
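The tradeoff between prediction error and sensitivity described in the abstract can be illustrated with a much simpler quantity than the Unpredictability Inequality itself, which this record does not reproduce. If a function f has Lipschitz constant L and fits the training data exactly, then |y_i - y_j| <= L * ||x_i - x_j|| for every pair of points, so the largest pairwise slope in the data is a lower bound on L. The Python sketch below computes that bound per subregion. The function names and the `labels` argument (region assignments assumed to come from some clustering step) are hypothetical and not taken from the report.

```python
import numpy as np


def pairwise_slope_bound(X, y):
    """Largest |y_i - y_j| / ||x_i - x_j|| over pairs with distinct x.

    Any function that fits every (x_i, y_i) exactly must have a
    Lipschitz constant at least this large. Illustrative only; this
    is not the inequality proved in the report.
    """
    y = np.asarray(y, dtype=float)
    X = np.asarray(X, dtype=float).reshape(len(y), -1)
    # Pairwise Euclidean distances between predictors.
    dist = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=-1)
    # Pairwise absolute differences between responses.
    gap = np.abs(y[:, None] - y[None, :])
    mask = dist > 0  # skip identical predictors (and the diagonal)
    if not mask.any():
        return 0.0
    return float((gap[mask] / dist[mask]).max())


def slope_bound_by_region(X, y, labels):
    """Apply pairwise_slope_bound separately within each labelled region.

    `labels` is assumed to hold one integer region id per row,
    e.g. produced by a clustering algorithm.
    """
    y = np.asarray(y, dtype=float)
    X = np.asarray(X, dtype=float).reshape(len(y), -1)
    labels = np.asarray(labels)
    return {
        int(k): pairwise_slope_bound(X[labels == k], y[labels == k])
        for k in np.unique(labels)
    }


# Toy data: a smooth region (label 0) and a steep region (label 1).
X = np.array([[0.0], [1.0], [2.0], [10.0], [11.0]])
y = np.array([0.0, 1.0, 2.0, 0.0, 5.0])
labels = np.array([0, 0, 0, 1, 1])
bounds = slope_bound_by_region(X, y, labels)
```

A region with a large bound requires either a highly sensitive predictor or a nonzero error, which is the kind of per-region difficulty signal the abstract attributes to combining the Inequality with clustering.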
