Technical Report Details
Tackling Concept Drift by Temporal Inductive Transfer
Forman, George
HP Development Company
Keywords: text classification; topic identification; concept drift; time series; machine learning; inductive transfer; support vector machine
RP-ID: HPL-2006-20R1
Subject classification: Computer Science (General)
United States | English
Source: HP Labs
【 Abstract 】

Machine learning is the mainstay for text classification. However, even the most successful techniques are defeated by many real-world applications that have a strong time-varying component. To advance research on this challenging but important problem, we promote a natural, experimental framework, the Daily Classification Task, which can be applied to large time-based datasets such as Reuters RCV1. In this paper we dissect concept drift into three main subtypes. We demonstrate via a novel visualization that the recurrent themes subtype is present in RCV1. This understanding led us to develop a new learning model that transfers induced knowledge through time to benefit future classifier learning tasks. The method avoids two main problems with existing work in inductive transfer: scalability and the risk of negative transfer. In empirical tests, it consistently showed an improvement of more than 10 points in F-measure for each of the four Reuters categories tested.

Notes: Copyright 2006 ACM. Published in and presented at SIGIR '06, 6-11 August 2006, Seattle, WA, USA. 9 pages.
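
The abstract does not spell out the mechanics of the Daily Classification Task or of the transfer step, so the following is only a minimal sketch of one plausible reading, not the report's implementation. It assumes that each day supplies a few labeled documents for training and the rest for testing, and that knowledge is transferred by adding the outputs of classifiers from earlier days as extra features. The keyword learner, the function names, the `__past_model_i__` feature tokens, and the toy data are all hypothetical stand-ins (the keywords list a support vector machine, which is not used here).

```python
def f_measure(y_true, y_pred):
    """F-measure (harmonic mean of precision and recall) for boolean labels."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t and p)
    fp = sum(1 for t, p in zip(y_true, y_pred) if not t and p)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t and not p)
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def train_keyword_model(docs, labels):
    """Toy learner (assumption): keep the words seen only in positive documents."""
    pos = {w for d, y in zip(docs, labels) if y for w in d}
    neg = {w for d, y in zip(docs, labels) if not y for w in d}
    keywords = pos - neg
    return lambda doc: any(w in keywords for w in doc)


def augment(doc, past_models):
    """Add one hypothetical feature token per earlier-day model that fires on doc."""
    fired = {f"__past_model_{i}__" for i, m in enumerate(past_models) if m(doc)}
    return set(doc) | fired


def daily_classification(days, n_train, use_transfer):
    """For each day: train on its first n_train documents, test on the rest."""
    past_models, scores = [], []
    for docs, labels in days:
        train_docs, train_y = docs[:n_train], labels[:n_train]
        test_docs, test_y = docs[n_train:], labels[n_train:]

        def features(doc):
            return augment(doc, past_models) if use_transfer else set(doc)

        model = train_keyword_model([features(d) for d in train_docs], train_y)
        preds = [model(features(d)) for d in test_docs]
        scores.append(f_measure(test_y, preds))
        # Assumption: the model kept for later days is trained on raw words only.
        past_models.append(train_keyword_model([set(d) for d in train_docs], train_y))
    return scores


# Toy data: a theme word ("crude") seen on day 0 recurs only in day 1's test set.
days = [
    ([["oil", "crude"], ["sports"], ["oil", "export"], ["tennis"]],
     [True, False, True, False]),
    ([["oil", "opec"], ["weather"], ["crude", "tanker"], ["football"]],
     [True, False, True, False]),
]
baseline = daily_classification(days, n_train=2, use_transfer=False)
transfer = daily_classification(days, n_train=2, use_transfer=True)
print(baseline, transfer)
```

In this toy setup the day-1 learner never sees the word "crude" in its own training data, so it can only recognize the recurring theme through the feature contributed by the day-0 model. Because each earlier model is just another input feature, a learner is free to ignore it when it does not correlate with today's labels.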
