Conference paper

【Abstract】

Contextual bandit algorithms have become popular tools in online recommendation and advertising systems. Offline evaluation of the effectiveness of new algorithms in these applications is critical for protecting online user experiences but very challenging due to their “partial-label” nature. A common practice is to create a simulator which simulates the online environment for the problem at hand and then run an algorithm against this simulator. However, creating the simulator itself is often difficult, and modeling bias is usually unavoidably introduced. The purpose of this paper is twofold. First, we review a recently proposed offline evaluation technique. Different from simulator-based approaches, the method is completely data-driven, is easy to adapt to different applications, and, more importantly, provides provably unbiased evaluations. We argue for the wide use of this technique as standard practice when comparing bandit algorithms in real-life problems. Second, as an application of this technique, we compare and validate a number of new algorithms based on generalized linear models. Experiments using real Yahoo! data suggest substantial improvement over algorithms with linear models when the rewards are binary.

【Preview】


Workshop on On-line Trading of Exploration and Exploitation 2
An Unbiased Offline Evaluation of Contextual Bandit Algorithms with Generalized Linear Models

Lihong Li LIHONG@YAHOO-INC.COM ; Microsoft ; Yahoo! Research ; Yahoo! Labs ; Yahoo! Labs
PID : 120706

Source: CEUR


