Dissertation details
Spam elimination and bias correction : ensuring label quality in crowdsourced tasks.
Lingyu Lyu
University: University of Louisville
Department: Computer Engineering and Computer Science
Keywords: crowdsourcing; bias; spam; complex tasks; worker filtering
Others  :  https://ir.library.louisville.edu/cgi/viewcontent.cgi?article=4173&context=etd
United States | English
Source: The University of Louisville's Institutional Repository
【 Abstract 】

Crowdsourcing is a powerful mechanism for accomplishing large-scale tasks through anonymous online workers. It has proven effective for collecting labeled data in application domains that require human intelligence, such as image labeling, video annotation, and natural language processing. Despite this promise, one major challenge remains in crowdsourcing systems: controlling the quality of the crowd. Workers have diverse education levels, personal preferences, and motivations, so their performance on a crowdsourced task is unknown in advance. Some are reliable, while others provide noisy feedback. Obtaining high-quality labels therefore requires worker filtering, which identifies and handles noisy workers. This dissertation discusses this area of research and proposes efficient probabilistic worker filtering models that distinguish different types of poor-quality workers. Most existing work on worker filtering either concentrates only on binary labeling tasks or fails to separate low-quality workers whose label errors can be corrected from spam workers whose label errors cannot. We therefore first propose a Spam Removing and De-biasing Framework (SRDF) for worker filtering in labeling tasks with numerical label scales. The framework detects spam workers and biased workers separately. Biased workers are those who tend to give labels higher (or lower) than the truth, and whose errors can be corrected. To address the bias problem, an iterative bias detection approach is introduced to recognize biased workers.
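The abstract does not spell out how the iterative bias detection works. As a rough illustration of the general idea only, and not the dissertation's algorithm, the sketch below alternates between a median consensus and a per-worker constant offset. The function name `estimate_bias`, the dense worker-by-item label matrix, the constant-offset bias model, and the use of the median are all assumptions made for this example.

```python
import numpy as np

def estimate_bias(labels, n_iter=10):
    """Illustrative only: alternate between consensus and per-worker offsets.

    labels: 2-D array, rows = workers, columns = items (assumed dense).
    Returns (consensus, bias), where bias[w] is worker w's estimated offset.
    """
    labels = np.asarray(labels, dtype=float)
    bias = np.zeros(labels.shape[0])
    for _ in range(n_iter):
        # Consensus per item: median of the labels after removing current offsets.
        consensus = np.median(labels - bias[:, None], axis=0)
        # Offset per worker: mean signed deviation from the consensus.
        bias = (labels - consensus[None, :]).mean(axis=1)
    return consensus, bias

# Hypothetical data: three accurate workers, one who labels 2 too high,
# and one who labels 1 too low.
truth = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
labels = np.vstack([truth, truth, truth, truth + 2, truth - 1])
consensus, bias = estimate_bias(labels)
corrected = labels - bias[:, None]  # de-biased labels
```

Under this toy model a biased worker's labels stay usable: subtracting the estimated offset recovers labels that agree with the consensus, which is the sense in which such errors "can be corrected".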
The spam filtering algorithm eliminates three types of spam workers: random spammers, who provide random labels; uniform spammers, who give the same label to most items; and sloppy workers, who provide low-accuracy labels. Integrating the spam filtering and bias detection approaches into aggregation algorithms, which infer truths from the labels obtained from the crowd, can lead to high-quality consensus results. Random spammers and uniform spammers share a common characteristic: they provide useless feedback without making any effort on the labeling task. It is therefore unnecessary to distinguish between them. In addition, the removal of sloppy workers has a great impact on the detection of biased workers within the SRDF framework. To combat these problems, a different way of worker
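To make the worker categories named in the abstract concrete, here is a minimal heuristic sketch. It is not the probabilistic SRDF model; the function `classify_worker`, its thresholds, and the availability of a reference label per item (for example, a consensus estimate) are assumptions introduced purely for illustration.

```python
import numpy as np

def classify_worker(worker_labels, reference,
                    uniform_share=0.8, max_rmse=1.0, bias_threshold=0.5):
    """Illustrative heuristic; all thresholds are hypothetical, not from the source."""
    labels = np.asarray(worker_labels, dtype=float)
    ref = np.asarray(reference, dtype=float)

    # Uniform spammer: a single label value covers most of the items.
    _, counts = np.unique(labels, return_counts=True)
    if counts.max() / labels.size >= uniform_share:
        return "uniform spammer"

    errors = labels - ref
    bias = errors.mean()
    # Error that remains after removing a constant offset; a large value
    # means the errors cannot be fixed by de-biasing.
    residual_rmse = np.sqrt(np.mean((errors - bias) ** 2))
    if residual_rmse > max_rmse:
        return "random spammer or sloppy worker"
    if abs(bias) >= bias_threshold:
        return "biased worker"
    return "reliable worker"

# Hypothetical reference labels on a 1-5 scale.
reference = [1, 2, 3, 4, 5, 1, 2, 3, 4, 5]
uniform = classify_worker([3] * 10, reference)
biased = classify_worker([r + 1 for r in reference], reference)
reliable = classify_worker(reference, reference)
noisy = classify_worker([5, 1, 4, 2, 1, 5, 3, 1, 2, 3], reference)
```

Checking the residual error before the offset mirrors the abstract's remark that removing sloppy workers affects bias detection: a worker with large uncorrectable error is set aside before any offset is interpreted as bias.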

【 Preview 】
Attachment list
Files Size Format View
Spam elimination and bias correction : ensuring label quality in crowdsourced tasks. 5066KB PDF download