Thesis Details
Evaluating Information Retrieval Systems With Multiple Non-Expert Assessors
Li, Le
University of Waterloo
Keywords: Computer Science; Information Retrieval; Machine Learning; Crowdsourcing
Others: https://uwspace.uwaterloo.ca/bitstream/10012/7713/1/Li_Le.pdf
Canada | English
Source: UWSPACE Waterloo Institutional Repository
PDF
【 Abstract 】

Many current test collections require expert judgments during construction: the true label of each document is given by an expert assessor. However, the cost and effort of expert training and judging become quite high when there are many documents to judge. One way to address this issue is to have each document judged by multiple non-expert assessors at lower expense. Two key factors make this approach difficult: the variability in assessors' judging abilities, and the aggregation of the noisy labels into a single consensus label. Much previous work has shown how this method can replace expert labels in relevance evaluation, but the effects of relevance judgment errors on ranking system evaluation have been less explored.

This thesis investigates how to best evaluate information retrieval systems with noisy labels, where no ground-truth labels are provided and each document may receive multiple noisy labels. Based on our simulation results on two datasets, we find that conservative assessors, who tend to label incoming documents as non-relevant, are preferable. Two important factors affect the overall conservativeness of the consensus labels: the assessors' conservativeness and the relevance standard. This observation provides a guideline on what kind of consensus algorithms or assessors are needed to preserve a high correlation with expert labels in ranking system evaluation. We also systematically investigate how to find consensus labels for documents that are equally likely to be relevant or non-relevant. We examine a content-based consensus algorithm that links the noisy labels with document content, compare it against state-of-the-art consensus algorithms, and find that, depending on the document collection, this content-based approach may help or hurt performance.
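To make the aggregation step concrete, the sketch below shows one simple way to turn several noisy binary judgments into a single consensus label, breaking ties toward non-relevant in the "conservative" spirit the abstract describes. This is a minimal illustration assuming binary relevance labels and plain majority voting; the `consensus_label` function and its tie-breaking rule are assumptions for exposition, not the consensus algorithms actually studied in the thesis.

```python
from collections import Counter

def consensus_label(labels, conservative=True):
    """Aggregate noisy binary labels (1 = relevant, 0 = non-relevant)
    from multiple non-expert assessors into one consensus label
    by majority vote.

    When the votes are tied, i.e. the document is equally likely to be
    relevant or non-relevant, a conservative policy breaks the tie
    toward non-relevant; otherwise it breaks the tie toward relevant.
    """
    counts = Counter(labels)
    relevant, non_relevant = counts[1], counts[0]
    if relevant > non_relevant:
        return 1
    if relevant < non_relevant:
        return 0
    # Tied votes: the conservative policy labels the document non-relevant.
    return 0 if conservative else 1

# Example: non-expert assessors judge one document.
print(consensus_label([1, 0, 0]))          # 0 (majority non-relevant)
print(consensus_label([1, 0]))             # 0 (tie, conservative)
print(consensus_label([1, 0], False))      # 1 (tie, non-conservative)
```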

【 Preview 】
Attachment List
File | Size | Format | View
Evaluating Information Retrieval Systems With Multiple Non-Expert Assessors | 2010 KB | PDF | download
Document Metrics
Downloads: 14    Views: 44