学位论文详细信息
The relationship between retrievability bias and retrieval performance
QA75 Electronic computers. Computer science
McLellan, Colin ; Engineering and Physical Sciences Research Council (EPSRC) ; Ounis, Iadh
University:University of Glasgow
Department:School of Computing Science
关键词: Information retrieval, search, bias, retrievability, performance.;   
Others  :  http://theses.gla.ac.uk/41080/1/2018McLellanPhD.pdf
来源: University of Glasgow
PDF
【 摘 要 】

A long standing problem in the domain of Information Retrieval (IR) has been the influence of biases within an IR system on the ranked results presented to a user. Retrievability is an IR evaluation measure which provides a means to assess the level of bias present in a system by evaluating how \emph{easily} documents in the collection can be found by the IR system in place. Retrievability is intrinsically related to retrieval performance because a document needs to be retrieved before it can be judged relevant. It is therefore reasonable to expect that lowering the level of bias present within a system could lead to improvements in retrieval performance. In this thesis, we undertake an investigation of the nature of the relationship between classical retrieval performance and retrievability bias. We explore the interplay between the two as we alter different aspects of the IR system in an attempt to investigate the \emph{Fairness Hypothesis}: that a system which is fairer (i.e. exerts the least amount of retrievability bias), performs better. To investigate the relationship between retrievability bias and retrieval performance we utilise a set of 6 standard TREC collections (3 news and 3 web) and a suite of standard retrieval models. We investigate this relationship by looking at four main aspects of the retrieval process using this set of TREC collections to also explore how generalisable the findings are. We begin by investigating how the retrieval model used relates to both bias and performance by issuing a large set of queries to a set of common retrieval models. We find a general trend where using a retrieval model that is evaluated to be more \emph{fair} (i.e. less biased) leads to improved performance over less fair systems.Hinting that providing documents with a more equal opportunity for access can lead to better retrieval performance. Following on from our first study, we investigate how bias and performance are affected by tuning length normalisation of several parameterised retrieval models. We explore the space of the length normalisation parameters of BM25, PL2 and Language Modelling. We find that tuning these parameters often leads to a trade off between performance and bias such that minimising bias will often not equate to maximising performance when traditional TREC performance measures are used. However, we find that measures which account for document length and users stopping strategies tend to evaluate the least biased settings to also be the maximum (or near maximum) performing parameter, indicating that the Fairness Hypothesis holds. Following this, we investigate the impact that query length has on retrievability bias. We issue various automatically generated query sets to the system to see if longer or shorter queries tend to influence the level of bias associated with the system. We find that longer queries tend to reduce bias, possibly due to the fact that longer queries will often lead to more documents being retrieved, but the reductions in bias are in diminishing returns. Our studies show that after issuing two terms, each additional term reduces bias by significantly less. Finally, we build on our work by employing some fielded retrieval models. We look at typical fielding, where the field relevance scores are computed individually then combined, and compare it with an enhanced version of fielding, where fields are weighted and combined then scored. We see that there are inherent biases against particular documents in the former model, especially in cases where a field is empty and as such see the latter tends to both perform better and also lower bias when compared with the former.In this thesis, we have examined several different ways in which performance and bias can be related. We conclude that while the Fairness Hypothesis has its merits, it is not a universally applicable idea. We further add to this by noting that the method used to compute bias does not distinguish between positive and negative biases and this influences our results. We do however support the idea that reducing the bias of a system by eliminating biases that are known to be negative should result in improvements in system performance.

【 预 览 】
附件列表
Files Size Format View
The relationship between retrievability bias and retrieval performance 3503KB PDF download
  文献评价指标  
  下载次数:16次 浏览次数:27次