Conference paper details
7th Workshop on Large-Scale Distributed Systems for Information Retrieval
Static Index Pruning for Information Retrieval Systems: A Posting-Based Approach
Linh Thai Nguyen
Others: http://CEUR-WS.org/Vol-480/paper9.pdf
PID: 11488
Source: CEUR
Abstract

Static index pruning methods have been proposed to reduce the size of the inverted index of information retrieval systems. The goal is to increase efficiency (in terms of query response time) while preserving effectiveness (in terms of ranking quality). Current state-of-the-art approaches include the term-centric pruning approach and the document-centric pruning approach. While the term-centric approach considers each inverted list independently and removes less important postings from each inverted list, the document-centric approach considers each document independently and removes less important terms from each document. In other words, the term-centric approach does not consider the relative importance of a posting in comparison with others in the same document, and the document-centric approach does not consider the relative importance of a posting in comparison with others in the same inverted list. As a consequence, less important postings are not pruned in some situations, and important postings are pruned in others. We propose a posting-based pruning approach, which is a generalization of both the term-centric and document-centric approaches. This approach ranks all postings and keeps only a subset of the top-ranked ones. The rank of a posting depends on several factors, such as its rank in its inverted list, its rank in its document, its weighting score, the term weight, and the document weight. The effectiveness of our approach is verified by experiments using TREC queries and TREC datasets.
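
The abstract does not give the paper's actual ranking function. The following Python sketch is therefore only an illustration of the contrast it describes, under stated assumptions. Each posting is a hypothetical (term, doc_id, score) tuple with made-up scores. Term-centric and document-centric pruning are simplified to "keep the top k per inverted list" and "keep the top k per document". The posting-based variant ranks every posting globally using a hypothetical priority that combines the weighting score with the posting's rank in its inverted list and in its document. The term and document weights mentioned in the abstract are omitted for brevity. The function names and the priority formula are illustrative, not the author's.

```python
from collections import defaultdict

# Hypothetical posting representation: (term, doc_id, weighting score).
Posting = tuple[str, int, float]


def ranks_within(postings: list[Posting], key) -> dict[Posting, int]:
    """Rank postings (1 = highest score) within each group chosen by `key`."""
    groups = defaultdict(list)
    for p in postings:
        groups[key(p)].append(p)
    rank = {}
    for group in groups.values():
        # Stable sort: ties keep their input order.
        for r, p in enumerate(sorted(group, key=lambda q: -q[2]), start=1):
            rank[p] = r
    return rank


def term_centric(postings: list[Posting], k: int) -> set[Posting]:
    """Simplified term-centric pruning: top-k postings of each inverted list."""
    rank = ranks_within(postings, key=lambda p: p[0])
    return {p for p in postings if rank[p] <= k}


def document_centric(postings: list[Posting], k: int) -> set[Posting]:
    """Simplified document-centric pruning: top-k postings of each document."""
    rank = ranks_within(postings, key=lambda p: p[1])
    return {p for p in postings if rank[p] <= k}


def posting_based(postings: list[Posting], keep: int) -> set[Posting]:
    """Rank all postings globally and keep the `keep` best.

    Hypothetical priority (not from the paper): the weighting score scaled
    by the average reciprocal rank of the posting in its inverted list and
    in its document.
    """
    list_rank = ranks_within(postings, key=lambda p: p[0])
    doc_rank = ranks_within(postings, key=lambda p: p[1])

    def priority(p: Posting) -> float:
        return p[2] * (1 / list_rank[p] + 1 / doc_rank[p]) / 2

    return set(sorted(postings, key=priority, reverse=True)[:keep])


if __name__ == "__main__":
    postings = [
        ("apple", 1, 0.9), ("apple", 2, 0.8), ("apple", 3, 0.1),
        ("pie", 1, 0.2), ("pie", 3, 0.05), ("tart", 2, 0.3),
    ]
    print(sorted(term_centric(postings, 1)))
    # [('apple', 1, 0.9), ('pie', 1, 0.2), ('tart', 2, 0.3)]
    print(sorted(document_centric(postings, 1)))
    # [('apple', 1, 0.9), ('apple', 2, 0.8), ('apple', 3, 0.1)]
    print(sorted(posting_based(postings, 3)))
    # [('apple', 1, 0.9), ('apple', 2, 0.8), ('tart', 2, 0.3)]
```

In this toy index, the term-centric variant drops ("apple", 2, 0.8) because it is second in its list, even though it is the best posting of document 2. The document-centric variant keeps ("apple", 3, 0.1) because it is first in document 3, even though its score is low. The global ranking keeps the former and drops the latter, which matches the two failure cases the abstract describes.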
