7th Workshop on Large-Scale Distributed Systems for Information Retrieval
Static Index Pruning for Information Retrieval Systems: A Posting-Based Approach
Linh Thai Nguyen
Others: http://CEUR-WS.org/Vol-480/paper9.pdf PID: 11488
Source: CEUR
[Abstract]
Static index pruning methods have been proposed to reduce the size of the inverted index of information retrieval systems. The goal is to increase efficiency (in terms of query response time) while preserving effectiveness (in terms of ranking quality). Current state-of-the-art approaches include the term-centric pruning approach and the document-centric pruning approach. While the term-centric approach considers each inverted list independently and removes less important postings from each inverted list, the document-centric approach considers each document independently and removes less important terms from each document. In other words, the term-centric approach does not consider the relative importance of a posting in comparison with others in the same document, and the document-centric approach does not consider the relative importance of a posting in comparison with others in the same inverted list. As a consequence, less important postings are not pruned in some situations, and important postings are pruned in others. We propose a posting-based pruning approach, which is a generalization of both the term-centric and document-centric approaches. This approach ranks all postings and keeps only a subset of the top-ranked ones. The rank of a posting depends on several factors, such as its rank in its inverted list, its rank in its document, its weighting score, the term weight, and the document weight. The effectiveness of our approach is verified by experiments using TREC queries and TREC datasets.
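
The contrast between the three pruning strategies can be illustrated with a minimal Python sketch on a toy index. This is not the paper's method: the abstract does not give the ranking function, so the combined score used in `posting_based` (weighting score divided by one plus the posting's rank in its inverted list and in its document) is an assumption made purely for illustration, and term and document weights are left out. The names `term_centric`, `document_centric`, `posting_based`, and the toy data are all hypothetical.

```python
from collections import defaultdict

# Toy inverted index (hypothetical data): term -> {doc_id: weighting score}.
INDEX = {
    "apple": {"d1": 3.0, "d2": 1.0, "d3": 0.5},
    "pie": {"d1": 0.4, "d2": 2.0},
    "rare": {"d3": 0.3},
}


def by_document(index):
    """Invert the index into doc_id -> {term: weighting score}."""
    docs = defaultdict(dict)
    for term, postings in index.items():
        for doc, score in postings.items():
            docs[doc][term] = score
    return dict(docs)


def rank_positions(scores):
    """Map each key to its 0-based rank, highest score first."""
    ordered = sorted(scores, key=scores.get, reverse=True)
    return {key: pos for pos, key in enumerate(ordered)}


def term_centric(index, k):
    """Keep the k best postings of each inverted list, lists treated independently."""
    pruned = {}
    for term, postings in index.items():
        best = sorted(postings.items(), key=lambda kv: kv[1], reverse=True)[:k]
        pruned[term] = dict(best)
    return pruned


def document_centric(index, k):
    """Keep the k best terms of each document, documents treated independently."""
    pruned = {}
    for doc, terms in by_document(index).items():
        best = sorted(terms.items(), key=lambda kv: kv[1], reverse=True)[:k]
        for term, score in best:
            pruned.setdefault(term, {})[doc] = score
    return pruned


def posting_based(index, keep):
    """Rank all postings globally and keep the top `keep` of them.

    ASSUMPTION: the combined score below is illustrative only; the
    abstract names the factors but not how they are combined.
    """
    docs = by_document(index)
    list_rank = {t: rank_positions(p) for t, p in index.items()}
    doc_rank = {d: rank_positions(ts) for d, ts in docs.items()}
    scored = []
    for term, postings in index.items():
        for doc, score in postings.items():
            combined = score / (1 + list_rank[term][doc] + doc_rank[doc][term])
            scored.append((combined, term, doc, score))
    scored.sort(key=lambda item: item[0], reverse=True)
    pruned = {}
    for _, term, doc, score in scored[:keep]:
        pruned.setdefault(term, {})[doc] = score
    return pruned


if __name__ == "__main__":
    print("term-centric    :", term_centric(INDEX, 1))
    print("document-centric:", document_centric(INDEX, 1))
    print("posting-based   :", posting_based(INDEX, 3))
```

On this toy index, term-centric pruning with `k=1` keeps the lone posting of `rare` even though it is the lowest-scoring posting overall, while the posting-based variant with the same budget of three postings spends that slot on a stronger posting instead. That is the kind of situation the abstract describes, although the real approach may weigh the factors quite differently.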