Conference Paper Details
7th Workshop on Large-Scale Distributed Systems for Information Retrieval
Comparing Distributed Indexing: To MapReduce or Not?
Richard M. C. McCreadie ; Craig Macdonald ; Iadh Ounis
Others  :  http://CEUR-WS.org/Vol-480/paper5.pdf
PID  :  11476
Source: CEUR
【 Abstract 】

Information Retrieval (IR) systems require input corpora to be indexed. The advent of terabyte-scale Web corpora has reinvigorated the need for efficient indexing. In this work, we investigate distributed indexing paradigms, in particular within the auspices of the MapReduce programming framework. Specifically, we describe two indexing approaches based on the original MapReduce paper, and compare these with a standard distributed IR system, the MapReduce indexing strategy used by the Nutch IR platform, and a more advanced MapReduce indexing implementation that we propose. Experiments using the Hadoop MapReduce implementation and a large standard TREC corpus show our proposed MapReduce indexing implementation to be more efficient than those proposed in the original paper.
