7th Workshop on Large-Scale Distributed Systems for Information Retrieval
Comparing Distributed Indexing: To MapReduce or Not?
Richard M. C. McCreadie ; Craig Macdonald ; Iadh Ounis
Others : http://CEUR-WS.org/Vol-480/paper5.pdf PID : 11476
Source: CEUR
【 Abstract 】
Information Retrieval (IR) systems require input corpora to be indexed. The advent of terabyte-scale Web corpora has reinvigorated the need for efficient indexing. In this work, we investigate distributed indexing paradigms, focusing on the MapReduce programming framework. Specifically, we describe two indexing approaches based on the original MapReduce paper, and compare these with a standard distributed IR system, the MapReduce indexing strategy used by the Nutch IR platform, and a more advanced MapReduce indexing implementation that we propose. Experiments using the Hadoop MapReduce implementation and a large standard TREC corpus show our proposed MapReduce indexing implementation to be more efficient than those proposed in the original paper.
【 Preview 】
Files | Size | Format | View |
---|---|---|---|
Comparing Distributed Indexing: To MapReduce or Not? | 536KB | download |