Thesis Details
Math Information Retrieval using a Text Search Engine
Mathematics information retrieval;MIR;Mathematical content representation;MathML;Okapi BM25;Lucene
Author: Dallas Fraser ; Advisor: Frank Tompa ; Affiliation: Faculty of Mathematics
University of Waterloo
Keywords: Mathematical content representation; Master Thesis; Mathematics information retrieval; MathML; MIR; Lucene; Okapi BM25
Others  :  https://uwspace.uwaterloo.ca/bitstream/10012/13329/3/Fraser_Dallas.pdf
Switzerland | English
Source: UWSPACE Waterloo Institutional Repository
【 Abstract 】

Combining text and mathematics when searching in a corpus with extensive mathematical notation remains an open problem. Recent results for math information retrieval systems on the math and text retrieval task at NTCIR-12, for example, show room for improvement, even though formula retrieval appears to be fairly successful.

This thesis explores how to adapt the state-of-the-art BM25 text ranking method to work well when searching for math and text together. Symbol layout trees are used to represent math formulas, and features are extracted from the trees, which are then used as search terms for BM25. This thesis explores various features of symbol layout trees and explores their effects on retrieval performance. Based on the results, a set of features are recommended that can be used effectively in a conventional text-based retrieval engine. The feature set is validated using various NTCIR math only benchmarks.

Various proximity measures show math and text are closer in documents deemed relevant than documents deemed non-relevant for NTCIR queries. Therefore it would seem that proximity could improve ranking for math information retrieval systems when searching for both math and text. Nevertheless, two attempts to include proximity when scoring matches were unsuccessful in improving retrieval effectiveness.

Finally, the BM25 ranking of both math and text using the feature set designed for formula retrieval is validated by various NTCIR math and text benchmarks.
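
The idea of treating formula features as ordinary search terms for BM25 can be illustrated with a minimal sketch. This is not the thesis's implementation: the `m:` prefixed tokens are a hypothetical stand-in for features extracted from symbol layout trees, the parameter values `k1=1.2` and `b=0.75` are common defaults rather than values taken from the thesis, and the IDF form used here is the non-negative variant found in some text search engines, assumed for illustration.

```python
import math
from collections import Counter


def bm25_scores(query_terms, docs, k1=1.2, b=0.75):
    """Score each tokenized document against the query with Okapi BM25.

    docs is a non-empty list of token lists; text words and math feature
    tokens are treated identically, as plain terms.
    """
    n_docs = len(docs)
    avg_len = sum(len(d) for d in docs) / n_docs
    # Document frequency: number of documents containing each term.
    df = Counter(term for d in docs for term in set(d))
    scores = []
    for d in docs:
        tf = Counter(d)
        score = 0.0
        for term in query_terms:
            if tf[term] == 0:
                continue
            # Non-negative IDF variant (assumption, see lead-in).
            idf = math.log(1 + (n_docs - df[term] + 0.5) / (df[term] + 0.5))
            # Term-frequency saturation with document length normalization.
            norm = tf[term] + k1 * (1 - b + b * len(d) / avg_len)
            score += idf * tf[term] * (k1 + 1) / norm
        scores.append(score)
    return scores


# Hypothetical corpus: "m:" tokens stand in for symbol layout tree features.
docs = [
    ["pythagorean", "theorem", "m:a^2", "m:b^2", "m:c^2", "m:+", "m:="],
    ["quadratic", "formula", "m:b^2", "m:sqrt", "m:4ac"],
    ["history", "of", "the", "theorem"],
]
query = ["theorem", "m:a^2", "m:c^2"]
scores = bm25_scores(query, docs)
```

In this toy corpus the first document matches both the text term and the math tokens, so it ranks above the third document, which matches only the word "theorem"; the second document matches nothing and scores zero.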

【 Preview 】
Attachments
Files Size Format View
Math Information Retrieval using a Text Search Engine 657KB PDF download
Document metrics
Downloads: 34 ; Views: 30