Conference Paper Details
18th Text REtrieval Conference
TREC Chemical IR Track 2009: A Distributed Dimensional Indexing Model for Chemical Patent Search
Library, Information and Archival Science
Jay Urbain; Ophir Frieder
Subject classification: Social Sciences, Humanities and Arts (General)
Source: CEUR
【 Abstract 】

For the TREC 2009 Chemical IR Track, we explored the development of a distributed information retrieval system based on a dimensional data model. The indexing model supports named entity identification and the aggregation of term statistics at multiple levels of patent structure, including individual words, sentences, claims, descriptions, abstracts, and titles.

The system was deployed across 15 Amazon Web Services (AWS) Elastic Compute Cloud (EC2) instances and 15 Elastic Block Store (EBS) database shards to support efficient indexing and query processing of the relatively large index generated by indexing every individual word (excluding stop words) in the 100 GB+ collection of chemical patent documents.

The query processing algorithm for technology survey search and prior art search uses information extraction techniques and locally aggregated term statistics to help disambiguate candidate entities and terms in context. For prior art search, query processing automatically generates a structured query based on the relative distinctiveness of individual terms and candidate entity phrases from the query patent's claims, abstract, and title sections. For both technology survey and prior art search, we evaluated several probabilistic retrieval functions for integrating the statistics of retrieved named entities with term statistics at multiple levels of document structure to identify relevant …
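
The multi-level indexing model described in this abstract can be illustrated with a minimal Python sketch. It treats term statistics as a dimensional fact table: each row is keyed by a term plus structural dimensions (patent, section, sentence, word position), so counts at any coarser level are roll-ups of finer ones. The tokenizer, the stop list, the sentence splitter, the `claim:<n>` section naming, and the MD5-based assignment of each patent to one of 15 shards are all assumptions made for illustration, not details taken from the paper; named entity identification is omitted.

```python
import hashlib
import re
from collections import Counter

NUM_SHARDS = 15  # matches the 15 EBS shards reported in the abstract
# Hypothetical stop list; the paper's actual list is not given here.
STOP_WORDS = {"a", "an", "the", "of", "for", "and", "or", "in", "to", "is", "said", "wherein"}
LEVELS = ("word", "sentence", "section", "document")


def tokenize(text):
    """Lowercased alphanumeric tokens (hyphenated names kept whole), stop words removed."""
    tokens = re.findall(r"[a-z0-9]+(?:-[a-z0-9]+)*", text.lower())
    return [t for t in tokens if t not in STOP_WORDS]


def split_sentences(text):
    """Naive splitter on '.', ';', '!' or '?' followed by whitespace (assumption)."""
    return [s for s in re.split(r"(?<=[.;!?])\s+", text.strip()) if s]


def shard_for(patent_id):
    """Hypothetical stable hash partitioning of patents across shards."""
    digest = hashlib.md5(patent_id.encode("utf-8")).hexdigest()
    return int(digest, 16) % NUM_SHARDS


def index_patent(patent_id, sections):
    """Return (shard, stats) where stats[level] is a Counter of fact rows.

    `sections` maps a section name ("title", "abstract", "description",
    or one entry per claim such as "claim:1") to its text. Each row key
    starts with the term, followed by the dimension values for that level.
    """
    stats = {level: Counter() for level in LEVELS}
    for section, text in sections.items():
        for s_idx, sentence in enumerate(split_sentences(text)):
            # w_idx counts positions after stop-word removal.
            for w_idx, term in enumerate(tokenize(sentence)):
                stats["word"][(term, patent_id, section, s_idx, w_idx)] += 1
                stats["sentence"][(term, patent_id, section, s_idx)] += 1
                stats["section"][(term, patent_id, section)] += 1
                stats["document"][(term, patent_id)] += 1
    return shard_for(patent_id), stats


if __name__ == "__main__":
    shard, stats = index_patent("US-1", {
        "title": "Process for preparing the catalyst.",
        "claim:1": "A catalyst comprising zeolite. The catalyst is calcined.",
    })
    # Second value printed is the document-level count of "catalyst" (3).
    print(shard, stats["document"][("catalyst", "US-1")])
```

Materializing every level trades storage for lookup speed; storing only the word-level rows and aggregating on demand is the other obvious option under a dimensional model.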

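The prior art query generation step described in the abstract can be sketched in the same way. Here the "relative distinctiveness" of a candidate term or phrase is approximated with a Robertson/Sparck Jones IDF weight, scaled by a per-section weight and by the candidate's frequency in the query patent. The section weights, the IDF choice, and the cutoff `k` are hypothetical, and candidate entity phrases are assumed to have been extracted beforehand; the abstract does not state the paper's actual distinctiveness measure or query syntax.

```python
import math
from collections import Counter

# Hypothetical section weights; the abstract names these sections but not weights.
DEFAULT_FIELD_WEIGHTS = {"title": 2.0, "abstract": 1.5, "claims": 1.0}


def idf(df, num_docs):
    """Robertson/Sparck Jones IDF, floored at zero so very common terms drop out."""
    return max(0.0, math.log((num_docs - df + 0.5) / (df + 0.5)))


def build_query(query_sections, doc_freq, num_docs, k=10, field_weights=None):
    """Pick the k most distinctive candidates from a query patent.

    query_sections: section name -> list of candidate terms or phrases
        (phrases are assumed to be pre-extracted entity candidates).
    doc_freq: candidate -> number of collection documents containing it.
    Returns (candidate, weight) pairs, highest weight first.
    """
    if field_weights is None:
        field_weights = DEFAULT_FIELD_WEIGHTS
    scores = Counter()
    for section, candidates in query_sections.items():
        weight = field_weights.get(section, 0.0)
        for cand, tf in Counter(candidates).items():
            scores[cand] += weight * tf * idf(doc_freq.get(cand, 0), num_docs)
    ranked = [(c, s) for c, s in scores.items() if s > 0]
    ranked.sort(key=lambda cs: (-cs[1], cs[0]))  # ties broken alphabetically
    return ranked[:k]


if __name__ == "__main__":
    query = {
        "title": ["zeolite catalyst", "zeolite"],
        "claims": ["zeolite", "calcined", "catalyst", "method"],
    }
    df = {"zeolite catalyst": 3, "zeolite": 5, "calcined": 20, "catalyst": 400, "method": 900}
    for cand, w in build_query(query, df, num_docs=1000):
        print(f"{w:6.2f}  {cand}")
```

In this toy run the near-ubiquitous "method" receives zero weight and is dropped, while the rare "zeolite" ranks first. The resulting (candidate, weight) pairs could then be rendered into whatever weighted structured-query syntax the retrieval engine accepts.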