Journal Article Details
BMC Bioinformatics
Development and tuning of an original search engine for patent libraries in medicinal chemistry
Research
Patrick Ruch [1], Julien Gobeill [1], Christian Lovis [2], Emilie Pasche [3], Fatma Oezdemir-Zaech [4], Olivier Kreim [4], Therese Vachon [4]
[1] Bibliomics and Text-Mining Group (BiTeM), Information Science Department, University of Applied Sciences, Route de la Drize 7, 1227, Carouge, Switzerland; Swiss Institute of Bioinformatics (SIB), Rue Michel Servet 1, 1211, Geneva 4, Switzerland; Division of Medical Information Sciences (SIMED), University Hospitals of Geneva and University of Geneva, Rue Gabrielle-Perret-Gentil 4, 1211, Geneva 14, Switzerland; Novartis Institute for BioMedical Research - Text Mining Services (NIBR-IT/TMS), Novartis Pharma AG, 4002, Postfach, Basel, Switzerland
Keywords: Search Engine; Search Task; Citation Network; Relevance Judgment; International Patent Classification
DOI: 10.1186/1471-2105-15-S1-S15
Source: Springer
【 Abstract 】

Background: The large increase in the size of patent collections has created a need for efficient search strategies. Yet advanced text-mining applications dedicated to biomedical patents remain rare, in particular applications that address the needs of the pharmaceutical and biotech industry, which uses patent libraries intensively for competitive intelligence and drug development.

Methods: We describe the development of an advanced retrieval engine for searching patent collections in the field of medicinal chemistry. We investigate and combine different strategies and evaluate their respective impact on the performance of the search engine across several search tasks, which cover the putatively most frequent search behaviours of intellectual property officers in medicinal chemistry: 1) a prior art search task; 2) a technical survey task; and 3) a variant of the technical survey task, sometimes called a known-item search task, in which a single patent is targeted.

Results: The optimal tuning of our engine resulted in a top-precision of 6.76% for the prior art search task, 23.28% for the technical survey task and 46.02% for the variant of the technical survey task. Co-citation boosting proved an appropriate strategy for improving prior art search, while IPC classification of queries improved retrieval effectiveness for technical survey tasks. Surprisingly, using the full body of the patent was always detrimental to search effectiveness. Normalizing biomedical entities using curated dictionaries had no impact on the search tasks we evaluated. The search engine was finally implemented as a web application within Novartis Pharma; the application is briefly described in the report.

Conclusions: We have presented the development of a search engine dedicated to patent search, based on state-of-the-art methods applied to patent corpora. We have shown that properly tuning the system to each search task clearly increases its effectiveness. We conclude that different search tasks demand different information retrieval engine settings in order to yield optimal end-user retrieval.

【 License 】

CC BY   
© Pasche et al.; licensee BioMed Central Ltd. 2014
