Kuwait Journal of Science
Trade-off between the number of index-terms and the information retrieval system’s performance
Keywords: high-frequency; index term; information retrieval; low-frequency; term frequency
Source: Kuwait University, Academic Publication Council
【 Abstract 】
The performance of modern information retrieval (IR) systems depends on the index terms and their occurrence frequency. Hence, even a small variation in the frequency of index terms alters the performance of an IR system. This article analyzes how the performance of IR systems varies with changes in the frequency of index terms. Based on occurrence frequency, we classified the index terms as 'Low' and 'High' frequency terms and recorded the performance of each class. Low-frequency terms tend to decrease the performance of IR systems, whereas high-frequency terms perform better than their low-frequency counterparts, yielding a 10% performance improvement over them. By deleting the low-frequency index terms, we can eliminate up to 65% of the index terms at the cost of at most 26% degradation in the performance of IR systems.
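The pruning idea described in the abstract can be illustrated with a minimal sketch. The abstract does not state the tokenizer, the frequency cutoff, or the test collection, so the whitespace tokenization, the `threshold` value, the toy `docs` list, and all function names below are assumptions made purely for illustration, not the authors' method.

```python
from collections import Counter


def term_frequencies(docs):
    """Count how often each term occurs across the whole collection."""
    counts = Counter()
    for doc in docs:
        # Assumed tokenization: lowercase and split on whitespace.
        counts.update(doc.lower().split())
    return counts


def split_by_frequency(counts, threshold):
    """Partition terms into low- and high-frequency sets.

    `threshold` is a hypothetical cutoff: terms occurring fewer than
    `threshold` times are treated as low-frequency.
    """
    low = {term for term, count in counts.items() if count < threshold}
    high = set(counts) - low
    return low, high


def pruned_fraction(counts, threshold):
    """Fraction of distinct index terms removed by dropping low-frequency terms."""
    if not counts:
        return 0.0
    low, _ = split_by_frequency(counts, threshold)
    return len(low) / len(counts)


# Toy collection, invented for this example only.
docs = [
    "index terms drive retrieval",
    "retrieval systems rank index terms",
    "rare terms inflate the index",
]

counts = term_frequencies(docs)
low, high = split_by_frequency(counts, threshold=2)
saved = pruned_fraction(counts, threshold=2)
print(sorted(high), f"{saved:.0%} of index terms pruned")
```

Measuring the corresponding change in retrieval performance would additionally require queries and relevance judgments, which the abstract does not describe and this sketch does not attempt.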