Journal Article Details
Journal of Imaging
Efficient Query Specific DTW Distance for Document Retrieval with Unlimited Vocabulary
Viresh Ranjan [1], C. V. Jawahar [2], Gattigorla Nagendar [2], Gaurav Harit [3]
[1] CSE Department, Stony Brook University, Stony Brook, NY 11794, USA; [2] Center for Visual Information Technology, IIIT Hyderabad, Hyderabad 500 032, India; [3] Department of Computer Science and Engineering, IIT Jodhpur, Jodhpur 342037, India
Keywords: DTW distance; query classifiers; word spotting; indexing; retrieval
DOI: 10.3390/jimaging4020037
Source: DOAJ
【 Abstract 】

In this paper, we improve the performance of the recently proposed Direct Query Classifier (DQC). The DQC is a classifier-based retrieval method; in general, such methods have been shown to be superior to OCR-based solutions for retrieval in many practical document image datasets. In the DQC, classifiers are trained for a set of frequent queries and seamlessly extended to rare and arbitrary queries. This extends the classifier-based retrieval paradigm to an unlimited number of classes (words) present in a language. The DQC requires indexing cut-portions (n-grams) of the word image, and the dynamic time warping (DTW) distance has been used for this indexing. However, DTW is computationally slow and therefore limits the performance of the DQC. We introduce a query specific DTW distance, which enables effective computation of global principal alignments for novel queries. Since the proposed query specific DTW distance is a linear approximation of the DTW distance, it enhances the performance of the DQC. Unlike previous approaches, the proposed query specific DTW distance uses both the class mean vectors and the query information to compute the global principal alignments for the query. Since the proposed method computes the global principal alignments using n-grams, it works well for both frequent and rare queries. We also use query expansion (QE) to further improve the performance of our query specific DTW. This also allows us to seamlessly adapt our solution to new fonts, styles and collections. We demonstrate the utility of the proposed technique on three different datasets. The proposed query specific DTW performs well compared to previous DTW approximations.
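
To illustrate why DTW is costly and why a precomputed alignment makes the distance cheaper, here is a minimal Python sketch. It is not the authors' query specific DTW: the function names `dtw` and `fixed_alignment_distance`, the squared Euclidean frame cost, and the reuse of a single stored path are all assumptions made for illustration. The first function fills the full quadratic table; the second only sums costs along an alignment that is already known, which takes time linear in the path length.

```python
import numpy as np


def dtw(x, y):
    """Classic DTW between two sequences of feature vectors.

    x has shape (n, d), y has shape (m, d). Returns the DTW cost and the
    optimal alignment path as a list of (i, j) index pairs.
    Assumption: squared Euclidean distance as the per-frame cost.
    """
    n, m = len(x), len(y)
    # Pairwise frame costs, shape (n, m).
    d = ((x[:, None, :] - y[None, :, :]) ** 2).sum(axis=2)

    # Accumulated cost table with an extra border row and column.
    acc = np.full((n + 1, m + 1), np.inf)
    acc[0, 0] = 0.0
    for i in range(1, n + 1):          # O(n * m) table fill
        for j in range(1, m + 1):
            acc[i, j] = d[i - 1, j - 1] + min(
                acc[i - 1, j - 1], acc[i - 1, j], acc[i, j - 1]
            )

    # Backtrack from the end of both sequences to recover the alignment.
    i, j = n, m
    path = []
    while i > 0 and j > 0:
        path.append((i - 1, j - 1))
        step = int(np.argmin([acc[i - 1, j - 1], acc[i - 1, j], acc[i, j - 1]]))
        if step == 0:
            i, j = i - 1, j - 1
        elif step == 1:
            i -= 1
        else:
            j -= 1
    path.reverse()
    return float(acc[n, m]), path


def fixed_alignment_distance(x, y, path):
    """Sum frame costs along a given alignment path (linear in len(path)).

    Hypothetical helper: it reuses a stored alignment instead of searching
    for the optimal one, so its value is never below the true DTW cost.
    """
    return float(sum(((x[i] - y[j]) ** 2).sum() for i, j in path))
```

For two sequences whose optimal alignment is the stored path, both functions return the same value; for any other pair of the same lengths, the fixed-alignment value is an upper bound on the DTW cost. How the paper actually chooses its global principal alignments from class mean vectors and query information is not specified in this abstract.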

【 License 】

Unknown   
