期刊论文详细信息
Bulletin of the Polish Academy of Sciences. Technical Sciences
Compact and hash based variants of the suffix array
M. RaniszewskiInstitute of Applied Computer Science, Lodz University of Technology, 11 Politechniki Ave., 90-924 ?ód?, PolandOther articles by this author:De Gruyter OnlineGoogle Scholar1  S. GrabowskiCorresponding authorInstitute of Applied Computer Science, Lodz University of Technology, 11 Politechniki Ave., 90-924 ?ód?, PolandEmailOther articles by this author:De Gruyter OnlineGoogle Scholar1 
[1] Institute of Applied Computer Science, Lodz University of Technology, 11 Politechniki Ave., 90-924 ?ód?, Poland
关键词: Keywords: string matching;    full-text indexing;    suffix array;    compact indexes;    hashing;   
DOI  :  10.1515/bpasts-2017-0046
学科分类:工程和技术(综合)
来源: Polska Akademia Nauk * Centrum Upowszechniania Nauki / Polish Academy of Sciences, Center for the Advancement of Science
PDF
【 摘 要 】

Full-text indexing aims at building a data structure over a given text capable of efficiently finding arbitrary text patterns, and possibly requiring little space. We propose two suffix array inspired full-text indexes. One, called SA-hash, augments the suffix array with a hash table to speed up pattern searches due to significantly narrowed search interval before the binary search phase. The other, called FBCSA, is a compact data structure, similar to Mäkinen’s compact suffix array (MakCSA), but working on fixed size blocks. Experiments on the widely used Pizza & Chili datasets show that SA-hash is about 2–3 times faster in pattern searches (counts) than the standard suffix array, for the price of requiring 0.2n–1.1n bytes of extra space, where n is the text length. FBCSA, in one of the presented variants, reduces the suffix array size by a factor of about 1.5–2, while it gets close in search times, winning in speed with its competitors known from the literature, MakCSA and LCSA.

【 授权许可】

Unknown   

【 预 览 】
附件列表
Files Size Format View
RO201902180023634ZK.pdf 2275KB PDF download
  文献评价指标  
  下载次数:6次 浏览次数:14次