Bulletin of the Polish Academy of Sciences. Technical Sciences | |
Compact and hash based variants of the suffix array | |
M. RaniszewskiInstitute of Applied Computer Science, Lodz University of Technology, 11 Politechniki Ave., 90-924 ?ód?, PolandOther articles by this author:De Gruyter OnlineGoogle Scholar1  S. GrabowskiCorresponding authorInstitute of Applied Computer Science, Lodz University of Technology, 11 Politechniki Ave., 90-924 ?ód?, PolandEmailOther articles by this author:De Gruyter OnlineGoogle Scholar1  | |
[1] Institute of Applied Computer Science, Lodz University of Technology, 11 Politechniki Ave., 90-924 ?ód?, Poland | |
关键词: Keywords: string matching; full-text indexing; suffix array; compact indexes; hashing; | |
DOI : 10.1515/bpasts-2017-0046 | |
学科分类:工程和技术(综合) | |
来源: Polska Akademia Nauk * Centrum Upowszechniania Nauki / Polish Academy of Sciences, Center for the Advancement of Science | |
【 摘 要 】
Full-text indexing aims at building a data structure over a given text capable of efficiently finding arbitrary text patterns, and possibly requiring little space. We propose two suffix array inspired full-text indexes. One, called SA-hash, augments the suffix array with a hash table to speed up pattern searches due to significantly narrowed search interval before the binary search phase. The other, called FBCSA, is a compact data structure, similar to Mäkinen’s compact suffix array (MakCSA), but working on fixed size blocks. Experiments on the widely used Pizza & Chili datasets show that SA-hash is about 2–3 times faster in pattern searches (counts) than the standard suffix array, for the price of requiring 0.2n–1.1n bytes of extra space, where n is the text length. FBCSA, in one of the presented variants, reduces the suffix array size by a factor of about 1.5–2, while it gets close in search times, winning in speed with its competitors known from the literature, MakCSA and LCSA.
【 授权许可】
Unknown
【 预 览 】
Files | Size | Format | View |
---|---|---|---|
RO201902180023634ZK.pdf | 2275KB | download |