期刊论文详细信息
Algorithms
An Algorithm to Compute the Character Access Count Distribution for Pattern Matching Algorithms
Tobias Marschall1 
[1] Centrum Wiskunde & Informatica (CWI), Science Park 123, 1098 XG Amsterdam, The Netherlands
关键词: pattern matching;    analysis of algorithms;    finite automaton;    minimization;    deterministic arithmetic automaton;    probabilistic arithmetic automaton;   
DOI  :  10.3390/a4040285
来源: mdpi
PDF
【 摘 要 】

We propose a framework for the exact probabilistic analysis of window-based pattern matching algorithms, such as Boyer–Moore, Horspool, Backward DAWG Matching, Backward Oracle Matching, and more. In particular, we develop an algorithm that efficiently computes the distribution of a pattern matching algorithm's running time cost (such as the number of text character accesses) for any given pattern in a random text model. Text models range from simple uniform models to higher-order Markov models or hidden Markov models (HMMs). Furthermore, we provide an algorithm to compute the exact distribution of differences in running time cost of two pattern matching algorithms. Methodologically, we use extensions of finite automata which we call deterministic arithmetic automata (DAAs) and probabilistic arithmetic automata (PAAs) [1]. Given an algorithm, a pattern, and a text model, a PAA is constructed from which the sought distributions can be derived using dynamic programming. To our knowledge, this is the first time that substring- or suffix-based pattern matching algorithms are analyzed exactly by computing the whole distribution of running time cost. Experimentally, we compare Horspool's algorithm, Backward DAWG Matching, and Backward Oracle Matching on prototypical patterns of short length and provide statistics on the size of minimal DAAs for these computations.

【 授权许可】

CC BY   
© 2011 by the authors; licensee MDPI, Basel, Switzerland.

【 预 览 】
附件列表
Files Size Format View
RO202003190047194ZK.pdf 520KB PDF download
  文献评价指标  
  下载次数:8次 浏览次数:13次