会议论文详细信息
11th International Conference on Grammatical Inference
Bootstrapping and Learning PDFA in Data Streams
B. Balle bballe@lsi.upc.edu ; R. Gavalda` gavalda@lsi.upc.edu
PID  :  120778
来源: CEUR
PDF
【 摘 要 】

Markovian models with hidden state are widelyused formalisms for modeling sequential phenomena. Learnability of these models has been well studied when the sample is given in batch mode, and algorithms with PAClike learning guarantees exist for specific classes of models such as Probabilistic Deterministic Finite Automata (PDFA). Here we focus on PDFA and give an algorithm for infering models in this class under the stringent data stream scenario: unlike existing methods, our algorithm works incrementally and in one pass, uses memory sublinear in the stream length, and processes input items in amortized constant time. We provide rigorous PAClike bounds for all of the above, as well as an evaluation on synthetic data showing that the algorithm performs well in practice. Our algorithm makes a key usage of several old and new sketching techniques. In particular, we develop a new sketch for implementing bootstrapping in a streaming setting which may be of independent interest. In experiments we have observed that this sketch yields important reductions in the examples required for performing some crucial statistical tests in our algorithm.

【 预 览 】
附件列表
Files Size Format View
Bootstrapping and Learning PDFA in Data Streams 271KB PDF download
  文献评价指标  
  下载次数:9次 浏览次数:18次