Distributed key-value stores power the backend of high-performance web services and cloud computing applications. Key-value stores such as Cassandra rely heavily on counters to keep track of the occurrences of various kinds of events. However, today's implementations of counters do not provide exactly-once semantics. A typical scenario is that a client requests a counter increment, times out waiting for a response, and creates a duplicate request, thus resulting in a double increment on the server side. In this thesis, we address this problem by presenting, analyzing, and evaluating a novel server-side data structure called the Forgetful Bloom Filter (FBF). Like a traditional Bloom filter, an FBF is a compact representation of a set of elements (e.g., client requests). However, an FBF is more powerful than a Bloom filter in two aspects: i) it can forget older elements (e.g., requests that are too old to apply), and ii) it is self-adapting under a varying workload. We also present an adaptive variant of FBF that adapts itself to meet a desired false positive rate -- thus the error achieved in the counter can be bounded even as the workload changes. We present experimental results from a prototype implementation of FBFs and discuss the implications for a key-value store such as Cassandra. Our results show that the FBF is highly accurate in maintaining correct counter values.
【 预 览 】
附件列表
Files
Size
Format
View
Idempotent distributed counters using a Forgetful Bloom Filter