Distributed databases, which rely on redundant and distributed storage across multipleservers, are able to provide mission-critical data management services at large scale. Parallelismis the key to the scalability of distributed databases, but concurrent queries havingconflicts may block or abort each other when strong consistency is enforced using rigorousconcurrency control protocols. This thesis studies the techniques of building scalable distributeddatabases under strong consistency guarantees even in the face of high contentionworkloads. The techniques proposed in this thesis share a common idea, conflict mitigation,meaning mitigating conflicts by rescheduling operations in the concurrency control in the first place instead of resolving contending conflicts. Using this idea, concurrent queriesunder conflicts can be executed with high parallelism. This thesis explores this idea onboth databases that support serializable ACID (atomic, consistency, isolation, durability)transactions, and eventually consistent NoSQL systems.First, the epoch-based concurrency control (ECC) technique is proposed in ALOHA-KV,a new distributed key-value store that supports high performance read-only and write-onlydistributed transactions. ECC demonstrates that concurrent serializable distributedtransactions can be processed in parallel with low overhead even under high contention.With ECC, a new atomic commitment protocol is developed that only requires amortizedone round trip for a distributed write-only transaction to commit in the absence of failures.Second, a novel paradigm of serializable distributed transaction processing is developedto extend ECC with read-write transaction processing support. This paradigm uses anewly proposed database operator, functors, which is a placeholder for the value of a key,which can be computed asynchronously in parallel with other functor computations of thesame or other transactions. Functor-enabled ECC achieves more fine-grained concurrencycontrol than transaction level concurrency control, and it never aborts transactions dueto read-write or write-write conflicts but allows transactions to fail due to logic errors orconstraint violations while guaranteeing serializability.Lastly, this thesis explores consistency in the eventually consistent system, ApacheCassandra, for an investigation of the consistency violation, referred to as ;;consistencyspikes;;. This investigation shows that the consistency spikes exhibited by Cassandra arestrongly correlated with garbage collection, particularly the ;;stop-the-world;; phase in theJava virtual machine. Thus, delaying read operations arti cially at servers immediatelyafter a garbage collection pause can virtually eliminate these spikes.All together, these techniques allow distributed databases to provide scalable andconsistent storage service.
【 预 览 】
附件列表
Files
Size
Format
View
Building Scalable and Consistent Distributed Databases Under Conflicts