学位论文详细信息
Elasticity and resource aware scheduling in distributed data stream processing systems
Elasticity;Resource Aware Scheduling;Storm;Distributed Data Stream Processing
Peng, Boyang
关键词: Elasticity;    Resource Aware Scheduling;    Storm;    Distributed Data Stream Processing;   
Others  :  https://www.ideals.illinois.edu/bitstream/handle/2142/78453/PENG-THESIS-2015.pdf?sequence=1&isAllowed=y
美国|英语
来源: The Illinois Digital Environment for Access to Learning and Scholarship
PDF
【 摘 要 】
The era of big data has led to the emergence of new systems for real-time distributed stream processing, e.g., Apache Storm is one of the most popular stream processing systems in industry today. However, Storm, like many other stream processing systems, lacks many important and desired features. One important feature is elasticity with clusters running Storm, i.e. change the cluster size on demand.Since the current Storm scheduler uses a naïve round robin approach in scheduling applications, another important feature is for Storm to have an intelligent scheduler that efficiently uses the underlying hardware by taking into account resource demand and resource availability when performing a scheduling.Both are important features that can make Storm a more robust and efficient system.Even though our target system is Storm, the techniques we have developed can be used in other similar stream processing systems.We have created a system called Stela that we implemented in Storm, which can perform on-demand scale-out and scale-in operations in distributed processing systems. Stela is minimally intrusive and disruptive for running jobs.Stela maximizes performance improvement for scale-out operations and minimally decrease performance for scale-in operations while not changing existing scheduling of jobs.Stela was developed in partnership with another Master’s Student, Le Xu [1].We have created a system called R-Storm that does intelligent resource aware scheduling within Storm.The default round-robin scheduling mechanism currently deployed in Storm disregards resource demands and availability, and can therefore be very inefficient at times. R-Storm is designed to maximize resource utilization while minimizing network latency.When scheduling tasks, R-Storm can satisfy both soft and hard resource constraints as well as minimizing network distance between components that communicate with each other.The problem of mapping tasks to machines can be reduced to Quadratic Multiple 3-Dimensional Knapsack Problem, which is an NP-hard problem.However, our proposed scheduling algorithm within R-Storm attempts to bypass the limitation associated with NP-hard class of problems.We evaluate the performance of both Stela and R-Storm through our implementations of them in Storm by using several micro-benchmark Storm topologies and Storm topologies in use by Yahoo! In.Our experiments show that compared to Apache Storm’s default scheduler, Stela’s scale-out operation reduces interruption time to as low as 12.5% and achieves throughput that is 45-120% higher than Storm’s.And for scale-in operations, Stela achieves almost zero throughput post scale reduction while two other groups experience 200% and 50% throughput decrease respectively.For R-Storm, we observed that schedulings of topologies done by R-Storm perform on average 50%-100% better than that done by Storm’s default scheduler.
【 预 览 】
附件列表
Files Size Format View
Elasticity and resource aware scheduling in distributed data stream processing systems 2403KB PDF download
  文献评价指标  
  下载次数:4次 浏览次数:31次