Increasingly, large systems and data centers are being built in a 'scale out' manner, i.e., using large numbers of commodity hardware components, instead of the traditional 'scale up' approach using expensive, specialized equipment. However, large numbers of commodity components imply higher rates of failure across such systems. Such failures can cause applications to miss their deadlines for task completion. For this reason, cloud service providers and cloud applications must anticipate failures and engineer their services accordingly. In this thesis, we first analyze the availability of a commodity data center designed for MapReduce applications. MapReduce is increasingly used in industry for efficient large-scale data processing tasks, including personalized advertising, spam detection, and data mining. We show how MapReduce software-level fault tolerance can be used to achieve the same availability as 'scale up' data centers. Second, we extend existing job schedulers for deadline-driven jobs to handle machine and software failures while satisfying service-level objectives.
Performance guarantees for deadline-driven MapReduce jobs under failure