Thesis

[Abstract]

An overwhelming amount of data is being generated by various applications and devices in real time. While advanced analysis of large datasets is in high demand, data sizes have surpassed the capabilities of conventional software and hardware. Data-intensive analytics should be processed within a tolerable elapsed time on commodity hardware. The Hadoop framework efficiently distributes large datasets over multiple commodity servers, and the MapReduce framework performs parallel computations. We discuss the I/O bottlenecks of the Hadoop MapReduce framework and propose methods for enhancing I/O performance in common MapReduce jobs. A proven approach is to cache input data to maximize the memory locality of all map tasks. We also introduce an approach to optimizing I/O in the shuffle phase: the in-node combining design, which extends the scope of the traditional combiner to the node level. The in-node combiner reduces the total number of emitted intermediate results and curtails network traffic between mappers and reducers.
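
The node-level combining idea in the abstract can be illustrated with a small simulation. The sketch below is not the thesis's implementation and does not use the Hadoop API. It is plain Python that models a word-count job under the assumption that a traditional combiner aggregates the output of one map task, while an in-node combiner aggregates the output of all map tasks running on the same node. The cluster layout, the node names, and every function name are hypothetical and chosen only for illustration.

```python
from collections import Counter


def map_task(split):
    """Word-count mapper: emit one (word, 1) pair per token in the split."""
    return [(word, 1) for line in split for word in line.split()]


def combine(pairs):
    """Sum values per key and return one (key, total) pair per distinct key."""
    totals = Counter()
    for key, value in pairs:
        totals[key] += value
    return sorted(totals.items())


def per_task_combining(node_to_splits):
    """Traditional combiner (assumed model): each map task combines only its own output."""
    emitted = []
    for splits in node_to_splits.values():
        for split in splits:
            emitted.extend(combine(map_task(split)))
    return emitted


def in_node_combining(node_to_splits):
    """Node-level combiner (assumed model): all map tasks on a node share one aggregation."""
    emitted = []
    for splits in node_to_splits.values():
        node_pairs = []
        for split in splits:
            node_pairs.extend(map_task(split))
        emitted.extend(combine(node_pairs))
    return emitted


def reduce_all(pairs):
    """Reducer: final per-word totals over everything that was shuffled."""
    return dict(combine(pairs))


# Hypothetical layout: two nodes, each running two map tasks (one per split).
CLUSTER = {
    "node-a": [["hadoop map reduce", "hadoop hdfs"], ["hadoop combiner", "map map"]],
    "node-b": [["reduce reduce"], ["reduce hdfs"]],
}

if __name__ == "__main__":
    per_task = per_task_combining(CLUSTER)
    per_node = in_node_combining(CLUSTER)
    print("records shuffled with per-task combining:", len(per_task))
    print("records shuffled with in-node combining:", len(per_node))
    print("same final result:", reduce_all(per_task) == reduce_all(per_node))
```

In this toy layout both strategies give the reducers the same final counts, but node-level combining emits fewer intermediate records, because keys repeated across map tasks on one node are merged before the shuffle. How large the saving is in a real job depends on how often keys repeat across the tasks of a node.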

[Preview]

Attachments
Files	Size	Format	View
Hadoop MapReduce Performance Enhancement Using In-Node Combiners	737KB	PDF	download


Hadoop MapReduce Performance Enhancement Using In-Node Combiners
Keywords: MapReduce; Hadoop; HDFS; Combiner; NoSQL; 621
College of Engineering, Department of Electrical and Computer Engineering
University: Seoul National University Graduate School
Others : http://s-space.snu.ac.kr/bitstream/10371/123168/1/000000026798.pdf
United States | English
Source: Seoul National University Open Repository


	Document metrics
	Downloads: 10	Views: 6
