Journal article

【Abstract】

The variety of processing frameworks that has emerged under big data technologies is a response to the tremendous size and complexity of the datasets involved. High Performance Computing (HPC) frameworks execute much faster than big data processing frameworks, but because most big data jobs are data intensive rather than computation intensive, HPC paradigms have not been used in big data processing. This paper reviews two distributed and parallel computing frameworks: Apache Spark and MPI. Sentiment analysis on Twitter data is chosen as the test-case application for benchmarking and is implemented in Scala for Spark and in C++ for MPI. Experiments were conducted on Google Cloud virtual machines for three dataset sizes, 100 GB, 500 GB and 1 TB, to compare execution times. The results show that MPI outperforms Apache Spark in parallel and distributed cluster computing environments; hence the higher performance of MPI can be exploited in big data applications to improve speedups.

【License】

CC BY


Journal of computer sciences
Performance Evaluation of Apache Spark Vs MPI: A Practical Case Study on Twitter Sentiment Analysis

Kumar, Deepa S¹
Keywords: Big Data; High Performance Computing; Apache Spark; MPI; Sentiment Analysis; Scala Programming; Cluster Computing
DOI: 10.3844/jcssp.2017.781.794
Subject classification: Computer Science (General)
Source: Science Publications


