| Applied Sciences | |
| An Efficient Distributed SPARQL Query Processing Scheme Considering Communication Costs in Spark Environments | |
| Jongtae Lim [1], Dojin Choi [1], Byounghoon Kim [1], Hyeonbyeong Lee [1], Jaesoo Yoo [1], Kyoungsoo Bok [2] | |
| [1] Department of Information and Communication Engineering, Chungbuk National University, Chungdae-ro 1, Seowon-Gu, Cheongju 28644, Korea; [2] Department of SW Convergence Technology, Wonkwang University, Iksandae 460, Iksan 54538, Korea | |
| Keywords: SPARQL; Apache Spark; RDF; distributed query processing; communication cost | |
| DOI : 10.3390/app12010122 | |
| Source: DOAJ | |
【Abstract】
Various distributed processing schemes have been studied to efficiently utilize large-scale RDF graphs in semantic web services. This paper proposes a new distributed SPARQL query processing scheme that considers communication costs in Spark environments to reduce I/O costs during SPARQL query processing. To process a query over an RDF graph stored in a distributed environment, we divide the SPARQL query into several subqueries based on its WHERE clause. The proposed scheme reduces data communication costs by using an index to group the divided subqueries by the nodes that hold the related data and processing them there. For the grouped subqueries, the scheme then calculates the cost of all possible query execution paths and selects an efficient one, using an algorithm that considers the data parsing cost, the amount of data communication, and the queue time per node of each path. Various performance evaluations show that the proposed scheme outperforms existing schemes.
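
The pipeline summarized in the abstract (split the WHERE clause, group subqueries by node through an index, then pick the cheapest execution path) can be illustrated with a minimal Python sketch. This is not the paper's algorithm: the function names, the predicate-to-node index, and the additive cost model (queue time plus parsing cost per node, plus the cost of shipping each intermediate result to the next node) are all assumptions made for illustration, and the sketch runs without Spark.

```python
from collections import defaultdict
from itertools import permutations


def split_where(where_body):
    """Split a WHERE body into (subject, predicate, object) triple patterns.

    Assumption: patterns are separated by ' . ' and contain no nested groups.
    """
    patterns = []
    for part in where_body.split(" . "):
        tokens = part.strip().rstrip(".").split()
        if len(tokens) == 3:
            patterns.append(tuple(tokens))
    return patterns


def group_by_node(patterns, predicate_index):
    """Group triple patterns by the node that stores their predicate.

    predicate_index is a hypothetical index mapping predicate -> node name.
    """
    groups = defaultdict(list)
    for pattern in patterns:
        groups[predicate_index[pattern[1]]].append(pattern)
    return dict(groups)


def path_cost(path, parse_cost, result_size, queue_time, cost_per_row=1.0):
    """Cost of visiting nodes in the given order (illustrative model only)."""
    total = 0.0
    for i, node in enumerate(path):
        total += queue_time[node] + parse_cost[node]
        if i < len(path) - 1:
            # Ship this node's intermediate result to the next node.
            total += result_size[node] * cost_per_row
    return total


def best_path(nodes, parse_cost, result_size, queue_time):
    """Enumerate every node ordering and return the cheapest one."""
    return min(
        permutations(sorted(nodes)),
        key=lambda p: path_cost(p, parse_cost, result_size, queue_time),
    )


# Hypothetical query body and statistics, for illustration only.
where = "?p rdf:type ex:Person . ?p ex:worksAt ?o . ?o ex:locatedIn ex:Seoul"
index = {"rdf:type": "A", "ex:worksAt": "A", "ex:locatedIn": "B"}
groups = group_by_node(split_where(where), index)
plan = best_path(
    groups,
    parse_cost={"A": 2.0, "B": 1.0},
    result_size={"A": 100, "B": 10},
    queue_time={"A": 0.5, "B": 0.5},
)
print(plan)
```

Under this toy model only the communication term depends on the ordering, so the search favors paths that ship small intermediate results and leave the largest one in place. Exhaustive enumeration grows factorially with the number of nodes, so it is only practical for small groupings.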
【License】
Unknown