Thesis Details
Scaling Complex Analytical Processing on Graph Structured Data Using Map Reduce
Keywords: Map-Reduce; Pig Latin; OLAP; Complex Analytical Processing
Sridhar, Radhika; Dr. Kemafor Anyanwu, Committee Chair; Dr. Xiaosong Ma, Committee Member; Dr. Tao Xie, Committee Member
University: North Carolina State University
Others: https://repository.lib.ncsu.edu/bitstream/handle/1840.16/2171/etd.pdf?sequence=1&isAllowed=y
United States | English
【 Abstract 】

Efficient analytical processing at Web scale has become an important requirement as more decision support applications rely on data on the Web. One approach to achieving significant scalability is to use parallel processing techniques on a computational cluster of commodity-grade machines. Software platforms such as Map-Reduce, Hadoop, and Pig are now available that allow users to encode their tasks in terms of simple low-level primitives that are easily parallelizable. Further, a high-level dataflow language called Pig Latin has been proposed for specifying analytical processing tasks using a mixture of the procedural and declarative paradigms. This approach strikes a good balance between customizability and the potential for automatic query optimization. However, the analytical processing capability currently offered by these frameworks is fairly basic and as such has narrow applicability to many real-world scenarios. Furthermore, an increasing amount of the data being made available on the Web is semi-structured. For example, some search engines report that the Resource Description Framework (RDF), the recent W3C standard for representing metadata on the Web, already accounts for about 8,502,794 Web data URLs and 2,759,040 documents. However, such data is typically organized as a set of binary relations (a graph), whereas these frameworks are primarily targeted at processing data structured as n-ary relational tables. This thesis addresses the problem of enabling scalable analytical data processing on RDF datasets. Its approach is based on extending Yahoo’s Pig system (an open source parallel processing platform) with constructs that allow complex data processing problems on graph structured data to be expressed in a manner that is more amenable to automatic parallelization.
Specifically, it makes the following contributions:
1. Extends Pig Latin, the dataflow language for Pig, with primitives that support the expression of queries in terms of a readily parallelizable multidimensional join operator, as well as the expression of graph navigational filter expressions.
2. Implements the introduced primitives in a Hadoop implementation running on VCL.
3. Develops a cost model for estimating the cost of queries expressed in terms of the multidimensional join operator.

【 Preview 】
Attachments
Files Size Format View
Scaling Complex Analytical Processing on Graph Structured Data Using Map Reduce 2370KB PDF download