学位论文详细信息
Fast Data Analytics by Learning
big data analytics systems;database systems;approximate query processing;database learning;Computer Science;Engineering;Computer Science & Engineering
Park, YongjooJagadish, Hosagrahar V ;
University of Michigan
关键词: big data analytics systems;    database systems;    approximate query processing;    database learning;    Computer Science;    Engineering;    Computer Science & Engineering;   
Others  :  https://deepblue.lib.umich.edu/bitstream/handle/2027.42/138598/pyongjoo_1.pdf?sequence=1&isAllowed=y
瑞士|英语
来源: The Illinois Digital Environment for Access to Learning and Scholarship
PDF
【 摘 要 】

Today, we collect a large amount of data, and the volume of the data we collect is projected to grow faster than the growth of the computational power. This rapid growth of data inevitably increases query latencies, and horizontal scaling alone is not sufficient for real-time data analytics of big data. Approximate query processing (AQP) speeds up data analytics at the cost of small quality losses in query answers. AQP produces query answers based on synopses of the original data. The sizes of the synopses are smaller than the original data; thus, AQP requires less computational efforts for producing query answers, thus can produce answers more quickly. In AQP, there is a general tradeoff between query latencies and the quality of query answers; obtaining higher-quality answers requires longer query latencies.In this dissertation, we show we can speed up the approximate query processing without reducing the quality of the query answers by optimizing the synopses using two approaches. The two approaches we employ for optimizing the synopses are as follows:1. Exploiting past computations: We exploit the answers to the past queries. This approach relies on the fact that, if two aggregation involve common or correlated values, the aggregated results must also be correlated. We formally capture this idea using a probabilistic distribution function, which is then used to refine the answers to new queries.2. Building task-aware synopses: By optimizing synopses for a few common types of data analytics, we can produce higher quality answers (or more quickly for certain target quality) to those data analytics tasks. We use this approach for constructing synopses optimized for searching and visualizations.For exploiting past computations and building task-aware synopses, our work incorporates statistical inference and optimization techniques. The contributions in this dissertation resulted in up to 20x speedups for real-world data analytics workloads.

【 预 览 】
附件列表
Files Size Format View
Fast Data Analytics by Learning 1750KB PDF download
  文献评价指标  
  下载次数:6次 浏览次数:15次