Thesis Details
Clustering Dependencies over Relational Tables
Gao, Yuchen
University of Waterloo
Keywords: Integrity Constraints; Database; Data Visualization; Query Optimization
Others  :  https://uwspace.uwaterloo.ca/bitstream/10012/10219/1/GAO_YUCHEN.pdf
Switzerland|English
Source: UWSPACE Waterloo Institutional Repository
【 Abstract 】
Integrity constraints have proven valuable in the database field. They help with schema design (functional dependencies, FDs [1][2]), query optimization (ordering dependencies, ODs [4][5][8][9]), and data cleaning (conditional functional dependencies, CFDs [12], and denial constraints, DCs [14]). In this thesis, we introduce a new type of integrity constraint, called a clustering dependency (CD).

Just as ordering dependencies are built around the database operation ORDER BY, clustering dependencies are built around the operation GROUP BY. Furthermore, we claim that clustering dependencies are useful not only in query optimization, as most integrity constraints are, but also in data visualization, data analysis and MapReduce.

In this thesis, we first present examples of clustering dependencies in a real-life dataset. We then formally define clustering dependencies and elaborate on our motivation. We also examine the reasoning system for clustering dependencies, including the implication problem, the consistency problem and inference rules. After that, we propose two algorithms. The first is a checking algorithm that determines whether a given dependency is valid in a table in O(N*M) time, where N is the number of rows and M is the size of the potentially aggregated attributes, i.e., the size of the right-hand-side attributes. The second is a mining algorithm that discovers all potential clustering dependencies in a table. Finally, we use both synthetic and real-life data to evaluate the performance of our mining algorithm.
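
The abstract does not give the formal definition of a clustering dependency, so the following is only a rough Python sketch of the kind of single-pass check that an O(N*M) bound suggests. It is not the thesis's algorithm. It assumes a simplified reading in which a table, taken in its stored row order, is "clustered" on a set of attributes when all rows sharing the same values on those attributes appear contiguously, as a GROUP BY without re-sorting would require. The function name `is_clustered` and the dictionary row representation are hypothetical and chosen for illustration.

```python
from typing import Any, Dict, Hashable, Sequence, Set, Tuple


def is_clustered(rows: Sequence[Dict[str, Any]], attrs: Sequence[str]) -> bool:
    """Return True if rows with equal values on `attrs` are contiguous.

    Assumed, simplified notion of clustering (not taken from the thesis).
    One pass over N rows, building an M-attribute key per row, gives
    O(N*M) expected time with a hash set.
    """
    seen: Set[Tuple[Hashable, ...]] = set()  # keys whose run has started
    previous: object = object()              # sentinel, never equal to a key
    for row in rows:
        key = tuple(row[a] for a in attrs)
        if key != previous:
            # A new run begins; if this key already had a run, the
            # group is split and the table is not clustered on attrs.
            if key in seen:
                return False
            seen.add(key)
            previous = key
    return True


# Hypothetical toy table: contiguous on "city", interleaved on "year".
table = [
    {"city": "A", "year": 1},
    {"city": "A", "year": 2},
    {"city": "B", "year": 1},
]
clustered_on_city = is_clustered(table, ["city"])
clustered_on_year = is_clustered(table, ["year"])
```

In this toy table the rows are contiguous on `city` but not on `year`, because the value 1 reappears after a run of 2. A check for a dependency between a left-hand side and a right-hand side would need the thesis's actual definition, which the abstract does not state.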
【 Preview 】
Attachments
Files                                            Size      Format   View
Clustering Dependencies over Relational Tables   1327 KB   PDF      download
  Document metrics
  Downloads: 21  Views: 39