Efficient management of large multidimensional datasets has attracted much attention in the database research community. Such large multidimensional datasets are common, and efficient algorithms are needed for analyzing these datasets for a variety of applications. In this thesis, we focus our study on two very common classes of analysis: similarity and skyline summarization. We first focus on similarity when one of the dimensions in the multidimensional dataset is temporal. We then develop algorithms for evaluating skyline summaries effectively for both temporal and low-cardinality attribute domain datasets and propose different methods for improving the effectiveness of the skyline summary operation.

This thesis begins by studying similarity measures for time-series datasets and efficient algorithms for time-series similarity evaluation. The first contribution of this thesis is a new algorithm which can be used to evaluate similarity methods whose matching criterion is bounded by a specified threshold value.

The second contribution of this thesis is the development of a new time-interval skyline operator, which continuously computes the current skyline over a data stream. We present a new algorithm called LookOut for evaluating such queries efficiently, and empirically demonstrate the scalability of this algorithm.

Current skyline evaluation techniques follow a common paradigm that eliminates data elements from skyline consideration by finding other elements in the dataset that dominate them. The performance of such techniques is heavily influenced by the underlying data distribution. The third contribution of this thesis is a novel technique called the Lattice Skyline Algorithm (LS) that is built around a new paradigm for skyline evaluation on datasets with attributes that are drawn from low-cardinality domains.
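As a point of reference for the dominance-based paradigm described above, the following is a minimal sketch of a naive skyline computation. It is not LookOut or LS, and it is not taken from the thesis. It assumes that smaller values are preferred in every dimension, and the names `dominates` and `naive_skyline` are hypothetical, chosen only for illustration.

```python
from typing import List, Sequence, Tuple

Point = Tuple[float, ...]


def dominates(a: Sequence[float], b: Sequence[float]) -> bool:
    """Return True if a dominates b.

    Assumption: smaller is better in every dimension. a dominates b when
    a is no worse than b everywhere and strictly better somewhere.
    """
    no_worse = all(x <= y for x, y in zip(a, b))
    strictly_better = any(x < y for x, y in zip(a, b))
    return no_worse and strictly_better


def naive_skyline(points: List[Point]) -> List[Point]:
    """Keep each point that no other point dominates (quadratic time)."""
    return [p for p in points if not any(dominates(q, p) for q in points)]


if __name__ == "__main__":
    data = [(1, 4), (2, 2), (4, 1), (3, 3), (4, 4)]
    print(naive_skyline(data))
```

Every point is compared against every other point, so the amount of work depends on how quickly dominating points are found, which is one way to see why the performance of dominance-based techniques depends on the data distribution.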
The utility of the skyline as a data summarization technique is often diminished by the volume of points in the skyline. The final contribution of this thesis is a novel scheme which remedies the skyline volume problem by ranking the elements of the skyline based on their importance to the skyline summary.

Collectively, the techniques described in this thesis present efficient methods for two common and computationally intensive analysis operations on large multidimensional datasets.
Efficient Algorithms for Similarity and Skyline Summary on Multidimensional Datasets.