学位论文详细信息
Scalable Embeddings for Kernel Clustering on MapReduce
Data Clustering;Kernel Methods;Scalable Data Analytics;MapReduce;Big Data;Electrical and Computer Engineering
Elgohary, Ahmed
University of Waterloo
关键词: Data Clustering;    Kernel Methods;    Scalable Data Analytics;    MapReduce;    Big Data;    Electrical and Computer Engineering;   
Others  :  https://uwspace.uwaterloo.ca/bitstream/10012/8262/1/Elgohary_Ahmed.pdf
瑞士|英语
来源: UWSPACE Waterloo Institutional Repository
PDF
【 摘 要 】

There is an increasing demand from businesses and industries to make the best use of their data. Clustering is a powerful tool for discovering natural groupings in data. The k-means algorithm is the most commonly-used data clustering method, having gained popularity for its effectiveness on various data sets and ease of implementation on different computing architectures. It assumes, however, that data are available in an attribute-value format, and that each data instance can be represented as avector in a feature space where the algorithm can be applied. These assumptions are impractical for real data, and they hinder the use of complex data structures in real-world clustering applications. The kernel k-means is an effective method for data clustering which extends the k-means algorithm to work on a similarity matrix over complex data structures. The kernel k-means algorithm is however computationally very complex as it requires the complete data matrix to be calculated and stored. Further, the kernelized nature of the kernel k-means algorithm hinders the parallelization of its computations on modern infrastructures for distributed computing. This thesis defines a family of kernel-based low-dimensional embeddings that allows for scaling kernel k-means on MapReduce via an efficient and unified parallelization strategy. Then, three practical methods for low-dimensional embedding that adhere to our definition of the embedding family are proposed. Combining the proposed parallelization strategy with any of the three embedding methods constitutes a complete scalable and efficient MapReduce algorithm for kernel k-means. The efficiency and the scalability of the presented algorithms are demonstrated analytically and empirically.

【 预 览 】
附件列表
Files Size Format View
Scalable Embeddings for Kernel Clustering on MapReduce 767KB PDF download
  文献评价指标  
  下载次数:4次 浏览次数:36次