学位论文详细信息
Modern aspects of unsupervised learning
Unsupervised learning;Clustering;Perturbation resilience;Distributed clustering;Community detection
Liang, Yingyu ; Balcan, Maria-Florina Computer Science Blum, Avrim Fortnow, Lance Isbell, Charles L. Randall, Dana Song, Le ; Balcan, Maria-Florina
University:Georgia Institute of Technology
Department:Computer Science
关键词: Unsupervised learning;    Clustering;    Perturbation resilience;    Distributed clustering;    Community detection;   
Others  :  https://smartech.gatech.edu/bitstream/1853/52282/1/LIANG-DISSERTATION-2014.pdf
美国|英语
来源: SMARTech Repository
PDF
【 摘 要 】

Unsupervised learning has become more and more important due to the recent explosion of data. Clustering, a key topic in unsupervised learning, is a well-studied task arising in many applications ranging from computer vision to computational biology to the social sciences. This thesis is a collection of work exploring two modern aspects of clustering: stability and scalability.In the first part, we study clustering under a stability property called perturbation resilience. As an alternative approach to worst case analysis, this novel theoretical framework aims at understanding the complexity of clustering instances that satisfy natural stability assumptions. In particular, we show how to correctly cluster instances whose optimal solutions are resilient to small multiplicative perturbations on the distances between data points, significantly improving existing guarantees. We further propose a generalized property that allows small changes in the optimal solutions after perturbations, and provide the first known positive results in this more challenging setting.In the second part, we study the problem of clustering large scale data distributed across nodes which communicate over the edges of a connected graph. We provide algorithms with small communication cost and provable guarantees on the clustering quality. We also propose algorithms for distributed principal component analysis, which can be used to reduce the communication cost of clustering high dimensional data while merelycomprising the clustering quality.In the third part, we study community detection, the modern extension of clustering to network data. We propose a theoretical model of communities that are stable in the presence of noisy nodes in the network, and design an algorithm that provably detects all such communities. We also provide a local algorithm for large scale networks, whose running time depends on the sizes of the output communities but not that of the entire network.

【 预 览 】
附件列表
Files Size Format View
Modern aspects of unsupervised learning 943KB PDF download
  文献评价指标  
  下载次数:23次 浏览次数:14次