Thesis details
Sparse Methods for Learning Multiple Subspaces from Large-scale, Corrupted and Imbalanced Data
You, Chong
Johns Hopkins University
Keywords: Subspace learning; Subspace clustering; Sparse methods; Large-scale data; Corrupted data; Imbalanced data; Electrical Engineering
Others  :  https://jscholarship.library.jhu.edu/bitstream/handle/1774.2/60134/YOU-DISSERTATION-2018.pdf?sequence=1&isAllowed=y
Switzerland | English
Source: JOHNS HOPKINS DSpace Repository
【 Abstract 】

In many practical applications in machine learning, computer vision, data mining and information retrieval, one is confronted with datasets whose intrinsic dimension is much smaller than the dimension of the ambient space. This has given rise to the challenge of effectively learning multiple low-dimensional subspaces from such data. Multi-subspace learning methods based on sparse representation, such as sparse representation based classification (SRC) and sparse subspace clustering (SSC), have become very popular due to their conceptual simplicity and empirical success. However, the literature offers very limited theoretical explanation for the correctness of such approaches. Moreover, the applicability of existing algorithms to real-world datasets is limited by their high computational and memory complexity, their sensitivity to data corruptions, and their sensitivity to imbalanced data distributions.

This thesis attempts to advance our theoretical understanding of sparse representation based multi-subspace learning methods, and to develop new algorithms for handling large-scale, corrupted and imbalanced data. The first contribution of this thesis is a theoretical analysis of the correctness of such methods. In our geometric and randomized analysis, we answer important theoretical questions such as the effect of subspace arrangement, data distribution, subspace dimension, and data sampling density.

The second contribution of this thesis is the development of practical subspace clustering algorithms that are able to deal with large-scale, corrupted and imbalanced datasets. To deal with large-scale data, we study different approaches based on active support and divide-and-conquer ideas, and show that these approaches offer a good tradeoff between high accuracy and low running time. To deal with corrupted data, we construct a Markov chain whose stationary distribution can be used to separate inliers from outliers. Finally, we propose an efficient exemplar selection and subspace clustering method that outperforms traditional methods on imbalanced data.
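
The Markov chain idea mentioned in the abstract can be illustrated with a toy sketch. This is not the algorithm from the thesis: the transition matrix `P` below is hand-made for illustration, and the names `stationary_distribution`, `flag_outliers` and the `ratio` threshold are hypothetical. The sketch only shows the general principle that, when no inlier links to an outlier, a random walk ends up with little or no probability mass on the outlier.

```python
def stationary_distribution(P, steps=200):
    """Approximate the stationary distribution of a row-stochastic matrix P
    by repeatedly applying pi <- pi P, starting from the uniform distribution."""
    n = len(P)
    pi = [1.0 / n] * n
    for _ in range(steps):
        # One step of the random walk: new mass at j is the mass flowing in from every i.
        pi = [sum(pi[i] * P[i][j] for i in range(n)) for j in range(n)]
    return pi


def flag_outliers(pi, ratio=0.5):
    """Hypothetical decision rule: flag a point when its probability
    falls below ratio / n (ratio is an illustrative parameter)."""
    n = len(pi)
    return [p < ratio / n for p in pi]


# Toy transition matrix (an assumption, not data from the thesis).
# Nodes 0-2 play the role of inliers and only link to each other;
# node 3 plays the role of an outlier that links to inliers,
# while no node links back to it.
P = [
    [0.0, 0.5, 0.5, 0.0],
    [0.5, 0.0, 0.5, 0.0],
    [0.5, 0.5, 0.0, 0.0],
    [0.5, 0.5, 0.0, 0.0],
]

pi = stationary_distribution(P)
print([round(p, 3) for p in pi])
print(flag_outliers(pi))
```

In this toy chain the walk leaves node 3 after one step and never returns, so its probability is zero and only that node is flagged. How the transition probabilities are built from the data is the substantive part of the method and is not reproduced here.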

【 Preview 】
Attachments
Files Size Format View
Sparse Methods for Learning Multiple Subspaces from Large-scale, Corrupted and Imbalanced Data 4072KB PDF download