Estimating Community Structure in Networks by Spectral Methods.
Network analysis;Community detection;Concentration of sparse random graphs;Computer Science;Mathematics;Science (General);Statistics and Numeric Data;Engineering;Science;Statistics
Networks are studied in a wide range of fields, including social psychology, sociology, physics, computer science, probability, and statistics. One of the fundamental problems in network analysis is the problem of estimating community structure. Most of existing methods rely on maximizing a criterion over the discrete set of community-label vectors. They require a good initial estimate of communities, which is often found by spectral clustering. Several problems arise from this approach: it has been empirically observed and theoretically predicted that spectral clustering fails when the network is sparse; solving an optimization problem over a discrete set of label vectors is a challenging task; and the number of communities is often unknown in practice. This dissertation contributes to progress on each of these problems.We study random graphs with possibly different edge probabilities in the challenging sparse regime of bounded expected degrees. Unlike in the dense case, neither the network adjacency matrix nor its Laplacian concentrate around their expectations due to the highly irregular distribution of nodedegrees. We prove that simple regularization procedures force the adjacency matrix and the Laplacian to concentrate. As an immediate consequence, we establish the validity of regularized spectral clustering for estimating community structure.We propose a general approach for maximizing a function of a network adjacency matrix over discrete labels by projecting the set of labels onto a subspace approximating the leading eigenvectors of the expected adjacency matrix. This projection onto a low-dimensional space makes the feasible set of labels much smaller and the optimization problem much easier. We prove a general result about this method and show how to apply it to several previously proposed community estimation criteria, establishing its consistency for label estimation in each case.We propose a simple spectral method for estimating the number of communities. We show that the method performs well under several models and a wide range of parameters, and is guaranteed to be consistent under several asymptotic regimes. We compare the new method to several existing methods for estimating the number of communities and show that it is both more accurate and morecomputationally efficient.
【 预 览 】
附件列表
Files
Size
Format
View
Estimating Community Structure in Networks by Spectral Methods.