学位论文详细信息
Statistical inference on network data
Network Inference;Random Graphs;Sequential Importance Sampling;Network Robustness;Exponential Random Graph Model;Dense Subgraph Discovery;Simulated Annealing;Community Detection;Degree Corrected Stochastic Block Model
Zhang, Jingfei
关键词: Network Inference;    Random Graphs;    Sequential Importance Sampling;    Network Robustness;    Exponential Random Graph Model;    Dense Subgraph Discovery;    Simulated Annealing;    Community Detection;    Degree Corrected Stochastic Block Model;   
Others  :  https://www.ideals.illinois.edu/bitstream/handle/2142/49840/Jingfei_Zhang.pdf?sequence=1&isAllowed=y
美国|英语
来源: The Illinois Digital Environment for Access to Learning and Scholarship
PDF
【 摘 要 】

Networks arise from modeling complex systems in various fields, such as computer science, social science, biology, psychology and finance. Understanding and analyzing networks help us better understand these complex systems and extract useful information. In this dissertation, we study problems on network sampling, network modeling and data mining on networks.Random graphs with given vertex degrees have been widely used as a model for many real-world complex networks. However, both statistical inference and analytic study of such networks present great challenges. In Chapter 2, we propose new sequential importance sampling methods for sampling networks with a given degree sequence. These samples can be used to approximate closely the null distributions of a number of test statistics involved in such networks, and provide an accurate estimate of the total number of networks with given vertex degrees. We study the asymptotic behavior of the proposed algorithm and prove that the importance weight remains bounded as the size of the graph grows. This property guarantees that the proposed sampling algorithm can still work efficiently even for large sparse graphs. We apply our method to a range of examples to demonstrate its efficiency in real problems.One important question for complex networks is how the network's connectivity will be affected if the network is under targeted attacks, i.e., the nodes with the most links are attacked. In Chapter 3, we found that a dolphin network is resilient to targeted attacks. To further study the resilient property, we fit an exponential random graph model to the dolphin network. The fitted model characterizes network resiliency and identifies local structures that can reproduce the global resilience property. Such a statistical model can be used to build the Internet and other networks to increase the attack tolerance of those networks.The problem of finding densely connected subgraphs in a network has attracted a lot of recent attention. Such subgraphs are sometimes referred to as communities in social networks or molecular modules in protein networks. In Chapter 4, we proposetwo Monte Carlo optimization algorithms for identifying the densest subgraphs with a fixed size or with size in a given range. The new algorithms combine the idea ofsimulated annealing and efficient moves for the Markov chain, and both algorithms are shown to converge to the set of optimal states (densest subgraphs) with probability one. When applied to a yeast protein interaction network and a stock market graph, the algorithms identify interesting new densely connected subgraphs. One of the most relevant features of networks representing real systems is the community structure. Detecting communities is of great importance in understanding, analyzing, organizing networks. In Chapter 5, we describe a statistical framework for modularity-based network community detection. We derive the modularity function under the proposed statistical framework, and propose a fast modularity maximization algorithm based on the eigen-spectrum of the modularity matrix. A hypothesis testing procedure is developed to determine the significance of an identified community structure. The modularity formulated under the proposed statistical framework is shown to be consistent under the degree-corrected stochastic block model framework. Several synthetic networks and real world networks are used to demonstrate the effectiveness of our method.

【 预 览 】
附件列表
Files Size Format View
Statistical inference on network data 5792KB PDF download
  文献评价指标  
  下载次数:20次 浏览次数:32次