Dissertation Details
Statistical Inference for Some Problems in Network Analysis.
Zhao, Yunpeng; Nguyen, Long
University of Michigan
Keywords: Network Analysis; Community Identification; Link Prediction; Statistics and Numeric Data; Science; Statistics
Others  :  https://deepblue.lib.umich.edu/bitstream/handle/2027.42/94042/yunpeng_1.pdf?sequence=1&isAllowed=y
Switzerland | English
Source: The Illinois Digital Environment for Access to Learning and Scholarship
【 Abstract 】

Recent advances in computing and measurement technologies have led to an explosion in the amount of data being collected in all areas of application. Many of these data have network or graph structure, which is common in diverse scientific areas such as biology, computer science, and sociology. This dissertation makes three contributions to inference problems in statistical network analysis.

The first two parts of the dissertation focus on community analysis. In the first part, we establish a general theory for checking the consistency of community detection under the degree-corrected block model, a generalization of the standard stochastic block model that allows variation in node degrees within a community and thus accommodates hub nodes. We compare several community detection criteria under both the standard and the degree-corrected block models. We show which criteria are consistent under which models and constraints, and compare their relative performance in practice.

The second part proposes a new framework for community identification. Most community detection methods focus on partitioning the entire network into communities, with the expectation of many ties within communities and few ties between them. However, many networks contain nodes that do not fit in with any of the communities, and forcing every node into a community can distort results. We propose a framework that extracts one community at a time, allowing for weakly connected nodes. The proposed extraction criterion performs well on simulated and real networks. For the case of the block model, we establish asymptotic consistency of estimated node labels and propose a hypothesis test for determining the number of communities.

The third part contributes to the link prediction problem. In many applications, notably in genetics, a partially observed network may not contain any negative examples of absent edges, which creates a difficulty for many existing supervised learning approaches. We develop a new method that treats the observed network as a sample of the true network with different sampling rates for positive and negative examples. We obtain a relative ranking of potential links by their probabilities, utilizing information on node covariates as well as on network topology.
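
As a rough illustration of the degree-corrected block model mentioned in the first part of the abstract, the sketch below samples a small undirected network in which the probability of an edge between nodes i and j is taken to be theta[i] * theta[j] * P[c_i][c_j], capped at 1. This Bernoulli parameterization, the function name `sample_dcbm`, and all numeric values are assumptions made for illustration; the abstract does not give the dissertation's exact formulation.

```python
import random


def sample_dcbm(labels, theta, P, seed=0):
    """Sample a symmetric 0/1 adjacency matrix from a degree-corrected block model.

    Assumed edge probability for i != j (illustrative parameterization):
        min(1, theta[i] * theta[j] * P[labels[i]][labels[j]])
    labels : community index of each node
    theta  : per-node degree parameter (larger values give hub-like nodes)
    P      : symmetric matrix of block-level connection probabilities
    """
    rng = random.Random(seed)
    n = len(labels)
    A = [[0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):  # upper triangle only, so no self-loops
            p = min(1.0, theta[i] * theta[j] * P[labels[i]][labels[j]])
            if rng.random() < p:
                A[i][j] = A[j][i] = 1  # mirror to keep the matrix symmetric
    return A


# Hypothetical example: two communities of three nodes, with nodes 0 and 3 as hubs.
labels = [0, 0, 0, 1, 1, 1]
theta = [1.5, 0.75, 0.75, 1.5, 0.75, 0.75]
P = [[0.6, 0.05], [0.05, 0.6]]

A = sample_dcbm(labels, theta, P, seed=1)
degrees = [sum(row) for row in A]
print(degrees)
```

Setting every entry of `theta` to 1 reduces this sketch to the standard stochastic block model, in which all nodes within a community have the same expected degree.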

Attachments
Files | Size | Format | View
Statistical Inference for Some Problems in Network Analysis. | 2566KB | PDF | download