Journal Article Details
Statistical Analysis and Data Mining
A framework for stability-based module detection in correlation graphs
article
Mingmei Tian [1], Rachael Hageman Blair [1], Lina Mu [2], Matthew Bonner [2], Richard Browne [3], Han Yu [4]
[1] Department of Biostatistics, State University of New York at Buffalo; [2] Department of Epidemiology and Environmental Health, State University of New York at Buffalo; [3] Department of Biotechnical and Clinical Laboratory Sciences, State University of New York at Buffalo; [4] Department of Biostatistics and Bioinformatics, Roswell Park Comprehensive Cancer Center
Keywords: clustering; graphical model; Jaccard coefficient; module detection; network; stability
DOI: 10.1002/sam.11495
Subject classification: Social Sciences, Humanities and Arts (General)
Source: John Wiley & Sons, Inc.
【 Abstract 】

Graphs can represent the direct and indirect relationships between variables and elucidate complex relationships and interdependencies. Detecting structure within a graph is a challenging problem that is studied across a range of fields, where it is variously termed community detection, module detection, or graph partitioning. A popular class of algorithms for module detection identifies structure by optimizing a modularity function. In practice, graphs are often learned from data and are therefore subject to uncertainty. In these settings, uncertainty in the network structure can be amplified, yielding unreliable estimates of the module structure. In this work, we begin to address this challenge by using a nonparametric bootstrap to assess the stability of module detection in a graph. Stability estimates are presented at the level of the individual node, at the level of the inferred modules, and as an overall measure of module detection performance for a given graph. Furthermore, bootstrap stability estimates are derived for selecting the complexity parameter that ultimately defines a graph from data, so that the graph is chosen to optimize stability. The approach is applied to correlation graphs but generalizes to other graphs that are defined through dissimilarity measures. We demonstrate the approach using a broad range of simulations and a metabolomics dataset from the Beijing Olympics Air Pollution study. These approaches are implemented in the bootcluster package, which is available for the R programming language.
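
The node-level idea in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation and does not reproduce the bootcluster API. It rests on several assumptions: modules are taken to be the connected components of a thresholded correlation graph (a simple stand-in for modularity optimization), the `threshold` argument plays the role of the complexity parameter, and the function names `pearson`, `modules`, `jaccard`, and `node_stability` are hypothetical.

```python
import math
import random


def pearson(x, y):
    """Pearson correlation of two equal-length sequences (0.0 if either is constant)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0
    return sxy / math.sqrt(sxx * syy)


def modules(rows, threshold):
    """Map each variable (column index) to the frozenset of variables in its module.

    Assumption: an edge joins two variables when |correlation| >= threshold, and
    a module is a connected component of that graph.
    """
    p = len(rows[0])
    cols = [[r[j] for r in rows] for j in range(p)]
    adj = {j: set() for j in range(p)}
    for i in range(p):
        for j in range(i + 1, p):
            if abs(pearson(cols[i], cols[j])) >= threshold:
                adj[i].add(j)
                adj[j].add(i)
    member = {}
    for start in range(p):
        if start in member:
            continue
        # Depth-first search collects the component containing `start`.
        comp, stack = {start}, [start]
        while stack:
            u = stack.pop()
            for v in adj[u] - comp:
                comp.add(v)
                stack.append(v)
        comp = frozenset(comp)
        for v in comp:
            member[v] = comp
    return member


def jaccard(a, b):
    """Jaccard coefficient of two non-empty sets."""
    return len(a & b) / len(a | b)


def node_stability(rows, threshold=0.5, n_boot=100, seed=0):
    """Average, over bootstrap resamples of the rows, the Jaccard coefficient
    between each node's module in the observed data and in the resample."""
    rng = random.Random(seed)
    ref = modules(rows, threshold)
    p = len(rows[0])
    totals = [0.0] * p
    for _ in range(n_boot):
        sample = [rng.choice(rows) for _ in rows]  # resample observations with replacement
        boot = modules(sample, threshold)
        for v in range(p):
            totals[v] += jaccard(ref[v], boot[v])
    return [t / n_boot for t in totals]
```

Under these assumptions, `node_stability(rows, threshold)` returns one score in [0, 1] per variable. Averaging the scores within a module or across all nodes would give one possible module-level or overall summary, and repeating the computation over a grid of `threshold` values is one way a stability-optimizing parameter choice could be explored.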

【 License 】

Unknown   
