BMC Bioinformatics
Resolving the structure of interactomes with hierarchical agglomerative clustering
Research
Joel S Bader [1], Yongjin Park [1]
[1] Department of Biomedical Engineering, Johns Hopkins University, 21218, Baltimore, MD, USA; High-Throughput Biology Center, Johns Hopkins University School of Medicine, 21218, Baltimore, MD, USA
Keywords: Markov Chain Monte Carlo; Genetic Interaction; Link Prediction; Hierarchical Agglomerative Cluster; Synthetic Lethality
DOI: 10.1186/1471-2105-12-S1-S44
Source: Springer
【 Abstract 】
Background: Graphs provide a natural framework for visualizing and analyzing networks of many types, including biological networks. Network clustering is a valuable approach for summarizing the structure in large networks, for predicting unobserved interactions, and for predicting functional annotations. Many current clustering algorithms suffer from a common set of limitations: poor resolution of top-level clusters; over-splitting of bottom-level clusters; requirements to pre-define the number of clusters prior to analysis; and an inability to jointly cluster over multiple interaction types.

Results: A new algorithm, Hierarchical Agglomerative Clustering (HAC), is developed for fast clustering of heterogeneous interaction networks. This algorithm uses maximum likelihood to drive the inference of a hierarchical stochastic block model for network structure. Bayesian model selection provides a principled method for collapsing the fine-structure within the smallest groups, and for identifying the top-level groups within a network. Model scores are additive over independent interaction types, providing a direct route for simultaneous analysis of multiple interaction types. In addition to inferring network structure, this algorithm generates link predictions that, with cross-validation, provide a quantitative assessment of performance for real-world examples.

Conclusions: When applied to genome-scale data sets representing several organisms and interaction types, HAC provides the overall best performance in link prediction when compared with other clustering methods and with model-free graph diffusion kernels. Investigation of performance on genome-scale yeast protein interactions reveals roughly 100 top-level clusters, with a long-tailed distribution of cluster sizes. These are in turn partitioned into 1000 fine-level clusters containing 5 proteins on average, again with a long-tailed size distribution. Top-level clusters correspond to broad biological processes, whereas fine-level clusters correspond to discrete complexes. Surprisingly, link prediction based on joint clustering of physical and genetic interactions performs worse than predictions based on individual data sets, suggesting a lack of synergy in current high-throughput data.
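
The idea of likelihood-driven agglomerative clustering mentioned in the abstract can be illustrated with a small sketch. This is not the authors' implementation: it assumes a simplified flat Bernoulli block model in place of the paper's hierarchical stochastic block model, omits the Bayesian model selection step, and recomputes the full likelihood at every step, so it is only suitable for toy graphs. The function names (`bernoulli_loglik`, `model_loglik`, `greedy_merge`) and the example graph are hypothetical.

```python
import math
from itertools import combinations


def bernoulli_loglik(edges, pairs):
    """Maximized Bernoulli log-likelihood for `edges` links among `pairs` slots."""
    if edges == 0 or edges == pairs:
        return 0.0  # the fitted probability is 0 or 1, so the likelihood is 1
    p = edges / pairs
    return edges * math.log(p) + (pairs - edges) * math.log(1.0 - p)


def model_loglik(adj, clusters):
    """Log-likelihood of a flat block model (assumption: one edge
    probability per cluster and per pair of clusters)."""
    total = 0.0
    for c in clusters:
        inside = sum(1 for u, v in combinations(sorted(c), 2) if v in adj[u])
        total += bernoulli_loglik(inside, len(c) * (len(c) - 1) // 2)
    for a, b in combinations(clusters, 2):
        between = sum(1 for u in a for v in b if v in adj[u])
        total += bernoulli_loglik(between, len(a) * len(b))
    return total


def greedy_merge(adj):
    """Start from singletons and repeatedly apply the merge that keeps the
    log-likelihood highest. Returns the merge history as a list of
    (left, right, score) tuples."""
    clusters = [frozenset([v]) for v in sorted(adj)]
    history = []
    while len(clusters) > 1:
        best = None
        for a, b in combinations(clusters, 2):
            merged = [c for c in clusters if c not in (a, b)] + [a | b]
            score = model_loglik(adj, merged)
            if best is None or score > best[0]:
                best = (score, a, b, merged)
        score, a, b, clusters = best
        history.append((set(a), set(b), score))
    return history


# Toy graph (hypothetical data): two triangles joined by the edge 2-3.
edges = [(0, 1), (0, 2), (1, 2), (3, 4), (3, 5), (4, 5), (2, 3)]
adj = {v: set() for v in range(6)}
for u, v in edges:
    adj[u].add(v)
    adj[v].add(u)

history = greedy_merge(adj)
for left, right, score in history:
    print(sorted(left), sorted(right), round(score, 3))
```

On this toy graph the two triangles are assembled first and are joined only in the final merge, so the merge order recovers the two dense groups. A practical implementation would update edge counts incrementally rather than rescoring the whole model for every candidate merge.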
【 License 】
CC BY
© Park and Bader; licensee BioMed Central Ltd. 2011