Journal Article Details
BMC Systems Biology
Cluster and propensity based approximation of a network
Steve Horvath [4], Kenneth Lange [2], Peter Langfelder [1], John Michael Ranola [3]
[1] Human Genetics, UCLA, Los Angeles, CA, USA; [2] Statistics, UCLA, Los Angeles, CA, USA; [3] Biomathematics, University of California, Los Angeles, CA, USA; [4] Biostatistics, UCLA, Los Angeles, CA, USA
Keywords: Network conformity; Propensity; MM algorithm; Model-based clustering; Network decomposition
Others  :  1143033
DOI  :  10.1186/1752-0509-7-21
Received 2012-06-20, accepted 2013-02-14, published 2013
【 Abstract 】

Background

The models in this article generalize current models for both correlation networks and multigraph networks. Correlation networks are widely applied in genomics research. In contrast to general networks, it is straightforward to test the statistical significance of an edge in a correlation network. It is also easy to decompose the underlying correlation matrix and generate informative network statistics such as the module eigenvector. However, correlation networks only capture the connections between numeric variables. An open question is whether one can find suitable decompositions of the similarity measures employed in constructing general networks. Multigraph networks are attractive because they support likelihood based inference. Unfortunately, it is unclear how to adjust current statistical methods to detect the clusters inherent in many data sets.

Results

Here we present an intuitive and parsimonious parametrization of a general similarity measure such as a network adjacency matrix. The cluster and propensity based approximation (CPBA) of a network generalizes not only correlation network methods but also multigraph methods. In particular, it gives rise to a novel and more realistic multigraph model that accounts for clustering and provides likelihood based tests for assessing the significance of an edge after controlling for clustering. We present a novel Majorization-Minimization (MM) algorithm for estimating the parameters of the CPBA. To illustrate the practical utility of the CPBA of a network, we apply it to gene expression data and to a bipartite network model for diseases and disease genes from the Online Mendelian Inheritance in Man (OMIM).

Conclusions

The CPBA of a network is theoretically appealing since a) it generalizes correlation and multigraph network methods, b) it improves likelihood based significance tests for edge counts, c) it directly models higher-order relationships between clusters, and d) it suggests novel clustering algorithms. The CPBA of a network is implemented in Fortran 95 and bundled in the freely available R package PropClust.

【 License 】

   
2013 Ranola et al.; licensee BioMed Central Ltd.
