BMC Systems Biology
Identification of direction in gene networks from expression and methylation
Donald Geman [2], Martin J Aryee [3], Laurent Younes [1], David M Simcha [4]
[1] Department of Applied Mathematics and Statistics, Johns Hopkins University, Baltimore, MD 21218, USA
[2] Department of Applied Mathematics and Statistics and Institute for Computational Medicine, Johns Hopkins University, Baltimore, MD 21218, USA
[3] Department of Pathology, Massachusetts General Hospital, Charlestown, MA 02129, USA
[4] Department of Biomedical Engineering, Johns Hopkins University, Baltimore, MD 21218, USA
Keywords: Bayesian networks; Microarrays; Methylation; Gene regulation
Others: 1141821; DOI: 10.1186/1752-0509-7-118
Received 2012-09-17, accepted 2013-10-17, published 2013
【 Abstract 】
Background
Reverse-engineering gene regulatory networks from expression data is difficult, especially without temporal measurements or interventional experiments. In particular, the causal direction of an edge is generally not statistically identifiable, i.e., it cannot be inferred as a statistical parameter, even from an unlimited amount of non-time-series observational mRNA expression data. Some additional evidence is required, and high-throughput methylation data can be viewed as a natural multifactorial gene perturbation experiment.
Results
We introduce IDEM (Identifying Direction from Expression and Methylation), a method for identifying the causal direction of edges by combining DNA methylation and mRNA transcription data. We describe the circumstances under which edge directions become identifiable. Experiments with both real and synthetic data demonstrate that IDEM infers both edge placement and edge direction in gene regulatory networks significantly more accurately than other methods.
Conclusion
Reverse-engineering directed gene regulatory networks from static observational data becomes feasible by exploiting the context provided by high-throughput DNA methylation data.
An implementation of the algorithm described is available at http://code.google.com/p/idem/.
【 License 】
2013 Simcha et al.; licensee BioMed Central Ltd.