BMC Systems Biology
Joint estimation of causal effects from observational and intervention gene expression data
Grégory Nuel [2], Florence Jaffrézic [1], Andrea Rau [1]
[1] AgroParisTech, UMR1313 Génétique animale et biologie intégrative, 75231 Paris 05, France; Sorbonne Paris Cité, Paris, France
Keywords: Maximum likelihood; Metropolis-Hastings; Intervention calculus; Gaussian Bayesian network; Causal inference
Others: 1141968 DOI: 10.1186/1752-0509-7-111
Received 2013-07-30, accepted 2013-10-07, published 2013
【 Abstract 】
Background
In recent years, there has been great interest in using transcriptomic data to infer gene regulatory networks. To date, methodological development in this area has relied primarily on graphical Gaussian models for observational wild-type data, which yield undirected graphs that cannot accurately identify causal relationships among genes. In the present work, we seek to improve the estimation of causal effects among genes by jointly modeling observational transcriptomic data with arbitrarily complex intervention data obtained by performing partial, single, or multiple gene knock-outs or knock-downs.
Results
Using the framework of causal Gaussian Bayesian networks, we propose a Markov chain Monte Carlo algorithm with a Mallows proposal model and analytical likelihood maximization to sample from the posterior distribution of causal node orderings, and in turn, to estimate causal effects. The main advantage of the proposed algorithm over previously proposed methods is its flexibility to accommodate any kind of intervention design, including partial or multiple knock-out experiments. Using simulated data as well as data from the Dialogue for Reverse Engineering Assessments and Methods (DREAM) 2007 challenge, the proposed method was compared to two alternative approaches: one requiring a complete, single knock-out design, and one able to model only observational data.
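The sampling scheme summarized above can be illustrated with a minimal Python sketch of a Metropolis-Hastings chain over node orderings that uses a Mallows proposal drawn by repeated insertion. This is not the authors' implementation: the function names, the dispersion parameter `phi`, and in particular `toy_log_score` (a hypothetical stand-in for the analytically maximized log-likelihood of an ordering) are assumptions made for illustration. Because the Kendall tau distance is symmetric and the Mallows normalizing constant does not depend on the center, the proposal is symmetric and the acceptance ratio reduces to a ratio of scores.

```python
import math
import random


def kendall_tau(a, b):
    """Count the item pairs that orderings a and b rank in opposite order."""
    pos = {item: i for i, item in enumerate(b)}
    r = [pos[x] for x in a]
    return sum(1 for i in range(len(r)) for j in range(i + 1, len(r)) if r[i] > r[j])


def sample_mallows(center, phi, rng):
    """Draw an ordering from a Mallows model centred on `center` (0 <= phi <= 1).

    Repeated insertion: the i-th item of the centre is inserted at index j
    with weight phi ** (i - j), i.e. one factor phi per discordant pair created.
    """
    out = []
    for i, item in enumerate(center):
        weights = [phi ** (i - j) for j in range(i + 1)]
        j = rng.choices(range(i + 1), weights=weights)[0]
        out.insert(j, item)
    return out


def toy_log_score(order, truth, strength=2.0):
    """Hypothetical score: penalize distance to a known 'true' ordering.

    In the paper this role is played by a likelihood maximized analytically
    for each ordering; that computation is not reproduced here.
    """
    return -strength * kendall_tau(order, truth)


def mh_orderings(log_score, start, phi, n_iter, rng):
    """Metropolis-Hastings over orderings with a symmetric Mallows proposal."""
    current = list(start)
    cur = log_score(current)
    samples = []
    for _ in range(n_iter):
        proposal = sample_mallows(current, phi, rng)
        prop = log_score(proposal)
        # Symmetric proposal, so accept with probability min(1, exp(prop - cur)).
        if prop >= cur or rng.random() < math.exp(prop - cur):
            current, cur = proposal, prop
        samples.append(tuple(current))
    return samples


if __name__ == "__main__":
    rng = random.Random(0)
    truth = [0, 1, 2, 3, 4]
    chain = mh_orderings(lambda o: toy_log_score(o, truth), truth[::-1], 0.5, 2000, rng)
    tail = chain[1000:]
    print(sum(kendall_tau(s, truth) for s in tail) / len(tail))
```

Smaller values of `phi` keep proposals close to the current ordering, while `phi = 1` proposes uniformly over all orderings, so `phi` plays the role of a step-size parameter in this sketch.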
Conclusions
The proposed algorithm was found to perform as well as, and in most cases better than, the alternative methods in terms of accuracy for the estimation of causal effects. In addition, multiple knock-outs proved to contribute valuable additional information compared to single knock-outs. Finally, the simulation study confirmed that it is not possible to estimate the causal ordering of genes from observational data alone. In all cases, we found that the inclusion of intervention experiments enabled more accurate estimation of causal regulatory relationships than the use of wild-type data alone.
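The point about observational data can be made concrete with a two-gene toy example, sketched below in Python under assumed simulated data (the variable names and the simulated effect size are illustrative and are not taken from the paper). For a linear Gaussian model, the maximized log-likelihood of the ordering X before Y equals that of Y before X on purely observational samples, so the likelihood alone cannot distinguish the two orderings.

```python
import math
import random


def gaussian_max_loglik(values):
    """Maximized log-likelihood of an i.i.d. Gaussian fitted to `values`."""
    n = len(values)
    mean = sum(values) / n
    var = sum((v - mean) ** 2 for v in values) / n  # ML variance estimate
    return -0.5 * n * (math.log(2 * math.pi * var) + 1)


def regression_max_loglik(parent, child):
    """Maximized log-likelihood of child given parent under a linear Gaussian model."""
    n = len(parent)
    mp = sum(parent) / n
    mc = sum(child) / n
    sxx = sum((p - mp) ** 2 for p in parent)
    sxy = sum((p - mp) * (c - mc) for p, c in zip(parent, child))
    slope = sxy / sxx
    resid_var = sum(((c - mc) - slope * (p - mp)) ** 2
                    for p, c in zip(parent, child)) / n
    return -0.5 * n * (math.log(2 * math.pi * resid_var) + 1)


def ordering_loglik(first, second):
    """Log-likelihood of the ordering first -> second (root node, then its child)."""
    return gaussian_max_loglik(first) + regression_max_loglik(first, second)


# Simulated observational data in which X truly regulates Y (assumed values).
rng = random.Random(0)
x = [rng.gauss(0, 1) for _ in range(200)]
y = [0.8 * xi + rng.gauss(0, 0.5) for xi in x]

ll_xy = ordering_loglik(x, y)  # ordering X -> Y
ll_yx = ordering_loglik(y, x)  # ordering Y -> X
print(ll_xy, ll_yx)
```

Both values agree up to floating-point error because each equals a function of the determinant of the sample covariance matrix, which does not depend on the ordering. Intervention samples break this symmetry, since knocking out a gene changes the distribution of its descendants but not of its ancestors.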
【 License 】
2013 Rau et al.; licensee BioMed Central Ltd.