BMC Systems Biology
Combining test statistics and models in bootstrapped model rejection: it is a balancing act
Gunnar Cedersund [1], Peter Strålfors [1], Rikard Johansson [1]
[1] Department of Clinical and Experimental Medicine (IKE), Linköping University, Linköping, Sweden
Keywords: Likelihood ratio; Model mimicry; Insulin signaling; 2D; Combining information; Bootstrapping; Model rejection
DOI: 10.1186/1752-0509-8-46
Received 2013-10-11, accepted 2014-04-01, published 2014
【 Abstract 】

Background

Model rejections lie at the heart of systems biology, since they provide conclusive statements: that the corresponding mechanistic assumptions do not serve as valid explanations for the experimental data. Rejections are usually made using, for example, the chi-square test (χ²) or the Durbin-Watson test (DW). Analytical formulas for the corresponding distributions rely on assumptions that are typically not fulfilled. This problem is partly alleviated by bootstrapping, a computationally heavy approach for calculating an empirical distribution. Bootstrapping also extends naturally to the estimation of joint distributions, but this feature has so far been little exploited.
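
As an illustration of the bootstrapping idea described above, the following is a minimal sketch of a parametric bootstrap for the χ² statistic. It is not the authors' implementation: the constant-mean model, the known noise level `sigma`, and the function names `fit_constant`, `chi2_stat` and `bootstrap_chi2_pvalue` are assumptions chosen only to keep the example self-contained.

```python
import random

def chi2_stat(y, y_hat, sigma):
    """Sum of squared residuals, normalised by the (assumed known) noise level."""
    return sum(((a - b) / sigma) ** 2 for a, b in zip(y, y_hat))

def fit_constant(y):
    """Hypothetical toy model: a single constant fitted by least squares."""
    c = sum(y) / len(y)
    return [c] * len(y)

def bootstrap_chi2_pvalue(y, sigma, n_boot=2000, seed=0):
    """Empirical p-value for the chi-square statistic via parametric bootstrap."""
    rng = random.Random(seed)
    y_hat = fit_constant(y)
    observed = chi2_stat(y, y_hat, sigma)
    boot = []
    for _ in range(n_boot):
        # Simulate a data set from the fitted model, then refit and re-score it.
        y_star = [m + rng.gauss(0.0, sigma) for m in y_hat]
        boot.append(chi2_stat(y_star, fit_constant(y_star), sigma))
    # Fraction of bootstrap statistics at least as extreme as the observed one.
    p_value = sum(s >= observed for s in boot) / n_boot
    return observed, p_value

flat_data = [1.2, 0.7, 1.1, 0.9, 1.4, 0.6, 1.0, 1.3, 0.8, 1.0]
trend_data = [float(i) for i in range(10)]
_, p_flat = bootstrap_chi2_pvalue(flat_data, sigma=0.5)
_, p_trend = bootstrap_chi2_pvalue(trend_data, sigma=0.5)
print(p_flat, p_trend)
```

In this sketch the data with a clear trend should give a very small empirical p-value (the constant model is rejected), while the flat data should not. The empirical distribution replaces the analytical χ² distribution, which is the point of bootstrapping when the analytical assumptions do not hold.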

Results

We herein show that simplistic combinations of bootstrapped tests, such as the max or min of the individual p-values, give inconsistent, i.e. overly conservative or liberal, results. A new two-dimensional (2D) approach based on parametric bootstrapping, on the other hand, is found to be both consistent and more powerful than the individual tests, when tested on static and dynamic examples where the truth is known. In the same examples, the best-performing test is a 2D χ² vs χ², where the second χ²-value comes from an additional help model and its ability to describe bootstraps from the tested model. This superiority is lost if the help model is too simple or too flexible. If a useful help model is found, the most powerful approach is the bootstrapped log-likelihood ratio (LHR). We show that this is because the LHR is one-dimensional, because the second dimension comes at a cost, and because the LHR retains most of the crucial information in the 2D distribution. These approaches statistically resolve a previously published rejection example for the first time.
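
Why the min or max of two p-values misbehaves can be seen in a small simulation. The sketch below is only an illustration under a simplifying assumption that the paper does not make: the two p-values are drawn as independent uniform numbers, i.e. both tests are valid and the tested model is true. In practice the two tests are computed on the same data and are generally dependent. The function name `rejection_rates` is hypothetical.

```python
import random

def rejection_rates(alpha=0.05, n_trials=100_000, seed=1):
    """False-rejection rates when two p-values are combined by min or max.

    Assumption: under a true model, each p-value is uniform on [0, 1]
    and the two p-values are independent.
    """
    rng = random.Random(seed)
    n_min = 0
    n_max = 0
    for _ in range(n_trials):
        p1, p2 = rng.random(), rng.random()
        n_min += min(p1, p2) < alpha  # reject if either test rejects
        n_max += max(p1, p2) < alpha  # reject only if both tests reject
    return n_min / n_trials, n_max / n_trials

rate_min, rate_max = rejection_rates()
print(rate_min, rate_max)
```

Under these assumptions the min rule rejects a true model with probability 1 - (1 - α)² ≈ 0.0975 at α = 0.05 (too liberal), while the max rule rejects with probability α² = 0.0025 (too conservative). Neither keeps the nominal 5% level, which is the kind of inconsistency a joint treatment of the two statistics is meant to avoid.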

Conclusions

We have shown how to, and how not to, combine tests in a bootstrap setting, when the combination is advantageous, and when it is advantageous to include a second model. These results also provide a deeper insight into the original motivation for formulating the LHR, for the more general setting of nonlinear and non-nested models. These insights are valuable in cases when accuracy and power, rather than computational speed, are prioritized.

【 License 】

   
2014 Johansson et al.; licensee BioMed Central Ltd.
