BMC Systems Biology
Validation and selection of ODE based systems biology models: how to arrive at more reliable decisions
Age K. Smilde [1], Huub C.J. Hoefsloot [1], Dicle Hasdemir [1]
[1] Netherlands Metabolomics Centre, Leiden, The Netherlands
Keywords: Hold-out validation; Cross validation; Model selection; Model validation; Differential equations; ODE; Kinetic models
Others: 1230660; DOI: 10.1186/s12918-015-0180-0
Received 2014-12-17, accepted 2015-06-16, published 2015
【 Abstract 】
Background
Most ordinary differential equation (ODE) based modeling studies in systems biology involve a hold-out validation step for model validation. In this framework, a pre-determined part of the data is set aside as validation data and is therefore not used for estimating the model parameters. The model is considered validated if its predictions on the validation dataset agree well with the data. Model selection between alternative model structures can be performed in the same setting, based on the predictive power of each structure on the validation dataset. However, the drawbacks of this approach are usually underestimated.
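As an illustration of the hold-out framework described above, the following is a minimal sketch, not the procedure used in the paper. It assumes a hypothetical one-parameter decay model dx/dt = -k*x, synthetic noisy data from three hypothetical experiments that differ only in their initial value, and a coarse grid search in place of a real optimizer. The names `simulate`, `fit_k`, and `make_experiment` are invented for this example.

```python
import math
import random


def simulate(k, x0, times, dt=0.01):
    """Integrate the toy ODE dx/dt = -k*x with fixed-step RK4.

    `times` must be sorted in ascending order; returns x at each time.
    """
    def rhs(x):
        return -k * x

    out, x, t = [], x0, 0.0
    for target in times:
        while t < target - 1e-12:
            h = min(dt, target - t)
            k1 = rhs(x)
            k2 = rhs(x + 0.5 * h * k1)
            k3 = rhs(x + 0.5 * h * k2)
            k4 = rhs(x + h * k3)
            x += h * (k1 + 2 * k2 + 2 * k3 + k4) / 6.0
            t += h
        out.append(x)
    return out


def sse(k, experiments):
    """Sum of squared residuals between model predictions and data."""
    total = 0.0
    for exp in experiments:
        pred = simulate(k, exp["x0"], exp["times"])
        total += sum((p - y) ** 2 for p, y in zip(pred, exp["values"]))
    return total


def fit_k(experiments, grid=None):
    """Estimate k by grid search (an assumption; any optimizer could be used)."""
    grid = grid or [i / 100 for i in range(1, 201)]
    return min(grid, key=lambda k: sse(k, experiments))


def make_experiment(x0, k_true, times, noise_sd, rng):
    """Generate synthetic noisy data for one hypothetical experiment."""
    clean = simulate(k_true, x0, times)
    values = [c + rng.gauss(0.0, noise_sd) for c in clean]
    return {"x0": x0, "times": times, "values": values}


rng = random.Random(0)
times = [0.5, 1.0, 2.0, 3.0, 5.0]
experiments = [make_experiment(x0, 0.5, times, 0.02, rng) for x0 in (1.0, 2.0, 4.0)]

# Hold-out: the last experiment is fixed in advance as validation data
# and plays no role in parameter estimation.
train, validation = experiments[:2], experiments[2:]
k_hat = fit_k(train)
validation_rmse = math.sqrt(sse(k_hat, validation) / len(times))
print(f"estimated k = {k_hat:.2f}, validation RMSE = {validation_rmse:.3f}")
```

The decision in this sketch rests entirely on the single pre-determined split `experiments[:2]` versus `experiments[2:]`; choosing a different experiment as validation data may change the outcome.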
Results
We carried out simulations using a recently published model of the High Osmolarity Glycerol (HOG) pathway in S. cerevisiae to demonstrate these drawbacks. The results show that how the data are partitioned, and which part is used for validation, matters greatly. The hold-out validation strategy leads to biased conclusions, since different partitioning schemes can lead to different validation and selection decisions. Furthermore, finding sensible partitioning schemes that would lead to reliable decisions depends heavily on the biology and on unknown model parameters, which makes the problem paradoxical. This creates a need for alternative validation approaches that offer flexible partitioning of the data. For this purpose, we introduce a stratified random cross-validation (SRCV) approach that successfully overcomes these limitations.
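The abstract does not spell out how SRCV partitions the data, so the following is only a generic sketch of stratified random fold assignment, under the assumption that each data item carries a stratum label (for example an experimental condition) and that every fold should draw from every stratum. The function names and the dose labels are hypothetical.

```python
import random
from collections import defaultdict


def stratified_random_folds(labels, n_folds, rng):
    """Randomly assign item indices to folds, spreading each stratum across folds."""
    by_stratum = defaultdict(list)
    for index, label in enumerate(labels):
        by_stratum[label].append(index)

    folds = [[] for _ in range(n_folds)]
    for label in sorted(by_stratum):
        members = by_stratum[label]
        rng.shuffle(members)              # random order within the stratum
        offset = rng.randrange(n_folds)   # random starting fold
        for position, index in enumerate(members):
            folds[(position + offset) % n_folds].append(index)
    return folds


def cross_validation_splits(labels, n_folds, n_repeats, seed=0):
    """Yield (train_indices, held_out_indices) over repeated random partitions."""
    rng = random.Random(seed)
    for _ in range(n_repeats):
        for held_out in stratified_random_folds(labels, n_folds, rng):
            held = set(held_out)
            train = [i for i in range(len(labels)) if i not in held]
            yield train, sorted(held)


# Hypothetical example: 12 data items from three dose levels.
labels = ["low_dose"] * 4 + ["mid_dose"] * 4 + ["high_dose"] * 4
splits = list(cross_validation_splits(labels, n_folds=4, n_repeats=3))
for train, held in splits[:4]:
    print("held out:", held, "->", [labels[i] for i in held])
```

In such a scheme, a model would be fitted on each `train` set and scored on the corresponding held-out items, and the scores would be aggregated over all folds and repeats, so that no single partition determines the validation or selection decision.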
Conclusions
SRCV leads to more stable decisions for both validation and selection, and these decisions are not biased by the underlying biological phenomena. Furthermore, it is less dependent on the specific noise realization in the data. It is therefore a promising alternative to the standard hold-out validation strategy.
【 License 】
2015 Hasdemir et al.