| BMC Bioinformatics | |
| A strategy to build and validate a prognostic biomarker model based on RT-qPCR gene expression and clinical covariates | |
| Maud Tournoud [1], Audrey Larue [1], Marie-Angelique Cazalis [2], Fabienne Venet [3], Alexandre Pachot [2], Guillaume Monneret [3], Alain Lepape [3], Jean-Baptiste Veyrieras [1] | |
| [1] Bioinformatics Research Department, bioMérieux, Marcy L’Etoile, France | |
| [2] Medical Diagnostic Discovery Department, bioMérieux, Marcy L’Etoile, France | |
| [3] Laboratoire Commun de Recherche, Hospices Civils de Lyon, Lyon, France | |
| Keywords: Model optimism; Performance estimation; Cross-validation; RT-qPCR gene expression measurement; Prognostic survival model | |
| DOI: 10.1186/s12859-015-0537-9 | |
| Received 2014-10-02, accepted 2015-03-13, published 2015 | |
【 Abstract 】
Background
Constructing and validating prognostic models for survival data in the clinical domain remains an active field of research. Nevertheless, there is no consensus on how to develop routine prognostic tests based on a combination of RT-qPCR biomarkers and clinical or demographic variables. In particular, estimating model performance requires properly accounting for the RT-qPCR experimental design.
Results
We present a strategy to build, select, and validate a prognostic model for survival data based on a combination of RT-qPCR biomarkers and clinical or demographic data, and we illustrate it on a real clinical dataset. First, we compare two cross-validation schemes: a classical outcome-stratified cross-validation scheme and an alternative scheme that accounts for the RT-qPCR plate design, which matters especially when samples are processed in batches. The latter is intended to limit the performance discrepancy between the training and the test sets, also called the validation surprise. Second, we present strategies for model building (covariate selection, functional relationship modeling, and choice of statistical model) and for estimating performance indicators. Since in practice several prognostic models can exhibit similar performance, we discuss complementary criteria for model selection: the stability of the selected variables, the model optimism, and the impact of the omitted variables on model performance.
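The difference between the two cross-validation schemes can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the authors' implementation: the abstract does not spell out how the plate design enters the scheme, so the sketch assumes that "accounting for the plate design" means keeping all samples from one RT-qPCR plate in the same fold. The function names `stratified_folds` and `plate_folds`, the greedy balancing rule, and the toy data are all hypothetical.

```python
import random
from collections import defaultdict


def stratified_folds(events, k, seed=0):
    """Outcome-stratified folds: events and non-events are each
    shuffled and dealt round-robin over the k folds."""
    rng = random.Random(seed)
    folds = [None] * len(events)
    for status in (0, 1):
        idx = [i for i, e in enumerate(events) if e == status]
        rng.shuffle(idx)
        for pos, i in enumerate(idx):
            folds[i] = pos % k
    return folds


def plate_folds(plates, events, k):
    """Plate-aware folds (assumed reading of the scheme): every sample
    from a given plate lands in the same fold. Plates are assigned
    greedily, most events first, to the fold that currently holds the
    fewest events (ties broken by fold size, then fold index)."""
    by_plate = defaultdict(list)
    for i, p in enumerate(plates):
        by_plate[p].append(i)

    def n_events(p):
        return sum(events[i] for i in by_plate[p])

    order = sorted(by_plate, key=lambda p: (-n_events(p), str(p)))
    fold_events = [0] * k
    fold_sizes = [0] * k
    folds = [None] * len(plates)
    for p in order:
        target = min(range(k), key=lambda f: (fold_events[f], fold_sizes[f], f))
        for i in by_plate[p]:
            folds[i] = target
        fold_events[target] += n_events(p)
        fold_sizes[target] += len(by_plate[p])
    return folds


if __name__ == "__main__":
    # Toy data (hypothetical): 6 plates of 4 samples, 1 = event observed.
    plates = [p for p in "ABCDEF" for _ in range(4)]
    events = [1, 0, 0, 1, 0, 0, 0, 0, 1, 1, 0, 0,
              0, 1, 0, 0, 0, 0, 0, 1, 1, 0, 1, 0]
    print("stratified:", stratified_folds(events, 3))
    print("plate-aware:", plate_folds(plates, events, 3))
```

With the stratified scheme, samples from one plate are typically spread over several folds, so plate-level technical effects are shared between the fitting and the held-out data. Under the assumed plate-aware scheme, a held-out fold only contains plates the model has never seen, which is closer to the situation faced on an independent test set.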
Conclusion
On the training dataset, appropriate resampling methods are expected to prevent upward biases caused by unaccounted technical and biological variability arising from the experimental and intrinsic design of the RT-qPCR assay. Moreover, the stability of the selected variables, the model optimism, and the impact of the omitted variables on model performance are pivotal indicators for selecting the optimal model to be validated on the test dataset.
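As a rough illustration of what model optimism measures, the sketch below computes it as the gap between the apparent performance (evaluated on the data used for fitting) and the resampled performance (evaluated on held-out folds). The abstract does not name the performance indicator, so Harrell's concordance index is used here purely as a stand-in, and the function names `concordance_index` and `optimism` are hypothetical.

```python
def concordance_index(times, events, risks):
    """Harrell's C-index for right-censored data.

    A pair (i, j) is usable when subject i had an observed event
    strictly before subject j's follow-up time. The pair is concordant
    when the subject failing first has the higher predicted risk;
    tied risks count as half."""
    concordant = 0.0
    usable = 0
    n = len(times)
    for i in range(n):
        if events[i] != 1:
            continue  # a censored subject cannot be the earlier failure
        for j in range(n):
            if times[i] < times[j]:
                usable += 1
                if risks[i] > risks[j]:
                    concordant += 1.0
                elif risks[i] == risks[j]:
                    concordant += 0.5
    if usable == 0:
        raise ValueError("no usable pairs: C-index is undefined")
    return concordant / usable


def optimism(apparent, resampled):
    """Optimism: how much the apparent performance overstates the
    performance estimated on held-out data."""
    return apparent - resampled


if __name__ == "__main__":
    # Toy data (hypothetical): follow-up times, event flags, risk scores.
    times = [1, 2, 3, 4]
    events = [1, 1, 0, 1]
    risks = [4.0, 3.0, 2.0, 1.0]
    print(concordance_index(times, events, risks))
```

A model whose apparent index is much higher than its cross-validated index has large optimism, which is one reason to prefer a competitor with similar cross-validated performance but a smaller gap.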
【 License 】
2015 Tournoud et al.; licensee BioMed Central.