BMC Medical Research Methodology
Added predictive value of omics data: specific issues related to validation illustrated by two case studies
Anne-Laure Boulesteix [2], Tobias Herold [1], Riccardo De Bin [2]
[1] Clinical Cooperative Group Leukemia, Helmholtz Center Munich for Environmental Health, Marchioninistr. 15, 81377 München, Germany
[2] Department of Medical Informatics, Biometry and Epidemiology, Ludwig-Maximilians-Universität, Marchioninistr. 15, 81377 München, Germany
Keywords: Validation; Time-to-event data; Prediction model; Omics score; Added predictive value
DOI: 10.1186/1471-2288-14-117
Received 2014-06-06, accepted 2014-09-18, published 2014
【 Abstract 】
Background
In recent years, the importance of independently validating the prediction ability of a new gene signature has been widely recognized. More recently, with the development of gene signatures that complement rather than replace the clinical predictors in the prediction rule, the focus has shifted to validating the added predictive value of a gene signature, i.e. to verifying that including the new gene signature in a prediction model improves its prediction ability.
Methods
The high-dimensional nature of the data from which a new signature is derived raises challenging issues and necessitates the modification of classical methods to adapt them to this framework. Here we show how to validate the added predictive value of a signature derived from high-dimensional data and critically discuss the impact of the choice of methods on the results.
Results
The analysis of the added predictive value of two gene signatures developed in two recent studies on the survival of leukemia patients allows us to illustrate and empirically compare different validation techniques in the high-dimensional framework.
Conclusions
The issues related to the high-dimensional nature of the omics predictor space affect the validation process. An analysis procedure based on repeated cross-validation is suggested.
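As an illustration only, the idea of assessing added predictive value by repeated cross-validation might be sketched as follows. This is not the procedure from the paper: the toy signature (`omics_score_weights`), the least-squares fit on log survival time (which ignores censoring and stands in for a Cox model), the use of Harrell's C-index as the performance measure, and the simulated data are all assumptions made to keep the example short and self-contained. The point it tries to show is that the omics score is re-derived inside each training fold, so that the test fold plays no part in building the signature.

```python
import numpy as np

def c_index(time, event, risk):
    """Harrell's C: share of usable pairs in which the higher-risk subject fails first."""
    conc, usable = 0.0, 0
    for i in range(len(time)):
        if not event[i]:
            continue  # a pair is usable only if the earlier time is an observed event
        for j in range(len(time)):
            if time[i] < time[j]:
                usable += 1
                if risk[i] > risk[j]:
                    conc += 1.0
                elif risk[i] == risk[j]:
                    conc += 0.5
    return conc / usable

def fit_linear(X, y):
    """Least-squares fit with intercept (assumed stand-in for a survival model)."""
    A = np.column_stack([np.ones(len(X)), X])
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    return coef

def predict_linear(coef, X):
    return coef[0] + X @ coef[1:]

def omics_score_weights(G, y, k=10):
    """Hypothetical signature: the k genes most correlated with y, weighted by sign."""
    Gc, yc = G - G.mean(axis=0), y - y.mean()
    denom = np.sqrt((Gc ** 2).sum(axis=0) * (yc ** 2).sum()) + 1e-12
    corr = (Gc * yc[:, None]).sum(axis=0) / denom
    top = np.argsort(-np.abs(corr))[:k]
    w = np.zeros(G.shape[1])
    w[top] = np.sign(corr[top])
    return w

def repeated_cv_delta_c(clin, genes, time, event, n_repeats=10, n_folds=5, seed=0):
    """Per-fold C-index gain of (clinical + omics score) over clinical only."""
    rng = np.random.default_rng(seed)
    n, log_t, deltas = len(time), np.log(time), []
    for _ in range(n_repeats):
        for test in np.array_split(rng.permutation(n), n_folds):
            train = np.setdiff1d(np.arange(n), test)
            # The signature is built on the training fold only.
            w = omics_score_weights(genes[train], log_t[train])
            X_tr = np.column_stack([clin[train], genes[train] @ w])
            X_te = np.column_stack([clin[test], genes[test] @ w])
            m_clin = fit_linear(clin[train], log_t[train])
            m_comb = fit_linear(X_tr, log_t[train])
            # Risk is the negative predicted log time: shorter survival, higher risk.
            c_clin = c_index(time[test], event[test], -predict_linear(m_clin, clin[test]))
            c_comb = c_index(time[test], event[test], -predict_linear(m_comb, X_te))
            deltas.append(c_comb - c_clin)
    return np.array(deltas)

# Simulated toy data without censoring: 2 clinical variables, 200 genes, 5 informative.
rng = np.random.default_rng(1)
n, p = 120, 200
clin = rng.normal(size=(n, 2))
genes = rng.normal(size=(n, p))
log_t = clin[:, 0] + genes[:, :5].sum(axis=1) + rng.normal(scale=0.5, size=n)
time, event = np.exp(log_t), np.ones(n, dtype=bool)
deltas = repeated_cv_delta_c(clin, genes, time, event)
print(f"mean gain in C-index: {deltas.mean():.3f} over {len(deltas)} test folds")
```

Repeating the random split several times and looking at the whole distribution of per-fold gains, rather than a single split, is meant to reduce the dependence of the conclusion on one particular partition of the data.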
【 License 】
2014 De Bin et al.; licensee BioMed Central Ltd.