Journal article details
BMC Medical Research Methodology
Assessment of predictive performance in incomplete data by combining internal validation and multiple imputation
Mark A. van de Wiel [1], Anne-Laure Boulesteix [2], Barbara Thorand [3], Astrid Zierer [3], Simone Wahl [4]
[1] Department of Epidemiology and Biostatistics, VU University Medical Center
[2] Department of Medical Informatics, Biometry and Epidemiology, Ludwig-Maximilians-Universität München
[3] Institute of Epidemiology II, Helmholtz Zentrum München - German Research Center for Environmental Health
[4] Research Unit of Molecular Epidemiology, Helmholtz Zentrum München - German Research Center for Environmental Health
Keywords: Missing values; Incomplete data; Prediction model; Predictive performance; Bootstrap; Internal validation
DOI: 10.1186/s12874-016-0239-7
Source: DOAJ
[Abstract]

Background: Missing values are a frequent issue in human studies. In many situations, multiple imputation (MI) is an appropriate strategy for handling missing data: missing values are imputed multiple times, the analysis is performed in every imputed data set, and the resulting estimates are pooled. If the aim is to estimate (added) predictive performance measures, such as (change in) the area under the receiver-operating characteristic curve (AUC), internal validation strategies become desirable in order to correct for optimism. It is not fully understood how internal validation should be combined with multiple imputation.

Methods: In a comprehensive simulation study and in a real data set based on blood markers as predictors for mortality, we compare three combination strategies: Val-MI (internal validation followed by MI on the training and test parts separately), MI-Val (MI on the full data set followed by internal validation), and MI(-y)-Val (MI on the full data set omitting the outcome, followed by internal validation). Different validation strategies, including bootstrap and cross-validation, different (added) performance measures, and various data characteristics are considered, and the strategies are evaluated with regard to bias and mean squared error of the obtained performance estimates. In addition, we elaborate on the number of resamples and imputations to be used, and adapt a strategy for confidence interval construction to incomplete data.

Results: Internal validation is essential in order to avoid optimism, with the bootstrap 0.632+ estimate representing a reliable method to correct for optimism. While estimates obtained by MI-Val are optimistically biased, those obtained by MI(-y)-Val tend to be pessimistic in the presence of a true underlying effect. Val-MI provides largely unbiased estimates, with a slight pessimistic bias that grows with increasing true effect size, increasing number of covariates, and decreasing sample size. In Val-MI, the accuracy of the estimate is improved more by increasing the number of bootstrap draws than by increasing the number of imputations. With a simple integrated approach, valid confidence intervals for performance estimates can be obtained.

Conclusions: When prognostic models are developed on incomplete data, Val-MI represents a valid strategy to obtain estimates of predictive performance measures.
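The ordering that defines Val-MI (resample first, then impute the training and test parts separately) can be illustrated with a minimal Python sketch. Everything in it is an assumption made for illustration and is not taken from the article: the function names (`auc`, `impute_once`, `fit_direction`, `val_mi_oob_auc`) are hypothetical, the imputation is a toy random hot-deck draw within outcome class standing in for a proper MI procedure, the "model" is a single predictor whose direction is learned on the training part, and the estimate is a plain out-of-bag bootstrap AUC rather than the 0.632+ estimate discussed in the abstract.

```python
import random
from statistics import mean


def auc(scores, labels):
    """AUC as the share of case/control pairs ranked correctly (ties count 1/2)."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        return None  # AUC is undefined without both classes
    wins = sum((p > q) + 0.5 * (p == q) for p in pos for q in neg)
    return wins / (len(pos) * len(neg))


def impute_once(rows, rng):
    """Toy stochastic imputation (assumption, not the article's method):
    replace a missing x by a random observed x from the same outcome class."""
    observed = [x for x, _ in rows if x is not None]
    by_class = {c: [x for x, y in rows if x is not None and y == c] for c in (0, 1)}
    filled = []
    for x, y in rows:
        if x is None:
            donors = by_class[y] or observed or [0.0]
            x = rng.choice(donors)
        filled.append((x, y))
    return filled


def fit_direction(train):
    """Toy one-predictor 'model': learn whether larger x points to y == 1."""
    cases = [x for x, y in train if y == 1]
    controls = [x for x, y in train if y == 0]
    if not cases or not controls:
        return 1.0
    return 1.0 if mean(cases) >= mean(controls) else -1.0


def val_mi_oob_auc(rows, n_boot=50, n_imp=5, seed=0):
    """Val-MI ordering: draw the bootstrap split first, then impute the
    training part and the out-of-bag test part separately, n_imp times each."""
    rng = random.Random(seed)
    n = len(rows)
    estimates = []
    for _ in range(n_boot):
        idx = [rng.randrange(n) for _ in range(n)]
        in_bag = set(idx)
        train_raw = [rows[i] for i in idx]
        test_raw = [rows[i] for i in range(n) if i not in in_bag]
        if not test_raw:
            continue
        for _ in range(n_imp):
            train = impute_once(train_raw, rng)  # imputed using training rows only
            test = impute_once(test_raw, rng)    # imputed using test rows only
            sign = fit_direction(train)
            a = auc([sign * x for x, _ in test], [y for _, y in test])
            if a is not None:
                estimates.append(a)
    # Pool by averaging over all bootstrap draws and imputations
    return mean(estimates) if estimates else None
```

Here `rows` is a list of `(x, y)` pairs in which `x` may be `None` and `y` is 0 or 1. An MI-Val variant would instead call `impute_once` on the full data before the bootstrap loop, which lets information from the test part leak into the imputed training values.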

[License]

Unknown   
