期刊论文详细信息
Diagnostic and Prognostic Research
The development and validation of prognostic models for overall survival in the presence of missing data in the training dataset: a strategy with a detailed example
David A. Cairns1  Kara-Louise Royle1 
[1] Clinical Trials Research Unit, Leeds Institute of Clinical Trials Research, University of Leeds, Leeds, UK;
关键词: Prognostic Model;    Multiple imputation;    Missing data;    Overall survival;   
DOI  :  10.1186/s41512-021-00103-9
来源: Springer
PDF
【 摘 要 】

BackgroundThe United Kingdom Myeloma Research Alliance (UK-MRA) Myeloma Risk Profile is a prognostic model for overall survival. It was trained and tested on clinical trial data, aiming to improve the stratification of transplant ineligible (TNE) patients with newly diagnosed multiple myeloma. Missing data is a common problem which affects the development and validation of prognostic models, where decisions on how to address missingness have implications on the choice of methodology.MethodsModel buildingThe training and test datasets were the TNE pathways from two large randomised multicentre, phase III clinical trials. Potential prognostic factors were identified by expert opinion. Missing data in the training dataset was imputed using multiple imputation by chained equations. Univariate analysis fitted Cox proportional hazards models in each imputed dataset with the estimates combined by Rubin’s rules. Multivariable analysis applied penalised Cox regression models, with a fixed penalty term across the imputed datasets. The estimates from each imputed dataset and bootstrap standard errors were combined by Rubin’s rules to define the prognostic model.Model assessmentCalibration was assessed by visualising the observed and predicted probabilities across the imputed datasets. Discrimination was assessed by combining the prognostic separation D-statistic from each imputed dataset by Rubin’s rules.Model validationThe D-statistic was applied in a bootstrap internal validation process in the training dataset and an external validation process in the test dataset, where acceptable performance was pre-specified.Development of risk groupsRisk groups were defined using the tertiles of the combined prognostic index, obtained by combining the prognostic index from each imputed dataset by Rubin’s rules.ResultsThe training dataset included 1852 patients, 1268 (68.47%) with complete case data. Ten imputed datasets were generated. Five hundred twenty patients were included in the test dataset. The D-statistic for the prognostic model was 0.840 (95% CI 0.716–0.964) in the training dataset and 0.654 (95% CI 0.497–0.811) in the test dataset and the corrected D-Statistic was 0.801.ConclusionThe decision to impute missing covariate data in the training dataset influenced the methods implemented to train and test the model. To extend current literature and aid future researchers, we have presented a detailed example of one approach. Whilst our example is not without limitations, a benefit is that all of the patient information available in the training dataset was utilised to develop the model.Trial registrationBoth trials were registered; Myeloma IX-ISRCTN68454111, registered 21 September 2000. Myeloma XI-ISRCTN49407852, registered 24 June 2009.

【 授权许可】

CC BY   

【 预 览 】
附件列表
Files Size Format View
RO202109178862096ZK.pdf 1302KB PDF download
  文献评价指标  
  下载次数:3次 浏览次数:4次