BMC Medical Research Methodology
Tuning multiple imputation by predictive mean matching and local residual draws
Patrick Royston [1], Ian R White [2], Tim P Morris [1]
[1] Hub for Trials Methodology Research, MRC Clinical Trials Unit at UCL, Aviation House, 125 Kingsway, WC2B 6NH, London, UK
[2] MRC Biostatistics Unit, Cambridge Institute of Public Health, Forvie Site, Robinson Way, Cambridge Biomedical Campus, CB2 0SR, Cambridge, UK
Keywords: Missing data; Local residual draws; Predictive mean matching; Imputation model; Multiple imputation
DOI  :  10.1186/1471-2288-14-75
Received 2014-03-04, accepted 2014-05-09, published 2014
【 Abstract 】

Background

Multiple imputation is a commonly used method for handling incomplete covariates as it can provide valid inference when data are missing at random. This depends on being able to correctly specify the parametric model used to impute missing values, which may be difficult in many realistic settings. Imputation by predictive mean matching (PMM) borrows an observed value from a donor with a similar predictive mean; imputation by local residual draws (LRD) instead borrows the donor’s residual. Both methods relax some assumptions of parametric imputation, promising greater robustness when the imputation model is misspecified.
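The difference between the two donor-based methods can be illustrated with a minimal sketch. It is not taken from the paper: the function names `fit_line` and `impute_one`, the single-predictor linear model, and the default pool size `k=10` are assumptions made for illustration. The sketch also matches on least-squares predictions only and omits the draw of regression parameters that a proper multiple imputation procedure would include.

```python
import random


def fit_line(x, y):
    """Ordinary least squares for y = a + b*x; returns (a, b)."""
    n = len(x)
    mx = sum(x) / n
    my = sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    b = sxy / sxx
    return my - b * mx, b


def impute_one(x_miss, x_obs, y_obs, k=10, method="pmm", rng=random):
    """Impute one missing y from a pool of the k closest donors.

    Hypothetical helper: simplified matching on fitted values,
    with no posterior draw of the regression parameters.
    """
    a, b = fit_line(x_obs, y_obs)
    pred_obs = [a + b * xi for xi in x_obs]
    pred_miss = a + b * x_miss

    # Donor pool: the k observed cases whose predicted means are
    # closest to the predicted mean of the incomplete case.
    order = sorted(range(len(x_obs)),
                   key=lambda i: abs(pred_obs[i] - pred_miss))
    donor = rng.choice(order[:k])

    if method == "pmm":
        # PMM borrows the donor's observed value.
        return y_obs[donor]
    if method == "lrd":
        # LRD borrows the donor's residual and adds it to the
        # recipient's own predicted mean.
        return pred_miss + (y_obs[donor] - pred_obs[donor])
    raise ValueError("method must be 'pmm' or 'lrd'")
```

With `method="pmm"` the imputed value is always one of the observed values, whereas `method="lrd"` can produce values that were never observed. Setting `k=1` makes the donor choice deterministic, which is the single-donor setting the abstract cautions against.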

Methods

We review the development of PMM and LRD, outline the various forms available, and aim to clarify how and when they should be used. We compare their performance with fully parametric imputation in simulation studies, first when the imputation model is correctly specified and then when it is misspecified.

Results

In using PMM or LRD we strongly caution against using a single donor, the default value in some implementations, and instead advocate sampling from a pool of around 10 donors. We also clarify which matching metric is best. Among the current MI software there are several poor implementations.

Conclusions

PMM and LRD may have a role for imputing covariates (i) which are not strongly associated with outcome, and (ii) when the imputation model is thought to be slightly but not grossly misspecified. Researchers should spend effort on specifying the imputation model correctly, rather than expecting predictive mean matching or local residual draws to do the work.

【 License 】

   
2014 Morris et al.; licensee BioMed Central Ltd.
