BMC Medical Research Methodology
Auxiliary variables in multiple imputation in regression with missing X: a warning against including too many in small sample research
Rainer Leonhart[2], Max Herke[1], Jochen Hardt[1]
[1] Medical Psychology and Medical Sociology, Clinic for Psychosomatic Medicine and Psychotherapy, University of Mainz, Duesbergweg 6, Mainz 55128, Germany
[2] Social Psychology and Methods, University of Freiburg, Engelberger Straße 41, Freiburg, 79106, Germany
Keywords: Small and medium size samples; Simulation study; Auxiliary variables; Multiple imputation
DOI: 10.1186/1471-2288-12-184
Received 2012-07-05; accepted 2012-11-28; published 2012.
【 Abstract 】

Background

Multiple imputation is becoming increasingly popular. Theoretical considerations as well as simulation studies have shown that the inclusion of auxiliary variables is generally of benefit.

Methods

A simulation study of a linear regression with a response Y and two predictors X1 and X2 was performed on data sets with n = 50, 100 and 200, analysed either as complete cases or by multiple imputation with 0, 10, 20, 40 or 80 auxiliary variables. Missingness was either 100% missing completely at random (MCAR) or 50% missing at random (MAR) plus 50% MCAR. Auxiliary variables had either low (r = .10) or moderate (r = .50) correlations with the Xs and Y.
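The data-generating side of such a design can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' simulation code: every variable is standard normal and is built from one shared latent factor, so any two variables (auxiliary or not) correlate at about r, which means X1, X2 and Y are also inter-correlated at r here; only X1 receives missing values; and the MAR half of the missing values is assigned to the rows with the largest observed Y. The function names, the 30% missing rate and the seeds are hypothetical.

```python
import math
import random


def simulate_dataset(n, n_aux, r, seed=None):
    """Hypothetical generator: n rows of (Y, X1, X2, A1..A_n_aux).

    Each variable is sqrt(r) * F + sqrt(1 - r) * noise with a shared
    latent factor F, so every pair of columns has population correlation r.
    """
    rng = random.Random(seed)
    load, resid = math.sqrt(r), math.sqrt(1.0 - r)
    rows = []
    for _ in range(n):
        f = rng.gauss(0.0, 1.0)  # shared latent factor for this row
        # Columns: Y, X1, X2, then the auxiliary variables.
        rows.append([load * f + resid * rng.gauss(0.0, 1.0)
                     for _ in range(3 + n_aux)])
    return rows


def make_x1_missing(rows, frac_missing, seed=None):
    """Set X1 (column 1) to None in round(frac_missing * n) rows.

    Half of the missing values are MAR (rows with the largest observed Y),
    the other half MCAR (a simple random sample of the remaining rows).
    """
    rng = random.Random(seed)
    n = len(rows)
    n_miss = round(frac_missing * n)
    n_mar = n_miss // 2
    # MAR part: missingness depends only on the fully observed Y (column 0).
    by_y = sorted(range(n), key=lambda i: rows[i][0], reverse=True)
    mar_idx = set(by_y[:n_mar])
    # MCAR part: random draw, independent of all data values.
    rest = [i for i in range(n) if i not in mar_idx]
    mcar_idx = set(rng.sample(rest, n_miss - n_mar))
    for i in mar_idx | mcar_idx:
        rows[i][1] = None
    return rows


if __name__ == "__main__":
    data = simulate_dataset(n=100, n_aux=10, r=0.5, seed=1)
    data = make_x1_missing(data, frac_missing=0.3, seed=2)
    print(sum(row[1] is not None for row in data))  # 70 rows keep X1
```

Ranking on Y is only one way to make missingness depend on observed data; a probabilistic selection model driven by Y would be an equally valid MAR mechanism.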

Results

The inclusion of auxiliary variables can improve a multiple imputation model. However, inclusion of too many variables leads to downward bias of regression coefficients and decreases precision. When the correlations are low, inclusion of auxiliary variables is not useful.
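Findings such as "downward bias of regression coefficients" and "decreased precision" are usually quantified across simulation replicates. The sketch below shows two common measures: relative bias (mean estimate minus the true value, divided by the true value) and the empirical standard error (standard deviation of the estimates across replicates). It is an assumption that these correspond to the exact performance measures used in the paper, and the estimate values in the usage example are made up purely to show the arithmetic.

```python
import statistics


def relative_bias(estimates, true_value):
    """Mean error across replicates as a fraction of the true value.

    With a positive true value, a negative result means the estimates
    are biased downward.
    """
    return (statistics.fmean(estimates) - true_value) / true_value


def empirical_se(estimates):
    """Sample standard deviation of the estimates across replicates.

    Larger values mean lower precision.
    """
    return statistics.stdev(estimates)


if __name__ == "__main__":
    # Hypothetical estimates of b1 from five replicates; assumed true b1 = 0.5.
    b1_hat = [0.42, 0.47, 0.39, 0.45, 0.44]
    print(round(relative_bias(b1_hat, 0.5), 3))  # -0.132
    print(round(empirical_se(b1_hat), 4))
```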

Conclusion

More research on auxiliary variables in multiple imputation should be performed. A preliminary rule of thumb could be that the ratio of variables to cases with complete data should not go below 1 : 3.
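The rule of thumb can be turned into a quick check. This sketch reads it as requiring at least three cases with complete data per variable in the imputation model, the reading consistent with the warning against too many auxiliary variables, and it assumes the imputation model contains the three analysis variables (Y, X1, X2) plus the auxiliary variables. The function name and parameters are hypothetical.

```python
def max_auxiliary_variables(n_complete, n_analysis_vars=3, cases_per_var=3):
    """Largest number of auxiliary variables that keeps at least
    `cases_per_var` complete cases per variable in the imputation model.

    n_analysis_vars counts the analysis-model variables (here Y, X1, X2).
    Returns 0 when the analysis variables alone already reach the limit.
    """
    max_total_vars = n_complete // cases_per_var
    return max(0, max_total_vars - n_analysis_vars)


if __name__ == "__main__":
    for n_complete in (35, 70, 140):
        print(n_complete, max_auxiliary_variables(n_complete))
    # 35 8
    # 70 20
    # 140 43
```

Under this reading, the largest condition in the study (80 auxiliary variables, 83 variables in total) would need at least 3 × 83 = 249 complete cases, more than the largest sample size of n = 200.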

【 License 】

© 2012 Hardt et al.; licensee BioMed Central Ltd.

【 Figures 】

Figure 1.

Figure 2.
