BMC Medical Research Methodology
Diagnosing problems with imputation models using the Kolmogorov-Smirnov test: a simulation study
Katherine J Lee², John B Carlin¹, Cattram D Nguyen²
¹ Melbourne School of Population and Global Health, Faculty of Medicine, Dentistry and Health Sciences, University of Melbourne, Melbourne, Australia; Department of Paediatrics, Faculty of Medicine, Dentistry and Health Sciences, University of Melbourne, Melbourne, Australia
Keywords: Simulations; Diagnostics; Kolmogorov-Smirnov test; Model checking; Multiple imputation; Missing data
DOI: 10.1186/1471-2288-13-144
Received 2013-08-09; accepted 2013-11-12; published 2013
【 Abstract 】

Background

Multiple imputation (MI) is becoming increasingly popular as a strategy for handling missing data, but there is a scarcity of tools for checking the adequacy of imputation models. The Kolmogorov-Smirnov (KS) test has been identified as a potential diagnostic method for assessing whether the distribution of imputed data deviates substantially from that of the observed data. The aim of this study was to evaluate the performance of the KS test as an imputation diagnostic.
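
The KS diagnostic compares the empirical distribution of a variable's observed values with that of its imputed values. Below is a minimal standard-library Python sketch of the two-sample statistic D = sup_x |F_obs(x) - F_imp(x)| and its asymptotic p-value. The function names `ks_statistic` and `ks_pvalue` are hypothetical, no small-sample correction is applied, and in practice a tested routine such as `scipy.stats.ks_2samp` would be preferable. The test assumes two independent samples from continuous distributions. Imputed values, however, are generated from a model fitted to the observed data, so the p-value is best read as a heuristic flag rather than a formal test.

```python
import math
import random
from bisect import bisect_right


def ks_statistic(observed, imputed):
    """Two-sample KS statistic D: the largest vertical gap between the
    empirical CDFs of the observed and the imputed values."""
    obs, imp = sorted(observed), sorted(imputed)
    n, m = len(obs), len(imp)
    # Both ECDFs are step functions that only jump at data points, so
    # checking every pooled value is enough to find the supremum.
    return max(
        abs(bisect_right(obs, x) / n - bisect_right(imp, x) / m)
        for x in obs + imp
    )


def ks_pvalue(d, n, m, max_terms=100):
    """Asymptotic p-value P(D > d) from the Kolmogorov distribution,
    Q(lam) = 2 * sum_{k>=1} (-1)^(k-1) * exp(-2 k^2 lam^2),
    with lam = sqrt(n*m / (n+m)) * d. No small-sample correction."""
    lam = math.sqrt(n * m / (n + m)) * d
    total, sign = 0.0, 1.0
    for k in range(1, max_terms + 1):
        term = sign * math.exp(-2.0 * (k * lam) ** 2)
        total += term
        if abs(term) < 1e-10:  # alternating series has converged
            return min(1.0, max(0.0, 2.0 * total))
        sign = -sign
    return 1.0  # no convergence means lam is tiny, i.e. D ~ 0 and p ~ 1


if __name__ == "__main__":
    rng = random.Random(2013)
    # Hypothetical example: right-skewed "observed" values versus "imputed"
    # values drawn from a normal model with the same mean (1) and SD (1).
    observed = [rng.expovariate(1.0) for _ in range(300)]
    imputed = [rng.gauss(1.0, 1.0) for _ in range(100)]
    d = ks_statistic(observed, imputed)
    print(f"D = {d:.3f}, p = {ks_pvalue(d, len(observed), len(imputed)):.4f}")
```

In an MI setting the comparison would be repeated for each imputed dataset, one variable at a time.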

Methods

Using simulation, we examined whether the KS test could reliably identify departures from the assumptions of the imputation model. Specifically, we examined how the KS p-values behaved when skewed and heavy-tailed data were imputed using a normal imputation model. We varied the proportion of missing data, the missing-data model and the degree of skewness, and evaluated how well the KS test diagnosed problems with the imputation model in each of these scenarios.
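
One replicate of a simulation in this spirit can be sketched as below. Every specific choice here is an assumption for illustration, not the study's actual design: the outcome model y = 1 + 0.5x + e, a centred and scaled chi-square(1) error to induce skewness, logistic MAR missingness in y driven by the fully observed x with strength `gamma`, and a single improper normal-regression draw standing in for full MI. The helper names `ks_test` and `simulate_once` are hypothetical.

```python
import math
import random
from bisect import bisect_right


def ks_test(a, b):
    """Two-sample KS statistic and asymptotic p-value (no small-sample correction)."""
    a, b = sorted(a), sorted(b)
    n, m = len(a), len(b)
    d = max(abs(bisect_right(a, x) / n - bisect_right(b, x) / m) for x in a + b)
    lam = math.sqrt(n * m / (n + m)) * d
    total, sign = 0.0, 1.0
    for k in range(1, 101):
        term = sign * math.exp(-2.0 * (k * lam) ** 2)
        total += term
        if abs(term) < 1e-10:
            return d, min(1.0, max(0.0, 2.0 * total))
        sign = -sign
    return d, 1.0


def simulate_once(n=1000, gamma=2.0, skewed=True, seed=1):
    """One replicate: generate data, make y MAR given x, impute y from a
    normal linear model, and compare observed vs imputed y with the KS test."""
    rng = random.Random(seed)
    xs, ys, missing = [], [], []
    for _ in range(n):
        x = rng.gauss(0.0, 1.0)
        if skewed:
            z = rng.gauss(0.0, 1.0)
            e = (z * z - 1.0) / math.sqrt(2.0)  # chi-square(1) error, mean 0, variance 1
        else:
            e = rng.gauss(0.0, 1.0)
        xs.append(x)
        ys.append(1.0 + 0.5 * x + e)
        # MAR: P(y missing) depends only on the fully observed x; gamma sets its strength
        missing.append(rng.random() < 1.0 / (1.0 + math.exp(1.0 - gamma * x)))
    # Least-squares fit of y on x using the complete cases only
    cc = [(x, y) for x, y, r in zip(xs, ys, missing) if not r]
    mx = sum(x for x, _ in cc) / len(cc)
    my = sum(y for _, y in cc) / len(cc)
    b = sum((x - mx) * (y - my) for x, y in cc) / sum((x - mx) ** 2 for x, _ in cc)
    a = my - b * mx
    sigma = math.sqrt(sum((y - a - b * x) ** 2 for x, y in cc) / (len(cc) - 2))
    # Normal imputation: fitted value plus N(0, sigma^2) noise. This is a single,
    # "improper" draw; proper MI would also draw (a, b, sigma) from their posterior.
    observed = [y for _, y in cc]
    imputed = [a + b * x + rng.gauss(0.0, sigma) for x, r in zip(xs, missing) if r]
    return ks_test(observed, imputed)


if __name__ == "__main__":
    for gamma in (0.0, 3.0):          # MCAR versus strong MAR
        for skewed in (False, True):  # correct versus misspecified imputation model
            d, p = simulate_once(n=2000, gamma=gamma, skewed=skewed)
            print(f"gamma={gamma}, skewed={skewed}: D={d:.3f}, p={p:.2g}")
```

With strong MAR (`gamma=3`), the observed and imputed values differ in distribution even when the normal imputation model is correct (`skewed=False`), because the imputed cases are concentrated at larger x. This illustrates how a strong MAR dependency on its own can drive KS p-values down.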

Results

The KS test was able to flag differences between the observed and imputed values; however, these differences did not always correspond to problems with MI inference for the regression parameter of interest. When there was a strong missing-at-random (MAR) dependency, the KS p-values were very small regardless of whether the MI estimates were biased, so the KS test could not discriminate between imputed variables that required further investigation and those that did not. The p-values were also sensitive to the sample size and the proportion of missing data, which makes the results of the KS test harder still to interpret.
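
The sensitivity to sample size and proportion missing follows from the form of the asymptotic test. With N records of which a fraction p is imputed, n_obs = N(1 - p) and n_imp = Np, so the scaling factor is n_obs·n_imp/(n_obs + n_imp) = Np(1 - p) and λ = D·√(Np(1 - p)). The same distance D between the two ECDFs therefore gives smaller p-values as N grows or as p approaches 0.5. The sketch below uses hypothetical helper names and the uncorrected asymptotic formula to tabulate this for a fixed D = 0.10. Under this formula, with 30% missing, the p-value is about 0.8 at N = 200 but about 0.03 at N = 1000.

```python
import math


def ks_asymptotic_pvalue(d, n_obs, n_imp):
    """Asymptotic two-sample KS p-value for a gap d between the two ECDFs."""
    lam = math.sqrt(n_obs * n_imp / (n_obs + n_imp)) * d
    total, sign = 0.0, 1.0
    for k in range(1, 101):
        term = sign * math.exp(-2.0 * (k * lam) ** 2)
        total += term
        if abs(term) < 1e-10:  # alternating series has converged
            return min(1.0, max(0.0, 2.0 * total))
        sign = -sign
    return 1.0  # lam is tiny, so p ~ 1


def pvalue_for_fixed_gap(d, n_total, prop_missing):
    """p-value for the same observed-vs-imputed gap d when a fraction
    prop_missing of n_total records is imputed.
    Note n_obs * n_imp / (n_obs + n_imp) = N * p * (1 - p)."""
    n_imp = round(n_total * prop_missing)
    n_obs = n_total - n_imp
    return ks_asymptotic_pvalue(d, n_obs, n_imp)


if __name__ == "__main__":
    d = 0.10  # hypothetical, fixed distance between the two ECDFs
    for n_total in (200, 1000, 5000):
        row = [pvalue_for_fixed_gap(d, n_total, p) for p in (0.1, 0.3, 0.5)]
        print(n_total, ["%.2g" % v for v in row])
```

Because the scaling factor is symmetric in p and 1 - p, 30% and 70% missing give the same p-value for the same D.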

Conclusions

Given our study results, it is difficult to establish guidelines or recommendations for using the KS test as a diagnostic tool for MI. The investigation of other imputation diagnostics and their incorporation into statistical software are important areas for future research.

【 License 】

© 2013 Nguyen et al.; licensee BioMed Central Ltd.
