Journal article details
BMC Medical Research Methodology
Comparison of methods for imputing limited-range variables: a simulation study
John B Carlin [2], Helena Romaniuk [1], Katherine J Lee [2], Laura Rodwell [2]
[1] Centre for Adolescent Health, Murdoch Childrens Research Institute, Melbourne, Australia;Department of Paediatrics, Faculty of Medicine, Dentistry and Health Sciences, The University of Melbourne, Melbourne, Australia
Keywords: Truncated regression; Rounding; Missing data; Skewed data; Limited-range; Multiple imputation
Others  :  866351
DOI  :  10.1186/1471-2288-14-57
Received 2013-12-20, accepted 2014-04-08, published 2014
【 Abstract 】

Background

Multiple imputation (MI) was developed as a method to enable valid inferences to be obtained in the presence of missing data rather than to re-create the missing values. Within the applied setting, it remains unclear how important it is that imputed values should be plausible for individual observations. One variable type for which MI may lead to implausible values is a limited-range variable, where imputed values may fall outside the observable range. The aim of this work was to compare methods for imputing limited-range variables, with a focus on those that restrict the range of the imputed values.

Methods

Using data from a study of adolescent health, we considered three variables based on responses to the General Health Questionnaire (GHQ), a tool for detecting minor psychiatric illness. These variables, based on different scoring methods for the GHQ, resulted in three continuous distributions with mild, moderate and severe positive skewness. In an otherwise complete dataset, we set 33% of the GHQ observations to missing, either completely at random or at random, and repeated this process to create 1000 datasets with incomplete data for each scenario.
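
The two missingness mechanisms can be illustrated with a minimal Python sketch. It is not the authors' code (the reference list cites Stata), and the function names, the dependence of the missing-at-random mechanism on a single binary variable, and the probabilities 0.23 and 0.43 are all assumptions made for illustration; the abstract does not state how missingness at random was generated.

```python
import random

def make_mcar(values, prop=0.33, rng=None):
    """Set a fixed proportion of values to None, ignoring all other data
    (missing completely at random)."""
    rng = rng if rng is not None else random.Random()
    n_missing = round(prop * len(values))
    missing_idx = set(rng.sample(range(len(values)), n_missing))
    return [None if i in missing_idx else v for i, v in enumerate(values)]

def make_mar(values, outcome, p_if_0=0.23, p_if_1=0.43, rng=None):
    """Set values to None with a probability that depends only on a fully
    observed binary variable (missing at random).
    The two probabilities are hypothetical, not taken from the paper."""
    rng = rng if rng is not None else random.Random()
    return [
        None if rng.random() < (p_if_1 if y == 1 else p_if_0) else v
        for v, y in zip(values, outcome)
    ]
```

Calling either function 1000 times with different random generators would give the repeated incomplete datasets described above. Under this sketch the overall missing proportion in the missing-at-random case depends on the prevalence of the binary variable, so the two probabilities would need tuning to reach 33%.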

For each dataset, we imputed values on the raw scale and following a zero-skewness log transformation using: univariate regression with no rounding; post-imputation rounding; truncated normal regression; and predictive mean matching. We estimated the marginal mean of the GHQ and the association between the GHQ and a fully observed binary outcome, comparing the results with complete data statistics.
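
As a rough illustration of how the four approaches differ, the sketch below fills in missing values of one variable from a single predictor. It is a simplification under stated assumptions rather than the procedure used in the study: all names are hypothetical, post-imputation rounding is interpreted as moving out-of-range draws to the nearest limit, the truncated variant reuses ordinary least squares estimates instead of fitting a truncated regression by maximum likelihood, parameter uncertainty is ignored, and the log transformation is not shown.

```python
import random
import statistics

def fit_line(x, y):
    """Ordinary least squares for y = a + b*x; returns (a, b, residual_sd)."""
    mx, my = statistics.fmean(x), statistics.fmean(y)
    sxx = sum((xi - mx) ** 2 for xi in x)
    b = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y)) / sxx
    a = my - b * mx
    rss = sum((yi - a - b * xi) ** 2 for xi, yi in zip(x, y))
    return a, b, (rss / (len(x) - 2)) ** 0.5

def impute(x, y, method, lo, hi, rng, k=5):
    """Return a copy of y with None entries filled in by the named method."""
    obs = [(xi, yi) for xi, yi in zip(x, y) if yi is not None]
    a, b, sd = fit_line([p[0] for p in obs], [p[1] for p in obs])
    out = []
    for xi, yi in zip(x, y):
        if yi is not None:
            out.append(yi)
            continue
        mu = a + b * xi  # predicted mean for this incomplete record
        if method == "none":
            # Regression draw, left as is even if it falls outside [lo, hi].
            val = rng.gauss(mu, sd)
        elif method == "round":
            # Draw first, then move out-of-range values to the nearest limit.
            val = min(max(rng.gauss(mu, sd), lo), hi)
        elif method == "truncated":
            # Redraw until the value lies in [lo, hi] (rejection sampling).
            val = rng.gauss(mu, sd)
            while not lo <= val <= hi:
                val = rng.gauss(mu, sd)
        elif method == "pmm":
            # Borrow an observed value from one of the k records whose
            # predicted means are closest to mu.
            donors = sorted(obs, key=lambda p: abs(a + b * p[0] - mu))[:k]
            val = rng.choice(donors)[1]
        else:
            raise ValueError(f"unknown method: {method}")
        out.append(val)
    return out

# Toy, positively skewed data on a 0-12 scale (simulated, not study data).
rng = random.Random(2014)
x = [rng.randint(0, 1) for _ in range(200)]
y = [min(12, int(rng.expovariate(0.5)) + xi) for xi in x]
y_mis = [None if rng.random() < 0.33 else yi for yi in y]
for method in ("none", "round", "truncated", "pmm"):
    filled = impute(x, y_mis, method, lo=0, hi=12, rng=rng)
    print(method, round(statistics.fmean(filled), 2))
```

Each call produces one completed dataset; multiple imputation repeats the draw several times and combines the resulting estimates.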

Results

Imputation with no rounding performed well when applied to data on the raw scale. Post-imputation rounding and imputation using truncated normal regression produced higher marginal means than the complete data estimate when data had a moderate or severe skew, and this was associated with under-coverage of the complete data estimate. Predictive mean matching also produced under-coverage of the complete data estimate. For the estimate of association, all methods produced estimates similar to the complete data estimate.
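
Coverage here refers to how often the confidence interval from the imputed data contains the complete data estimate. A minimal sketch of that check follows, assuming estimates are combined with Rubin's rules and that a normal-approximation 95% interval is acceptable; the function names are hypothetical, and a full analysis would use a t reference distribution with Rubin's degrees of freedom.

```python
import statistics

def pool_rubin(estimates, variances):
    """Combine M per-imputation estimates and their variances (Rubin's rules).
    Returns the pooled estimate and its total variance."""
    m = len(estimates)
    q_bar = statistics.fmean(estimates)      # pooled point estimate
    within = statistics.fmean(variances)     # average within-imputation variance
    between = statistics.variance(estimates) # between-imputation variance
    total = within + (1 + 1 / m) * between
    return q_bar, total

def covers(q_bar, total_var, target, z=1.96):
    """True if the approximate 95% interval contains the target value."""
    half_width = z * total_var ** 0.5
    return q_bar - half_width <= target <= q_bar + half_width
```

Averaging the result of `covers` over the simulated datasets, with the complete data estimate as `target`, gives the coverage proportion; values well below 0.95 indicate under-coverage.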

Conclusions

For data with a limited range, multiple imputation using techniques that restrict the range of imputed values can result in biased estimates for the marginal mean when data are highly skewed.

【 License 】

   
2014 Rodwell et al.; licensee BioMed Central Ltd.
