Journal article details
BMC Medical Research Methodology
Model development including interactions with multiple imputed data
Graciela Mentz [1], Delia North [2], Temesgen Zewotir [2], Rajen N Naidoo [3], Gillian M Hendry [2]
[1] Department of Environmental Health Sciences, School of Public Health, University of Michigan, 6655 SPH I, Ann Arbor, MI 48109-2029, USA
[2] School of Mathematics, Statistics and Computer Science, University of KwaZulu-Natal, Westville Campus, University Road, Westville, Durban, South Africa
[3] Discipline of Occupational and Environmental Health, School of Nursing and Public Health, University of KwaZulu-Natal, Durban, South Africa
Keywords: Ordinal regression; Multiple imputation; Model development; Missing data; Interactions
DOI: 10.1186/1471-2288-14-136
Received 2014-08-01, accepted 2014-11-26, published 2014
【 Abstract 】

Background

Multiple imputation is a reliable tool for dealing with missing data and is becoming increasingly popular in biostatistics. However, building a model with interactions that are not specified a priori, in the presence of missing data, presents a challenge: the interactions are needed to impute the data, yet the data are needed to identify the interactions. The objective of this study was to present a way in which this challenge can be addressed.

Methods

This paper investigates two strategies in which model development with interactions is achieved using a single data set generated from the Expectation Maximization (EM) algorithm. Imputation using both the fully conditional specification approach and the multivariate normal approach is carried out and results are compared. The strategies are illustrated with data from a study of ambient pollution and childhood asthma in Durban, South Africa.
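
As a rough illustration of what a single EM-based data set means, the sketch below runs the EM algorithm for a bivariate normal model in which one variable is complete and the other has gaps, then fills each gap with its conditional expectation. This is not the authors' procedure: the function name `em_impute_bivariate`, the toy data, and the choice to fill with the conditional mean (rather than adding a random residual, as some software does) are all assumptions made for illustration.

```python
def em_impute_bivariate(x, y, n_iter=200):
    """EM for a bivariate normal model; x is complete, y uses None for gaps.

    Hypothetical helper for illustration only. Returns y with each gap
    replaced by its conditional mean under the final parameter estimates.
    """
    n = len(x)
    miss = [v is None for v in y]
    obs_y = [v for v in y if v is not None]

    # x is fully observed, so its mean and variance are fixed at their MLEs.
    mu_x = sum(x) / n
    sxx = sum((v - mu_x) ** 2 for v in x) / n

    # Starting values taken from the observed part of y.
    mu_y = sum(obs_y) / len(obs_y)
    syy = sum((v - mu_y) ** 2 for v in obs_y) / len(obs_y)
    sxy = 0.0

    for _ in range(n_iter):
        # E-step: expected y and y^2 for the missing entries given x.
        beta = sxy / sxx
        resid_var = syy - sxy * beta
        ey = [mu_y + beta * (xi - mu_x) if m else yi
              for xi, yi, m in zip(x, y, miss)]
        ey2 = [e * e + (resid_var if m else 0.0) for e, m in zip(ey, miss)]

        # M-step: update the parameters from the expected sufficient statistics.
        mu_y = sum(ey) / n
        syy = sum(ey2) / n - mu_y ** 2
        sxy = sum(xi * e for xi, e in zip(x, ey)) / n - mu_x * mu_y

    beta = sxy / sxx
    return [mu_y + beta * (xi - mu_x) if m else yi
            for xi, yi, m in zip(x, y, miss)]


# Toy data: the observed pairs follow y = 2x + 1 exactly.
x = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
y = [3.0, 5.0, None, 9.0, None, 13.0]
completed = em_impute_bivariate(x, y)
```

In this toy case the filled-in values settle on the line fitted to the complete pairs. Model selection steps such as backward elimination could then be run once on `completed`, which is the practical appeal of working from a single data set.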

Results

The different approaches to model building and imputation yielded similar results despite the data being mainly categorical. Both strategies investigated for building the model using the multivariate normal imputed data resulted in the identical set of variables and interactions being identified; while models built using data imputed by fully conditional specification were marginally different for the two strategies. It was found that, for both imputation approaches, model building with backward elimination applied to the initial EM data set was easier to implement, and produced good results, compared to those from a complete case analysis.

Conclusions

Developing a predictive model that includes interactions, when the data have missing values, is easily done by identifying significant interactions and then applying backward elimination to a single data set imputed with the EM algorithm. It is hoped that this idea can be developed further and that, by addressing this practical dilemma, multiple imputation will be adopted more widely in medical research where data are incomplete.

【 License 】

   
2014 Hendry et al.; licensee BioMed Central.

【 Figures 】

Figure 1.
