Emerging Themes in Epidemiology
The impact of missing data on analyses of a time-dependent exposure in a longitudinal cohort: a simulation study
Julie A Simpson[3], John B Carlin[1], Dallas R English[3], Katherine J Lee[2], Laura Baglietto[3], Amalia Karahalios[3]
[1] Clinical Epidemiology and Biostatistics Unit, Murdoch Childrens Research Institute, Parkville, VIC, Australia; Department of Paediatrics, The University of Melbourne, Parkville, VIC, Australia; Cancer Epidemiology Centre, Cancer Council Victoria, Melbourne, Australia
Keywords: Repeated exposure measurement; Complete-case analysis; Multiple imputation; Missing exposure; Simulation study
DOI: 10.1186/1742-7622-10-6
Received 2013-03-19; accepted 2013-07-23; published 2013
【 Abstract 】

Background

Missing data often cause problems in longitudinal cohort studies with repeated follow-up waves. Research in this area has focussed on analyses with missing data in repeated measures of the outcome, and participants with missing exposure data are typically excluded from such analyses. We performed a simulation study to compare complete-case analysis with multiple imputation (MI) for handling missing data in an analysis of the association between waist circumference, measured at two waves, and the risk of colorectal cancer (a completely observed outcome).

Methods

We generated 1,000 datasets, each of 41,476 individuals, containing waist circumference at waves 1 and 2 and times to colorectal cancer and to death, with distributions chosen to resemble those of the data from the Melbourne Collaborative Cohort Study. Three proportions of missing data (15, 30 and 50%) were imposed on waist circumference at wave 2 under three missing-data mechanisms: missing completely at random (MCAR), and a realistic and a more extreme covariate-dependent missing at random (MAR) scenario. We assessed the impact of missing data on two epidemiological analyses: 1) the association between change in waist circumference between waves 1 and 2 and the risk of colorectal cancer, adjusted for waist circumference at wave 1; and 2) the association between waist circumference at wave 2 and the risk of colorectal cancer, not adjusted for waist circumference at wave 1.
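The step of imposing missingness on wave-2 waist circumference can be illustrated with a minimal Python sketch. It is not the authors' code: the bivariate-normal generator, its parameter values (mean 85 cm, SD 12 cm, correlation 0.85) and the choice of wave-1 waist circumference as the covariate that drives MAR missingness are all illustrative assumptions, because the abstract gives neither the MCCS-based distributions nor the MAR models. Under MCAR every wave-2 value is deleted with the same probability. Under the assumed MAR form the deletion probability is a logistic function of the always-observed wave-1 value, with the intercept found by bisection so that the expected proportion missing matches the 15, 30 or 50% target; a larger (hypothetical) `slope` gives a more extreme scenario.

```python
import math
import random


def simulate_waist(n, mean=85.0, sd=12.0, rho=0.85, seed=1):
    """Draw hypothetical (wave 1, wave 2) waist circumferences in cm from a
    bivariate normal; all parameter values are illustrative assumptions."""
    rng = random.Random(seed)
    data = []
    for _ in range(n):
        z1, z2 = rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)
        w1 = mean + sd * z1
        # Mixing z1 and z2 gives w2 a correlation of rho with w1
        w2 = mean + sd * (rho * z1 + math.sqrt(1.0 - rho ** 2) * z2)
        data.append((w1, w2))
    return data


def logistic(x):
    # Numerically stable inverse logit
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    e = math.exp(x)
    return e / (1.0 + e)


def calibrate_intercept(covariate, slope, target, iters=60):
    """Bisection for the intercept a such that the average of
    logistic(a + slope * c) over the sample equals the target proportion."""
    lo, hi = -20.0, 20.0
    for _ in range(iters):
        mid = (lo + hi) / 2.0
        mean_p = sum(logistic(mid + slope * c) for c in covariate) / len(covariate)
        if mean_p < target:
            lo = mid  # too few missing: raise the intercept
        else:
            hi = mid
    return (lo + hi) / 2.0


def impose_missing(data, prop, mechanism="MCAR", slope=1.0, seed=2):
    """Return the wave-2 values with a random subset replaced by None.

    MCAR: every value is missing with probability prop.
    MAR (assumed form): the probability is logistic(a + slope * z), where z is
    the standardized wave-1 value, which is always observed.
    """
    rng = random.Random(seed)
    if mechanism == "MCAR":
        probs = [prop] * len(data)
    elif mechanism == "MAR":
        w1 = [d[0] for d in data]
        m = sum(w1) / len(w1)
        s = math.sqrt(sum((x - m) ** 2 for x in w1) / (len(w1) - 1))
        z = [(x - m) / s for x in w1]
        a = calibrate_intercept(z, slope, prop)
        probs = [logistic(a + slope * zi) for zi in z]
    else:
        raise ValueError("mechanism must be 'MCAR' or 'MAR'")
    return [None if rng.random() < p else w2 for (_, w2), p in zip(data, probs)]


if __name__ == "__main__":
    data = simulate_waist(10000)
    for mechanism in ("MCAR", "MAR"):
        for prop in (0.15, 0.30, 0.50):
            w2 = impose_missing(data, prop, mechanism)
            frac = sum(v is None for v in w2) / len(w2)
            print(mechanism, prop, round(frac, 3))
```

Setting `slope=0` reduces the MAR branch to MCAR, which offers a quick check that the intercept calibration hits the target proportion.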

Results

We observed very little bias for complete-case analysis or MI under all missing data scenarios, and the resulting coverage of interval estimates was near the nominal 95% level. MI showed gains in precision when waist circumference was included as a strong auxiliary variable in the imputation model.
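Bias and coverage, the performance measures reported here, have standard definitions in simulation studies, but the abstract does not spell out the exact formulas. The sketch below assumes the usual ones: bias as the mean estimate across replicates minus the true value, and coverage as the proportion of replicates whose Wald 95% interval (estimate ± 1.96 × SE) contains the true value. Comparing `empirical_se` between methods is one way to express differences in precision. The numbers in the toy check are made up and are not study results.

```python
import random
import statistics


def performance(estimates, ses, true_value, z=1.959964):
    """Summarize one method's results across simulation replicates.

    estimates, ses: point estimate and model-based standard error per replicate
    true_value: the parameter value used to generate the data
    z: normal quantile for a nominal 95% Wald interval (assumed interval type)
    """
    if len(estimates) != len(ses) or len(estimates) < 2:
        raise ValueError("need matching estimates and SEs from at least 2 replicates")
    bias = statistics.fmean(estimates) - true_value
    covered = sum(
        1 for b, se in zip(estimates, ses)
        if b - z * se <= true_value <= b + z * se
    )
    return {
        "bias": bias,
        "empirical_se": statistics.stdev(estimates),  # spread across replicates
        "mean_model_se": statistics.fmean(ses),       # average reported SE
        "coverage": covered / len(estimates),
    }


if __name__ == "__main__":
    # Toy check with made-up, unbiased estimates whose SE is correctly specified
    rng = random.Random(0)
    true_beta = 0.2
    est = [rng.gauss(true_beta, 0.1) for _ in range(1000)]
    print(performance(est, [0.1] * len(est), true_beta))
```

When the model-based SE is correct, `mean_model_se` should be close to `empirical_se` and coverage close to 0.95; a gap between the two SEs is a common sign of miscalibrated intervals.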

Conclusions

This simulation study, based on data from a longitudinal cohort study, demonstrates that there is little gain in performing MI rather than a complete-case analysis when up to 50% of the data on the exposure of interest are missing and the data are MCAR or missing dependent on covariates. MI will result in some gain in precision if a strong auxiliary variable that is not in the analysis model is included in the imputation model.
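The precision argument can be made concrete with Rubin's rules, the usual way of combining results across imputed datasets; the abstract does not state the pooling details, so treat this as a sketch of the standard approach rather than the authors' procedure. The pooled variance adds the between-imputation variance B, inflated by (1 + 1/m), to the average within-imputation variance W. Each completed-data analysis uses all participants, so W is typically smaller than the complete-case variance, while B reflects uncertainty about the imputed values; an auxiliary variable that strongly predicts the missing wave-2 values keeps B small, which is one intuition for the precision gain. The coefficients in the usage lines are invented for illustration.

```python
import math
import statistics


def rubin_pool(estimates, variances):
    """Combine m completed-data analyses with Rubin's rules.

    estimates: the coefficient estimated in each imputed dataset
    variances: its squared standard error in each imputed dataset
    Returns the pooled estimate, its standard error and Rubin's (1987)
    degrees of freedom for a t reference distribution.
    """
    m = len(estimates)
    if m < 2 or len(variances) != m:
        raise ValueError("need estimates and variances from at least 2 imputations")
    q_bar = statistics.fmean(estimates)      # pooled point estimate
    w = statistics.fmean(variances)          # within-imputation variance
    b = statistics.variance(estimates)       # between-imputation variance (divisor m - 1)
    t = w + (1.0 + 1.0 / m) * b              # total variance
    if b > 0:
        df = (m - 1) * (1.0 + w / ((1.0 + 1.0 / m) * b)) ** 2
    else:
        df = math.inf                        # imputations agree exactly
    return q_bar, math.sqrt(t), df


if __name__ == "__main__":
    # Invented coefficients and variances from m = 5 imputed datasets
    est = [0.21, 0.18, 0.24, 0.20, 0.19]
    var = [0.0025, 0.0026, 0.0024, 0.0025, 0.0025]
    print(rubin_pool(est, var))
```

In practice m is chosen large enough that the between-imputation component is estimated stably; the pooling formulas themselves do not depend on how the imputations were generated.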

【 License 】

© 2013 Karahalios et al.; licensee BioMed Central Ltd.

【 Figures 】

Figure 1.

Figure 2.

Figure 3.
