Journal article details
BMC Research Notes
Do the methods used to analyse missing data really matter? An examination of data from an observational study of Intermediate Care patients
Lucinda Billingham [3], Stirling Bryan [1], Billingsley Kaambwa [2]
[1] Centre for Clinical Epidemiology and Evaluation, University of British Columbia, Research Pavilion 702-828 West 10th Ave, Vancouver, Canada
[2] Health Economics Unit, Public Health Building, University of Birmingham, Edgbaston, Birmingham, B15 2TT, United Kingdom
[3] MRC Midland Hub for Trials Methodology Research, University of Birmingham, Edgbaston, Birmingham, B15 2TT, United Kingdom
Keywords: Observational data; Heckman selection; Generalised linear model; Multiple imputation; Complete case analysis; Missing data
DOI: 10.1186/1756-0500-5-330
Received 2012-04-10, accepted 2012-06-27, published 2012
【 Abstract 】

Background

Missing data is a common statistical problem in healthcare datasets from populations of older people. Some argue that arbitrarily assuming the mechanism responsible for the missingness and therefore the method for dealing with this missingness is not the best option—but is this always true? This paper explores what happens when extra information that suggests that a particular mechanism is responsible for missing data is disregarded and methods for dealing with the missing data are chosen arbitrarily.

Methods

Regression models based on 2,533 intermediate care (IC) patients from the largest evaluation of IC conducted and published in the UK to date were used to explain variation in costs, EQ-5D and Barthel index. Three methods for dealing with missingness were used, each assuming a different mechanism as being responsible for the missing data: complete case analysis (assuming missing completely at random, MCAR), multiple imputation (assuming missing at random, MAR) and the Heckman selection model (assuming missing not at random, MNAR). Differences in results were gauged by examining the signs of coefficients as well as the sizes of both coefficients and associated standard errors.
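
To make the comparison of approaches concrete, the following is a minimal sketch, not the authors' analysis: it contrasts complete case analysis with a simple bootstrap-based multiple imputation, pooled with Rubin's rules, on synthetic data in which outcomes are missing completely at random. The data-generating model (intercept 2, slope 3, unit noise), the 30% missingness rate, the number of imputations and the function names `fit_line`, `pool_rubin` and `simulate` are all assumptions made for illustration. The Heckman selection model is not covered here.

```python
import random
import statistics


def fit_line(xs, ys):
    """Simple linear regression y = a + b*x.

    Returns (intercept, slope, slope standard error, residual SD).
    """
    n = len(xs)
    mx, my = statistics.fmean(xs), statistics.fmean(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    b = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sxx
    a = my - b * mx
    rss = sum((y - a - b * x) ** 2 for x, y in zip(xs, ys))
    sigma = (rss / (n - 2)) ** 0.5
    return a, b, sigma / sxx ** 0.5, sigma


def pool_rubin(estimates, std_errors):
    """Pool m estimates with Rubin's rules; returns (estimate, standard error)."""
    m = len(estimates)
    within = statistics.fmean([se ** 2 for se in std_errors])
    between = statistics.variance(estimates)  # sample variance across imputations
    total = within + (1 + 1 / m) * between
    return statistics.fmean(estimates), total ** 0.5


def simulate(seed=0, n=500, p_missing=0.3, m=20):
    """Hypothetical illustration: synthetic data with outcomes missing MCAR."""
    rng = random.Random(seed)
    xs = [rng.uniform(0, 10) for _ in range(n)]
    ys = [2 + 3 * x + rng.gauss(0, 1) for x in xs]
    # MCAR: every outcome is dropped with the same probability,
    # independently of both x and y.
    seen = [rng.random() >= p_missing for _ in range(n)]
    obs = [(x, y) for x, y, s in zip(xs, ys, seen) if s]

    # Complete case analysis: fit on rows with an observed outcome only.
    _, cc_b, cc_se, _ = fit_line([x for x, _ in obs], [y for _, y in obs])

    # Multiple imputation: for each imputed data set, refit the model on a
    # bootstrap resample of the observed rows (to reflect parameter
    # uncertainty), then fill each missing outcome with prediction + noise.
    slopes, ses = [], []
    for _ in range(m):
        boot = rng.choices(obs, k=len(obs))
        a, b, _, sigma = fit_line([x for x, _ in boot], [y for _, y in boot])
        filled = [y if s else a + b * x + rng.gauss(0, sigma)
                  for x, y, s in zip(xs, ys, seen)]
        _, b_i, se_i, _ = fit_line(xs, filled)
        slopes.append(b_i)
        ses.append(se_i)
    mi_b, mi_se = pool_rubin(slopes, ses)
    return (cc_b, cc_se), (mi_b, mi_se)


if __name__ == "__main__":
    (cc_b, cc_se), (mi_b, mi_se) = simulate()
    print(f"complete case:       slope={cc_b:.3f}  se={cc_se:.3f}")
    print(f"multiple imputation: slope={mi_b:.3f}  se={mi_se:.3f}")
```

Under MCAR, both approaches should recover a slope close to the assumed true value, which mirrors the kind of agreement the abstract reports for the cost data. With a different missingness mechanism in the simulation, the two could diverge.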

Results

Extra information strongly suggested that missing cost data were MCAR. The MCAR- and MAR-based methods yielded similar results, with the sizes of most coefficients and standard errors differing by less than 3.4%, while results from the MNAR-based method were statistically different (up to 730% bigger). Significant variables in all regression models also had the same direction of influence on costs. All three mechanisms of missingness were shown to be potential causes of the missing EQ-5D and Barthel data. The method chosen to deal with missing data did not seem to have any significant effect on the results for these data, as all methods led to broadly similar conclusions, with sizes of coefficients and standard errors differing by less than 54% and 322%, respectively.

Conclusions

Arbitrary selection of methods to deal with missing data should be avoided. Using extra information gathered during the data collection exercise about the cause of missingness to guide this selection would be more appropriate.

【 License 】

   
2012 Kaambwa et al.; licensee BioMed Central Ltd.
