BMC Medical Research Methodology
Joint modelling rationale for chained equations
Jonathan AC Sterne [1], Kate Tilling [1], James R Carpenter [3], Shaun R Seaman [2], Ian R White [2], Rachael A Hughes [1]
[1] School of Social and Community Medicine, University of Bristol, Bristol, UK; [2] MRC Biostatistics Unit, Institute of Public Health, Cambridge, UK; [3] MRC Clinical Trials Unit, London, UK
Keywords: Multivariate missing data; Multiple imputation; Joint modelling imputation; Gibbs sampling; Chained equations imputation
DOI: 10.1186/1471-2288-14-28
Received 2013-09-12, accepted 2014-02-13, published 2014
【 Abstract 】
Background
Chained equations imputation is widely used in medical research. It uses a set of conditional models, so is more flexible than joint modelling imputation for the imputation of different types of variables (e.g. binary, ordinal or unordered categorical). However, chained equations imputation does not correspond to drawing from a joint distribution when the conditional models are incompatible. Concurrently with our work, other authors have shown the equivalence of the two imputation methods in finite samples.
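As a rough illustration of the two ideas in this paragraph, the block below writes out one cycle of a chained equations algorithm and the usual definition of compatibility. The notation is an assumption introduced here and is not taken from the abstract: $X = (X_1, \dots, X_p)$ are the variables, $X_{-j}$ denotes all variables except $X_j$, $f_j$ is the conditional model chosen for $X_j$ with parameter $\theta_j$, and $x_{-j}^{(t)}$ holds the most recently imputed values of the other variables at cycle $t$.

```latex
% One cycle t of chained equations: visit each incomplete variable in turn.
% The posterior for \theta_j is fitted to the rows in which x_j is observed.
\begin{align*}
\theta_j^{(t)} &\sim p\!\left(\theta_j \mid x_j^{\mathrm{obs}},\, x_{-j}^{(t)}\right), \\
x_j^{\mathrm{mis},(t)} &\sim f_j\!\left(x_j \mid x_{-j}^{(t)},\, \theta_j^{(t)}\right),
\qquad j = 1, \dots, p .
\end{align*}

% Compatibility: the conditional models f_1, ..., f_p are compatible if there is
% a joint model g(x | \theta), with maps \theta \mapsto \theta_j, such that
\[
f_j\!\left(x_j \mid x_{-j},\, \theta_j\right) = g\!\left(x_j \mid x_{-j},\, \theta\right)
\quad \text{for every } j .
\]
```

When no such joint model $g$ exists, the conditional models are incompatible and the cycle above is not a Gibbs sampler for any joint distribution.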
Methods
Taking a different approach, we prove, in finite samples, sufficient conditions for chained equations and joint modelling to yield imputations from the same predictive distribution. Further, we apply this proof in four specific cases and conduct a simulation study which explores the consequences when the conditional models are compatible but the conditions otherwise are not satisfied.
Results
We provide an additional “non-informative margins” condition which, together with compatibility, is sufficient. We show that the non-informative margins condition is not satisfied, despite compatible conditional models, in a situation as simple as two continuous variables and one binary variable. Our simulation study demonstrates that, as a consequence of this violation, order effects can occur; that is, the imputations can differ systematically depending upon the order in which the variables are visited in the chained equations algorithm. However, the order effects appear to be small, especially when associations between variables are weak.
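The kind of check described here can be sketched in a few lines of Python. This is a minimal sketch under stated assumptions, not the authors' simulation study: the data-generating model, the 30% missingness rate, the estimand (the mean of the first variable after imputation) and all function names (`draw_normal`, `draw_binary`, `chained`, `simulate`, `order_effect`) are hypothetical choices made for illustration. Continuous variables are imputed from a Bayesian linear regression, and the binary variable from a logistic regression whose coefficients are drawn from a normal approximation around the maximum likelihood estimate.

```python
import numpy as np


def draw_normal(y, X, miss, rng):
    """Draw y[miss] from a Bayesian linear regression fitted where y is observed."""
    Xo, yo = X[~miss], y[~miss]
    V = np.linalg.inv(Xo.T @ Xo)
    b_hat = V @ Xo.T @ yo
    rss = np.sum((yo - Xo @ b_hat) ** 2)
    # Posterior draws under the prior p(beta, sigma^2) proportional to 1/sigma^2.
    sigma2 = rss / rng.chisquare(len(yo) - X.shape[1])
    b = rng.multivariate_normal(b_hat, sigma2 * V)
    return X[miss] @ b + rng.normal(0.0, np.sqrt(sigma2), miss.sum())


def draw_binary(y, X, miss, rng, n_newton=25):
    """Draw y[miss] from a logistic regression fitted where y is observed."""
    Xo, yo = X[~miss], y[~miss]
    b = np.zeros(X.shape[1])
    for _ in range(n_newton):  # Newton-Raphson for the MLE
        p = 1.0 / (1.0 + np.exp(-Xo @ b))
        H = Xo.T @ (Xo * (p * (1.0 - p))[:, None])
        b = b + np.linalg.solve(H, Xo.T @ (yo - p))
    p = 1.0 / (1.0 + np.exp(-Xo @ b))
    V = np.linalg.inv(Xo.T @ (Xo * (p * (1.0 - p))[:, None]))
    # Normal approximation to the posterior of the coefficients (an assumption).
    b_star = rng.multivariate_normal(b, (V + V.T) / 2.0)
    p_miss = 1.0 / (1.0 + np.exp(-X[miss] @ b_star))
    return (rng.random(miss.sum()) < p_miss).astype(float)


def chained(data, binary_col, order, n_cycles, rng):
    """Chained equations imputation, visiting the columns in the given order."""
    miss = np.isnan(data)
    filled = data.copy()
    for j in range(data.shape[1]):  # start from random draws of observed values
        filled[miss[:, j], j] = rng.choice(data[~miss[:, j], j], miss[:, j].sum())
    ones = np.ones((len(data), 1))
    for _ in range(n_cycles):
        for j in order:
            X = np.hstack([ones, np.delete(filled, j, axis=1)])
            draw = draw_binary if j == binary_col else draw_normal
            filled[miss[:, j], j] = draw(filled[:, j], X, miss[:, j], rng)
    return filled


def simulate(n, rng, p_missing=0.3):
    """Two continuous variables and one binary variable, missing completely at random."""
    x3 = (rng.random(n) < 0.5).astype(float)
    x1 = 0.8 * x3 + rng.normal(size=n)
    x2 = 0.5 * x1 + 0.8 * x3 + rng.normal(size=n)
    data = np.column_stack([x1, x2, x3])
    data[rng.random(data.shape) < p_missing] = np.nan
    return data


def order_effect(data, binary_col, order_a, order_b, n_rep=20, n_cycles=10, seed=0):
    """Difference in the average imputed-data mean of column 0 between two orders,
    with a Monte Carlo standard error for that difference."""
    rng = np.random.default_rng(seed)

    def estimates(order):
        return np.array([chained(data, binary_col, order, n_cycles, rng)[:, 0].mean()
                         for _ in range(n_rep)])

    a, b = estimates(order_a), estimates(order_b)
    mc_se = np.sqrt(a.var(ddof=1) / n_rep + b.var(ddof=1) / n_rep)
    return a.mean() - b.mean(), mc_se


if __name__ == "__main__":
    data = simulate(300, np.random.default_rng(0))
    diff, mc_se = order_effect(data, 2, [0, 1, 2], [2, 1, 0])
    print(f"difference between orders: {diff:.4f} (Monte Carlo SE {mc_se:.4f})")
```

A difference that is small relative to its Monte Carlo standard error would be in line with order effects being hard to detect in a setting like this one. A single incomplete dataset and one estimand, however, say little about other designs or stronger associations between variables.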
Conclusions
Because chained equations imputation is typically used in medical research for datasets containing different types of variables, researchers should be aware that order effects are likely to be ubiquitous. However, our results suggest that these effects may be small enough to be negligible.
【 License 】
2014 Hughes et al.; licensee BioMed Central Ltd.