BMC Medical Research Methodology
A Monte Carlo simulation study comparing linear regression, beta regression, variable-dispersion beta regression and fractional logit regression at recovering average difference measures in a two sample design
Rahim Moineddin [1], Christopher Meaney [1]
[1] Department of Family and Community Medicine, University of Toronto, 500 University Avenue, Toronto M5G1V7, ON, Canada
Keywords: Monte Carlo simulation; Multinomial distribution; Beta distribution; Fractional logit regression; Variable-dispersion beta regression; Beta regression; Linear regression; Regression modelling
DOI: 10.1186/1471-2288-14-14
Received 2013-08-30, accepted 2014-01-21, published 2014
【 Abstract 】
Background
In biomedical research, response variables often have bounded support on the open unit interval (0,1). Traditionally, researchers have estimated covariate effects on this type of response data using linear regression. Alternative modelling strategies include beta regression, variable-dispersion beta regression, and fractional logit regression. This study employs a Monte Carlo simulation design to compare the statistical properties of the linear regression model with those of the newer beta regression, variable-dispersion beta regression, and fractional logit regression models.
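As a rough illustration of one of the alternatives named above, the sketch below fits a fractional logit model to a two sample design by maximising the Bernoulli quasi-likelihood with Newton's method. It is a minimal sketch under stated assumptions, not the authors' implementation: the function names (`expit`, `fit_fractional_logit`), the beta shape parameters and the seed are all hypothetical, and only the Python standard library is used. With a single group indicator the model is saturated, so the fitted group means should coincide with the sample means.

```python
import math
import random
import statistics


def expit(x):
    """Inverse logit."""
    return 1.0 / (1.0 + math.exp(-x))


def fit_fractional_logit(y, g, iters=25):
    """Newton's method for logit(E[y]) = b0 + b1 * g (hypothetical helper).

    y holds responses in (0,1); g holds the 0/1 group indicator.
    """
    b0, b1 = 0.0, 0.0
    for _ in range(iters):
        s0 = s1 = h00 = h01 = h11 = 0.0
        for yi, gi in zip(y, g):
            mu = expit(b0 + b1 * gi)
            w = mu * (1.0 - mu)  # quasi-likelihood weight
            r = yi - mu          # residual entering the score
            s0 += r
            s1 += r * gi
            h00 += w
            h01 += w * gi
            h11 += w * gi * gi
        # Solve the 2x2 system (information matrix) * step = score
        det = h00 * h11 - h01 * h01
        b0 += (h11 * s0 - h01 * s1) / det
        b1 += (h00 * s1 - h01 * s0) / det
    return b0, b1


# Illustrative data only: the shape parameters below are assumptions.
rng = random.Random(7)
y0 = [rng.betavariate(2.0, 6.0) for _ in range(25)]
y1 = [rng.betavariate(4.0, 4.0) for _ in range(25)]
y = y0 + y1
g = [0] * 25 + [1] * 25

b0, b1 = fit_fractional_logit(y, g)
model_diff = expit(b0 + b1) - expit(b0)  # difference on the (0,1) scale
sample_diff = statistics.fmean(y1) - statistics.fmean(y0)
print(model_diff, sample_diff)
```

The coefficient `b1` is a log-odds difference; mapping it back through `expit` gives the average difference on the original scale. Standard errors are omitted here; they would typically come from a robust (sandwich) covariance estimator, which this sketch does not attempt.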
Methods
In the Monte Carlo experiment we assume a simple two sample design. We assume observations are realizations of independent draws from their respective probability models. The randomly simulated draws from the various probability models are chosen to emulate average proportion/percentage/rate differences of pre-specified magnitudes. Following simulation of the experimental data we estimate average proportion/percentage/rate differences. We compare the estimators in terms of bias, variance, type-1 error and power. Estimates of Monte Carlo error associated with these quantities are provided.
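The design described above can be sketched in a few lines. This is a hedged, simplified sketch rather than the study's code: it covers only the difference-in-means estimator (which is what linear regression on a group indicator returns as its slope), it assumes a mean/precision parameterisation of the beta distribution (shape parameters `mu * phi` and `(1 - mu) * phi`), and it uses an unequal-variance standard error with a normal-approximation test at the 5% level. The function names, the parameter values and the number of replications are illustrative assumptions.

```python
import math
import random
import statistics


def draw_beta(mu, phi, n, rng):
    """Draw n beta variates with mean mu and precision phi (assumed parameterisation)."""
    return [rng.betavariate(mu * phi, (1.0 - mu) * phi) for _ in range(n)]


def simulate(mu0, mu1, phi0, phi1, n0=25, n1=25, reps=2000, seed=1):
    """Monte Carlo estimates of bias, variance and rejection rate."""
    rng = random.Random(seed)
    true_diff = mu1 - mu0
    estimates = []
    rejections = 0
    for _ in range(reps):
        y0 = draw_beta(mu0, phi0, n0, rng)
        y1 = draw_beta(mu1, phi1, n1, rng)
        diff = statistics.fmean(y1) - statistics.fmean(y0)
        se = math.sqrt(statistics.variance(y0) / n0 + statistics.variance(y1) / n1)
        # Two-sided test of "no difference" using a normal approximation
        if abs(diff / se) > 1.96:
            rejections += 1
        estimates.append(diff)
    emp_var = statistics.variance(estimates)
    rate = rejections / reps
    return {
        "bias": statistics.fmean(estimates) - true_diff,
        "bias_mcse": math.sqrt(emp_var / reps),  # Monte Carlo error of the bias
        "variance": emp_var,
        "reject_rate": rate,
        "reject_mcse": math.sqrt(rate * (1.0 - rate) / reps),
    }


# Rejection rate is the type-1 error when mu0 == mu1 and the power otherwise.
null_result = simulate(0.5, 0.5, 10.0, 10.0)
power_result = simulate(0.5, 0.6, 10.0, 10.0)
print(null_result)
print(power_result)
```

Passing different values for `phi0` and `phi1` gives the unequal-dispersion scenario; comparing the beta regression estimators would additionally require fitting those models in each replication, which is left out of this sketch.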
Results
If response data are beta distributed with constant dispersion parameters across the two samples, then all models are unbiased and have reasonable type-1 error rates and power profiles. If the response data in the two samples have different dispersion parameters, then the simple beta regression model is biased. When the sample size is small (N0 = N1 = 25), linear regression has superior type-1 error rates compared to the other models. Small sample type-1 error rates can be improved in beta regression models using bias correction/reduction methods. In the power experiments, variable-dispersion beta regression and fractional logit regression models have slightly elevated power compared to linear regression models. Similar results were observed when the response data were generated from a discrete multinomial distribution with support on (0,1).
Conclusions
The linear regression model, the variable-dispersion beta regression model and the fractional logit regression model all perform well across the simulation experiments under consideration. When employing beta regression to estimate covariate effects on (0,1) response data, researchers should ensure their dispersion sub-model is properly specified; otherwise, inferential errors could arise.
【 License 】
2014 Meaney and Moineddin; licensee BioMed Central Ltd.