Regression via Clustering using Dirichlet Mixtures
Bayesian;clustering;Dirichlet mixtures
Kang, Changku ; Hao H. Zhang, Committee Member ; Subhashis Ghosal, Committee Chair ; John F. Monahan, Committee Member ; Sujit K. Ghosh, Committee Member
Regression analysis is a fundamental problem of statistics. When the regression function has an unknown form, parametric analysis is sometimes inappropriate. In such a situation, the regression function should be estimated by nonparametric methods. Often, the regressor variable is sampled from several different subpopulations, and the regression function has different forms depending on the source. The labels of these source subpopulations are not observable. Although a nonparametrically specified regression function can capture the overall regression function, nonparametric regression estimates usually depend on the assumption of homoscedasticity of additive errors. If the underlying distribution of X has unknown clusters, the usual homoscedasticity assumption does not hold. In estimating the regression function, we propose first finding clusters in the regressor variables by a Dirichlet mixture in order to impute the lost subpopulation labels. A standard regression method, such as linear or polynomial regression, may then be used within each cluster. A Markov chain Monte Carlo (MCMC) sampling method is used to find the clusters, and an estimated regression function can be obtained for each MCMC sample. We also apply our method to the large p, small n problem, where the number of variables p is much greater than the number of samples n. In several simulation experiments, our method is compared with other methods: kernel methods and smoothing splines in the univariate case, and the generalized additive model (GAM) and multivariate adaptive regression splines (MARS) in the multivariate case. The consistency issue is discussed without explicit proof.
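The two-stage idea in the abstract (cluster the regressor values with a Dirichlet mixture, then fit a standard regression within each cluster) can be illustrated with a minimal sketch. It is not the thesis's implementation. It assumes a one-dimensional regressor, a normal kernel with known within-cluster variance `sigma2`, a normal base measure with mean `mu0` and variance `tau2`, and a concentration parameter `alpha`. All of these names and values, the function names, and the simulated data are hypothetical. The sketch also uses only the final MCMC draw of the labels, whereas the abstract refers to a regression estimate for each MCMC sample.

```python
import math
import random


def normal_pdf(x, mean, var):
    """Density of a normal distribution with the given mean and variance."""
    return math.exp(-0.5 * (x - mean) ** 2 / var) / math.sqrt(2 * math.pi * var)


def dp_gibbs_labels(x, alpha=1.0, sigma2=1.0, mu0=0.0, tau2=100.0,
                    sweeps=50, seed=0):
    """Collapsed Gibbs sampler for a Dirichlet process mixture of normals.

    Assumes (for illustration only) a known within-cluster variance sigma2
    and a N(mu0, tau2) base measure. Returns the labels of the last sweep.
    """
    rng = random.Random(seed)
    n = len(x)
    labels = [0] * n                    # start with every point in one cluster
    counts = {0: n}                     # cluster id -> number of points
    sums = {0: sum(x)}                  # cluster id -> sum of its x values
    next_id = 1
    for _ in range(sweeps):
        for i in range(n):
            # Remove point i from its current cluster.
            k = labels[i]
            counts[k] -= 1
            sums[k] -= x[i]
            if counts[k] == 0:
                del counts[k], sums[k]
            # Weight of each existing cluster: size times predictive density.
            ids = list(counts)
            weights = []
            for c in ids:
                post_var = 1.0 / (1.0 / tau2 + counts[c] / sigma2)
                post_mean = post_var * (mu0 / tau2 + sums[c] / sigma2)
                weights.append(counts[c] *
                               normal_pdf(x[i], post_mean, post_var + sigma2))
            # Weight of a new cluster: alpha times prior predictive density.
            ids.append(next_id)
            weights.append(alpha * normal_pdf(x[i], mu0, tau2 + sigma2))
            choice = rng.choices(ids, weights=weights)[0]
            if choice == next_id:
                counts[choice] = 0
                sums[choice] = 0.0
                next_id += 1
            counts[choice] += 1
            sums[choice] += x[i]
            labels[i] = choice
    return labels


def fit_lines(x, y, labels):
    """Least-squares line (intercept, slope) within each imputed cluster."""
    fits = {}
    for c in set(labels):
        xs = [xi for xi, l in zip(x, labels) if l == c]
        ys = [yi for yi, l in zip(y, labels) if l == c]
        xbar, ybar = sum(xs) / len(xs), sum(ys) / len(ys)
        sxx = sum((xi - xbar) ** 2 for xi in xs)
        sxy = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(xs, ys))
        slope = sxy / sxx if sxx > 0 else 0.0   # singleton cluster: flat line
        fits[c] = (ybar - slope * xbar, slope)
    return fits


# Simulated data (hypothetical): two subpopulations with different lines.
data_rng = random.Random(1)
x = [data_rng.gauss(-5.0, 0.5) for _ in range(30)] + \
    [data_rng.gauss(5.0, 0.5) for _ in range(30)]
y = [1.0 + 2.0 * xi + data_rng.gauss(0.0, 0.1) for xi in x[:30]] + \
    [-3.0 - 1.0 * xi + data_rng.gauss(0.0, 0.1) for xi in x[30:]]

labels = dp_gibbs_labels(x)
fits = fit_lines(x, y, labels)
for c, (intercept, slope) in sorted(fits.items()):
    print(c, labels.count(c), round(intercept, 2), round(slope, 2))
```

The number of clusters is not fixed in advance: a point may open a new cluster at any sweep, so a run can produce small extra clusters alongside the two main ones. Averaging the fitted functions over many retained MCMC draws, rather than using the last draw alone, would be closer to what the abstract describes.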