Model selection for correlated data and moment selection from high-dimensional moment conditions
diverging number of parameters; dynamic panel data models; generalized method of moments; high-dimensional moment conditions; moment selection; longitudinal data; model selection; oracle property; quadratic inference function; smoothly clipped absolute deviation (SCAD); singularity matrix
High-dimensional correlated data arise frequently in many studies. My primary research interests lie broadly in statistical methodology for correlated data, such as longitudinal data and panel data. In this thesis, we address two important but challenging issues: model selection for correlated data with a diverging number of parameters, and consistent moment selection from high-dimensional moment conditions.

Longitudinal data arise frequently in biomedical and genomic research, where repeated measurements within subjects are correlated. It is important to select relevant covariates when the dimension of the parameters diverges as the sample size increases. We propose the penalized quadratic inference function to perform model selection and estimation simultaneously in the framework of a diverging number of regression parameters. The penalized quadratic inference function can easily take correlation information from clustered data into account, yet it does not require specifying the likelihood function. This is advantageous compared to existing model selection methods for discrete data with large cluster size. In addition, the proposed approach enjoys the oracle property: it identifies the non-zero components consistently with probability tending to 1, and any finite linear combination of the estimated non-zero components has an asymptotic normal distribution. We propose an efficient algorithm that selects an effective tuning parameter to solve the penalized quadratic inference function. Monte Carlo simulation studies show that the proposed method selects the correct model with high frequency and estimates covariate effects accurately even when the dimension of the parameters is high. We illustrate the proposed approach by analyzing periodontal disease data.

The generalized method of moments (GMM) approach combines moment conditions optimally to obtain efficient estimation without specifying the full likelihood function. However, the GMM estimator can be infeasible when the number of moment conditions exceeds the sample size. This research addresses problems in which the dimension of the estimating equations or moment conditions far exceeds the sample size, such as selecting an informative correlation structure or modeling dynamic panel data. We propose a Bayesian information type of criterion to select the optimal number of linear combinations of moment conditions. In theory, we show that the proposed criterion leads to consistent selection of the number of principal components for the weighting matrix in the GMM. Monte Carlo studies indicate that the proposed method outperforms existing methods in the sense of reducing bias and improving the efficiency of estimation. We also illustrate moment selection with a real data example using dynamic panel data models.
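
To make the idea of a principal-component weighting matrix concrete, the following is a minimal sketch and not the thesis's procedure. It assumes linear moment conditions E[z_i (y_i - x_i' beta)] = 0 with more moments than observations, so the sample covariance of the moments is singular, and it replaces the usual inverse covariance weight with one built from the k leading principal components. The function name `truncated_gmm_linear`, the simulated data, and the fixed choice of `k` are all illustrative assumptions; the Bayesian information type of criterion that would select k is not reproduced here.

```python
import numpy as np

def truncated_gmm_linear(y, X, Z, k):
    """Two-step GMM for moments z_i * (y_i - x_i' beta), weighting with the
    k leading principal components of the sample moment covariance.
    Hypothetical helper for illustration only."""
    n, p = X.shape
    if k < p:
        raise ValueError("need at least as many components as parameters")
    A = Z.T @ X / n                      # q x p: derivative of the mean moment
    b = Z.T @ y / n                      # q: mean of z_i * y_i
    # Step 1: identity-weighted estimate, used only to form residuals.
    beta0 = np.linalg.solve(A.T @ A, A.T @ b)
    G = Z * (y - X @ beta0)[:, None]     # n x q matrix of moment contributions
    Gc = G - G.mean(axis=0)
    S = Gc.T @ Gc / n                    # q x q, rank at most n - 1
    # Step 2: keep the k largest eigenpairs of S (eigh sorts ascending).
    evals, evecs = np.linalg.eigh(S)
    lam = evals[::-1][:k]
    V = evecs[:, ::-1][:, :k]
    AV, bV = V.T @ A, V.T @ b            # moments projected on the components
    lhs = AV.T @ (AV / lam[:, None])
    rhs = AV.T @ (bV / lam)
    return np.linalg.solve(lhs, rhs)

# Simulated example (assumed design): 60 observations, 100 moment conditions.
rng = np.random.default_rng(0)
n, q, p, k = 60, 100, 2, 8
Z = rng.normal(size=(n, q))
Pi = np.zeros((q, p))
Pi[0, 0] = Pi[1, 1] = 1.0                # only two instruments are informative
X = Z @ Pi + 0.5 * rng.normal(size=(n, p))
beta_true = np.array([1.0, -0.5])
y = X @ beta_true + rng.normal(size=n)
beta_hat = truncated_gmm_linear(y, X, Z, k)
print(beta_hat)
```

With 100 moments and 60 observations the full covariance matrix cannot be inverted, which is why the sketch works in the subspace of the leading components. How many components to keep is exactly the selection problem the thesis addresses; here `k = 8` is arbitrary.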