Methods on Model Selection: Bayes Factor Approximation and False Discovery Rate Control

Model selection can be an art. In many scientific fields, including genetics, climate science, and the social sciences, a well-chosen model can simplify the computation in the analysis and make the results easier to interpret. In this dissertation, we develop two methods, under two different guiding principles, to facilitate model selection.

The first method addresses the computational challenge in Bayesian model comparison. We show that the Bayes factor can be approximated with the Wang-Landau algorithm, based on a mixture formulation of the posterior distribution and a user-defined surrogate distribution. The proposed Wang-Landau mixture method is applicable whenever an effective Markov kernel that leaves the posterior invariant is available. We also discuss refinements, including accelerating convergence with the momentum method and facilitating global jumps between the posterior and the surrogate with multiple-try Metropolis.

The second method concerns a desirable frequentist property in feature selection. Specifically, we construct a statistic via data splitting to rank the importance of each feature. The statistic has a useful property: it is symmetric about 0 for null features and tends to be large for relevant features. We show that, by carefully choosing a data-dependent cutoff, we achieve asymptotic false discovery rate control under suitable conditions. The proposed method requires no p-value calculations and applies to a wide class of statistical models, including linear models, generalized linear models, and Gaussian graphical models.
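To make the mixture formulation concrete, the following is a minimal, illustrative sketch of the idea, not the dissertation's implementation: a chain moves over a two-component mixture of an unnormalized posterior and a normalized surrogate, and a Wang-Landau-type update adapts the component log-weights until the chain visits both components equally often, at which point the weight difference estimates the posterior's log normalizing constant. The random-walk kernel, the deterministic label-swap move, and the 1/t step-size schedule are assumptions chosen for brevity.

    import numpy as np

    def wl_log_evidence(log_post, log_surr, x0=0.0, n_iter=200_000,
                        step=1.0, gamma0=1.0, rng=None):
        # log_post: UNNORMALIZED log posterior density (illustrative assumption).
        # log_surr: NORMALIZED log surrogate density.
        # Returns an estimate of log Z, the posterior's normalizing constant.
        rng = np.random.default_rng(rng)
        x, k = x0, 0                    # k = 0: posterior component, k = 1: surrogate
        theta = np.zeros(2)             # adaptive Wang-Landau log-weights
        logf = (log_post, log_surr)
        for t in range(1, n_iter + 1):
            # Random-walk Metropolis move within the current component;
            # the log-weights cancel, so only the component density matters.
            xp = x + step * rng.standard_normal()
            if np.log(rng.random()) < logf[k](xp) - logf[k](x):
                x = xp
            # Global jump: propose switching components at the same x.
            kp = 1 - k
            log_acc = (logf[kp](x) - theta[kp]) - (logf[k](x) - theta[k])
            if np.log(rng.random()) < log_acc:
                k = kp
            # Wang-Landau update: penalize the currently occupied component,
            # with a decreasing (stochastic-approximation) step size.
            theta[k] += gamma0 / t
        # At convergence, theta[0] - theta[1] -> log(Z_post / Z_surr) = log Z_post.
        return theta[0] - theta[1]

    # Sanity check: unnormalized N(0, 1), whose true log Z is log sqrt(2*pi) ~ 0.919.
    log_post = lambda x: -0.5 * x**2
    log_surr = lambda x: -0.5 * (x - 1.0)**2 - 0.5 * np.log(2 * np.pi)
    print(wl_log_evidence(log_post, log_surr, rng=0))

A Bayes factor would then be estimated by running this once per model and exponentiating the difference of the two log-evidence estimates; the momentum and multiple-try refinements mentioned above would replace the plain 1/t schedule and the single deterministic swap, respectively.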
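For the second method, here is a similarly hedged sketch of a data-splitting selection rule for the linear model, with ordinary least squares on each half as a stand-in for whatever estimator the dissertation actually uses; the statistic and cutoff rule below illustrate the symmetry-about-0 property and the data-dependent threshold described above.

    import numpy as np
    from sklearn.linear_model import LinearRegression

    def data_splitting_fdr(X, y, q=0.1, rng=None):
        # Illustrative sketch (assumes n/2 > p so OLS is well defined).
        rng = np.random.default_rng(rng)
        n, p = X.shape
        idx = rng.permutation(n)
        half1, half2 = idx[: n // 2], idx[n // 2 :]

        # Two independent coefficient estimates, one from each half.
        b1 = LinearRegression().fit(X[half1], y[half1]).coef_
        b2 = LinearRegression().fit(X[half2], y[half2]).coef_

        # Ranking statistic: symmetric about 0 for null features,
        # large and positive for relevant features.
        M = np.sign(b1 * b2) * (np.abs(b1) + np.abs(b2))

        # Data-dependent cutoff: the smallest t whose estimated false
        # discovery proportion, #{M < -t} / #{M > t}, is at most q.
        tau = np.inf
        for t in np.sort(np.abs(M)):
            if np.sum(M < -t) / max(np.sum(M > t), 1) <= q:
                tau = t
                break
        return np.flatnonzero(M > tau)      # indices of selected features

    # Simulated example: first 10 of 50 features are relevant.
    rng = np.random.default_rng(1)
    X = rng.standard_normal((500, 50))
    beta = np.zeros(50); beta[:10] = 1.0
    y = X @ beta + rng.standard_normal(500)
    print(data_splitting_fdr(X, y, q=0.1, rng=1))

Swapping the OLS fits for estimates from a generalized linear model or a Gaussian graphical model is, presumably, how the wider applicability mentioned in the abstract would be obtained.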