The purpose of this thesis is to give insight into major problems arising in the theory of mixture distributions and, more importantly, to improve and extend some of the results given in the literature. We focus especially on the problem of estimating the number of components that underlie the probability distribution of a data sample, which is among the most difficult problems encountered in this area. In principle, there are two main approaches to the problem: the theoretical approach studies the asymptotic distribution, under the null hypothesis, of the likelihood ratio test statistic for testing k1 versus k2 components, where k1 < k2, and the algorithmic approach uses simulations to overcome some of the theoretical difficulties. In this work, we use both methodologies and illustrate the performance of the methods in some practical examples.
We now outline the approaches that we adopt for dealing with this problem. In a Monte Carlo context, we propose a technique that uses an information theory criterion inside a parametric bootstrap procedure. The performance of this technique is then assessed and compared with two alternatives: a method that uses a similar type of bootstrap procedure but bases the decision criterion on likelihood ratio inference, and the information theory based method of Windham and Cutler (1992). Another combined approach is also suggested.
Using a stochastic algorithmic methodology, Celeux (1987) proposes a test for the number of components and claims that the test statistic follows a Hotelling's distribution. We explain why this claim does not hold, and we derive the asymptotic distribution of this test statistic. This theoretical investigation leads us to study some problems that go beyond the scope of the mixture framework, since they are related to the theory of autoregressive processes. Simulation results are also provided for some simple situations.
In the case where the mixing proportions are known, there is a result in the literature (Goffinet et al., 1992) that provides the asymptotic distribution of the likelihood ratio test under the null hypothesis. However, this result is not useful for certain values of the proportions. Using theoretical arguments supported by simulation results, we provide this distribution in those cases. In summary, there are three main directions in this thesis: the information based approach, which mainly uses computational tools arising from recent developments in the theory of the EM algorithm; the stochastic approach, which uses mathematical tools from the theory of stochastic processes; and the study of the likelihood ratio test for known proportions, whose general techniques arise from the theory of asymptotic statistics.
Aspects of the Statistical Analysis of Data From Mixture Distributions