Statistical Analysis and Data Mining
Distance‐based analysis of variance: Approximate inference
Christopher Minas [1], Giovanni Montana [1]
[1] Department of Mathematics, Imperial College London, London, UK
Keywords: distance‐based inference; Pearson type III approximation; genomics; MANOVA; neuroimaging
DOI: 10.1002/sam.11227
Subject classification: Social sciences, humanities and arts (general)
Source: John Wiley & Sons, Inc.
[Abstract]
In several modern applications, ranging from genomics to neuroimaging, there is a need to compare measurements across different populations, such as those collected from samples of healthy and diseased individuals. The interest is in detecting a group effect, and typically many thousands or even millions of tests need to be performed simultaneously, as exemplified in genomics, where a single test is applied to each gene across the genome. Traditional procedures, such as multivariate analysis of variance (MANOVA), are not suitable when dealing with nonvector‐valued data structures such as functional or graph‐structured observations. In this article, we discuss an existing distance‐based MANOVA‐like approach, the distance‐based F (DBF) test, for detecting such differences. The null sampling distribution of the DBF test statistic depends on the distribution of the measurements and on the chosen distance measure, and is generally unavailable in closed form. In practice, Monte Carlo permutation methods are deployed; when too few permutations are used, these introduce errors in estimating small p‐values and increase familywise type I error rates. In this work, we propose an approximate distribution for the DBF test that allows inferences to be drawn without the need for costly permutations. This is achieved by using moment matching to fit a Pearson type III distribution to the permutation distribution that would be obtained by enumerating all permutations. The use of the Pearson type III distribution is motivated by empirical observations with real data. We provide evidence with real and simulated data that the resulting approximate null distribution of the DBF test is flexible enough to work well with a range of distance measures.
Through extensive simulations involving different data types and distance measures, we provide evidence that the proposed methodology yields the same statistical power that would otherwise only be achievable if many millions of Monte Carlo permutations were performed.
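
The two ingredients named in the abstract, a permutation null distribution for a distance‐based F‐type statistic and a Pearson type III fit by moment matching, can be illustrated with a minimal Python sketch. Everything in it is an assumption made for illustration: the abstract does not give the DBF statistic's exact form, so `pseudo_f` uses a generic between/within ratio of sums of squared distances; the toy data, the absolute‐difference distance, and all function names are hypothetical. The article matches the moments of the full permutation distribution, whereas this sketch estimates the first three moments from a small Monte Carlo sample, which is a simplification.

```python
import math
import random


def pseudo_f(dist, labels):
    """Between/within ratio of sums of squared distances (assumed form)."""
    n = len(labels)
    total = sum(dist[i][j] ** 2 for i in range(n) for j in range(i + 1, n)) / n
    within = 0.0
    for g in set(labels):
        idx = [i for i, lab in enumerate(labels) if lab == g]
        ss = sum(dist[i][j] ** 2 for a, i in enumerate(idx) for j in idx[a + 1:])
        within += ss / len(idx)
    return (total - within) / within


def permutation_sample(dist, labels, n_perm, rng):
    """Statistic values under random relabelling of the observations."""
    perm = list(labels)
    stats = []
    for _ in range(n_perm):
        rng.shuffle(perm)
        stats.append(pseudo_f(dist, perm))
    return stats


def pearson3_params(sample):
    """Match mean, variance and skewness to a shifted gamma distribution.

    Pearson type III is written here as loc + scale * Gamma(shape).
    Assumes the sample skewness is nonzero.
    """
    m = len(sample)
    mean = sum(sample) / m
    sd = math.sqrt(sum((x - mean) ** 2 for x in sample) / m)
    skew = sum((x - mean) ** 3 for x in sample) / (m * sd ** 3)
    shape = 4.0 / skew ** 2
    scale = sd * skew / 2.0        # carries the sign of the skewness
    loc = mean - 2.0 * sd / skew
    return mean, sd, skew, shape, scale, loc


# Hypothetical toy data: two groups of scalar measurements.
group_a = [0.1, 0.4, 0.3, 0.2, 0.5]
group_b = [1.1, 1.3, 0.9, 1.2, 1.0]
values = group_a + group_b
labels = ["A"] * len(group_a) + ["B"] * len(group_b)
dist = [[abs(x - y) for y in values] for x in values]

rng = random.Random(0)
observed = pseudo_f(dist, labels)
null_stats = permutation_sample(dist, labels, 2000, rng)

# Monte Carlo permutation p-value, counting the observed labelling itself.
p_mc = (1 + sum(s >= observed for s in null_stats)) / (len(null_stats) + 1)

mean, sd, skew, shape, scale, loc = pearson3_params(null_stats)
print(f"observed statistic: {observed:.3f}, Monte Carlo p-value: {p_mc:.4f}")
print(f"Pearson III fit: shape={shape:.3f}, scale={scale:.3f}, loc={loc:.3f}")
```

With the fitted parameters in hand, an approximate p‐value would be the upper tail probability of the shifted gamma distribution at the observed statistic; for instance, `scipy.stats.pearson3.sf(observed, skew, loc=mean, scale=sd)` evaluates that tail using the same three moments. The Monte Carlo p‐value above can never fall below 1/(n_perm + 1), which is the resolution limit the abstract refers to when it mentions errors in estimating small p‐values.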