Journal Article Details
JOURNAL OF MULTIVARIATE ANALYSIS Volume: 167
Probabilistic partial least squares model: Identifiability, estimation and application
Article
el Bouhaddani, Said [1]; Uh, Hae-Won [1,2]; Hayward, Caroline [5]; Jongbloed, Geurt [3]; Houwing-Duistermaat, Jeanine [1,4]
[1] Leiden Univ, Med Ctr, Dept Med Stat & Bioinformat, Leiden, Netherlands
[2] Univ Med Ctr Utrecht, Div Julius Ctr, Dept Biostat & Res Support, Utrecht, Netherlands
[3] Delft Univ Technol, Dept Appl Math, Delft, Netherlands
[4] Univ Leeds, Dept Stat, Leeds, W Yorkshire, England
[5] Univ Edinburgh, Inst Genet & Mol Med, MRC Human Genet Unit, Edinburgh, Midlothian, Scotland
Keywords: Dimension reduction; EM algorithm; Identifiability; Inference; Probabilistic partial least squares
DOI: 10.1016/j.jmva.2018.05.009
Source: Elsevier
【 Abstract 】

With a rapid increase in volume and complexity of data sets, there is a need for methods that can extract useful information, for example the relationship between two data sets measured for the same persons. The Partial Least Squares (PLS) method can be used for this dimension reduction task. Within life sciences, results across studies are compared and combined. Therefore, parameters need to be identifiable, which is not the case for PLS. In addition, PLS is an algorithm, while epidemiological study designs are often outcome dependent and methods to analyze such data require a probabilistic formulation. Moreover, a probabilistic model provides a statistical framework for inference. To address these issues, we develop Probabilistic PLS (PPLS). We derive maximum likelihood estimators that satisfy the identifiability conditions by using an EM algorithm with a constrained optimization in the M step. We show that the PPLS parameters are identifiable up to sign. A simulation study is conducted to study the performance of PPLS compared to existing methods. The PPLS estimates performed well in various scenarios, even in high dimensions. Most notably, the estimates seem to be robust against departures from normality. To illustrate our method, we applied it to IgG glycan data from two cohorts. Our PPLS model provided insight as well as interpretable results across the two cohorts. © 2018 Elsevier Inc. All rights reserved.

【 License 】

Free   
