学位论文详细信息
Informative Data Fusion: Beyond Canonical Correlation Analysis
Correlation analysis;Random matrix theory;Electrical Engineering;Engineering;Electrical Engineering: Systems
Asendorf, Nicholas A.Hero Iii, Alfred O. ;
University of Michigan
关键词: Correlation analysis;    Random matrix theory;    Electrical Engineering;    Engineering;    Electrical Engineering: Systems;   
Others  :  https://deepblue.lib.umich.edu/bitstream/handle/2027.42/113419/asendorf_1.pdf?sequence=1&isAllowed=y
瑞士|英语
来源: The Illinois Digital Environment for Access to Learning and Scholarship
PDF
【 摘 要 】

Multi-modal data fusion is a challenging but common problem arising in fields such as economics, statistical signal processing, medical imaging, and machine learning. In such applications, we have access to multiple datasets that use different data modalities to describe some system feature. Canonical correlation analysis (CCA) is a multidimensional joint dimensionality reduction algorithm for exactly two datasets. CCA finds a linear transformation for each feature vector set such that the correlation between the two transformed feature sets is maximized. These linear transformations are easily found by solving the SVD of a matrix that only involves the covariance and cross-covariance matrices of the feature vector sets. When these covariance matricesare unknown, an empirical version of CCA substitutes sample covariance estimates formed from training data. However, when the number of training samples is less than the combined dimension of the datasets, CCA fails to reliably detect correlation between the datasets.This thesis explores the the problem of detecting correlations from data modeled by the ubiquitous signal-plus noise data model. We present a modification to CCA, which we call informative CCA (ICCA) that first projects each dataset onto a low-dimensional informative signal subspace. We verify the superior performance of ICCA on real-world datasets and argue the optimality of trim-then-fuse over fuse-then-trim correlation analysis strategies. We provide a significance test for the correlations returned by ICCA and derive improved estimates of the population canonical vectors using insights from random matrix theory. We then extend the analysis of CCA to regularized CCA (RCCA) and demonstrate that setting the regularization parameter to infinity results in the best performance and has the same solution as taking the SVD of the cross-covariance matrix of the two datasets. Finally, we apply the ideas learned from ICCA to multiset CCA (MCCA), which analyzes correlations for more than two datasets. There are multiple formulations of multiset CCA (MCCA), each using a different combination of objective function and constraint function to describe a notion of multiset correlation. We consider MAXVAR, provide an informative version of the algorithm, which we call informative MCCA (IMCCA), and demonstrate its superiority on a real-world dataset.

【 预 览 】
附件列表
Files Size Format View
Informative Data Fusion: Beyond Canonical Correlation Analysis 20997KB PDF download
  文献评价指标  
  下载次数:24次 浏览次数:46次