Dissertation Details
Matching and Inference for Multiple Correlated Data Sets
Shen, Cencheng; Fishkind, Donniell E.
Johns Hopkins University
Keywords: Dimension reduction; Machine learning; Data matching; Statistical inference; Applied Mathematics & Statistics
Others  :  https://jscholarship.library.jhu.edu/bitstream/handle/1774.2/39366/SHEN-DISSERTATION-2015.pdf?sequence=1&isAllowed=y
Switzerland | English
Source: JOHNS HOPKINS DSpace Repository
【 Abstract 】

Given multiple correlated data sets, an important question is how to use them to benefit later statistical inference. This setting is increasingly realistic as more related data sets are collected, such as images and their descriptions, articles in multiple languages, and actors in multiple social networks. Real data are also often multivariate or high-dimensional, so dimension reduction is necessary before any inference. In this dissertation, I consider three dimension reduction and matching methods: principal component analysis followed by Procrustes matching, canonical correlation analysis, and nonlinear matching using shortest-path distance and joint neighborhood. I investigate their theoretical properties and their impact on later inference using the Procrustes fitting error, classification error, and hypothesis testing, respectively. The main conclusion is that, for a given inference task on multiple correlated data sets, joint matching and projection may significantly improve inference performance compared to separate projection or omitting modalities. Numerical experiments on simulated and real data illustrate the theorems and the methodology.
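As a rough illustration of the first method named in the abstract (principal component analysis followed by Procrustes matching), the sketch below projects two matched data sets to a common dimension and then aligns them with an orthogonal transformation. It is not taken from the dissertation: the function names `pca_project` and `procrustes_match`, the synthetic data, and the choice of three dimensions are all assumptions made for illustration, and the orthogonal Procrustes solution used here is the standard SVD-based one.

```python
import numpy as np

def pca_project(data, dim):
    """Project rows of `data` onto the top `dim` principal directions."""
    centered = data - data.mean(axis=0)
    # Rows of vt are the principal directions of the centered data.
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    return centered @ vt[:dim].T

def procrustes_match(source, target):
    """Find the orthogonal matrix minimizing ||source @ Q - target||_F."""
    u, _, vt = np.linalg.svd(source.T @ target)
    rotation = u @ vt
    error = np.linalg.norm(source @ rotation - target)
    return rotation, error

# Hypothetical example: two noisy "modalities" generated from one latent signal.
rng = np.random.default_rng(0)
latent = rng.normal(size=(200, 3))
view_a = latent @ rng.normal(size=(3, 10)) + 0.01 * rng.normal(size=(200, 10))
view_b = latent @ rng.normal(size=(3, 8)) + 0.01 * rng.normal(size=(200, 8))

proj_a = pca_project(view_a, 3)
proj_b = pca_project(view_b, 3)

rotation, matched_error = procrustes_match(proj_a, proj_b)
unmatched_error = np.linalg.norm(proj_a - proj_b)
print(f"fitting error before matching: {unmatched_error:.3f}")
print(f"fitting error after matching:  {matched_error:.3f}")
```

Because the identity matrix is itself orthogonal, the fitting error after matching can never exceed the error before matching; how much it improves depends on how the two projections are related.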

【 Preview 】
Attachment list
Files Size Format View
Matching and Inference for Multiple Correlated Data Sets 1215KB PDF download
Document metrics
Downloads: 11    Views: 15