This thesis concerns the problem of dimensionality reduction through information-geometric methods on statistical manifolds. Although considerable recent work has addressed dimensionality reduction for learning tasks such as classification, clustering, and visualization, these methods have focused primarily on Riemannian sub-manifolds of Euclidean space. While this is sufficient for many applications, many high-dimensional signals have no straightforward and meaningful Euclidean representation. In these cases, signals may be more appropriately represented as realizations of some distribution lying on a statistical manifold, that is, a manifold of probability density functions (PDFs). These manifolds are often intrinsically lower dimensional than the domain of the data realization.

We begin by discussing local intrinsic dimension estimation and its applications. Much work has been done on estimating the global dimension of a data set, typically for the purposes of dimensionality reduction. We show that by estimating dimension locally, we are able to extend the uses of dimension estimation to statistical manifolds as well as to many applications which are not possible with global dimension estimation. We illustrate the independent benefits of dimension estimation on complex problems such as anomaly detection, clustering, and image segmentation.

We then discuss two methods of dimensionality reduction on statistical manifolds. First, we propose a method for statistical manifold reconstruction that uses the principles of information geometry and Euclidean manifold learning to embed PDFs into a low-dimensional Euclidean space. This embedding enables comparative analysis of multiple high-dimensional data sets using standard Euclidean methods. Our second algorithm is a linear projection method that creates a dimension-reduced subspace preserving the high-dimensional relationships between multiple signals. Defining this information-preserving projection contributes to both feature extraction and visualization of high-dimensional data.

Finally, we apply these techniques to their original motivating problem of clinical flow cytometric analysis. These methods of dimensionality reduction approach the problems of diagnosis, visualization, and verification of flow cytometric data in a manner which has not been given significant consideration in the past. The tools we propose are illustrated in several case studies on actual patient data sets.
Dimensionality Reduction on Statistical Manifolds.