| JOURNAL OF MULTIVARIATE ANALYSIS | Volume: 116 |
| Intrinsic dimension identification via graph-theoretic methods | |
| Article | |
| Brito, M. R. [1]; Quiroz, A. J. [2,3]; Yukich, J. E. [4] | |
| [1] Univ Simon Bolivar, Dpto Matemat Puras & Aplicadas, Caracas 1080, Venezuela | |
| [2] Univ Simon Bolivar, Dpto Computo Cientif Estadist, Caracas 1080, Venezuela | |
| [3] Univ Los Andes, Dpto Matemat, Bogota, Colombia | |
| [4] Lehigh Univ, Dept Math, Bethlehem, PA USA | |
| Keywords: Intrinsic dimension; Graph theoretical methods; Stabilization methods; Dimensionality reduction | |
| DOI : 10.1016/j.jmva.2012.12.007 | |
| Source: Elsevier | |
【 Abstract 】
Three graph theoretical statistics are considered for the problem of estimating the intrinsic dimension of a data set. The first is the reach statistic, $\bar{r}_{j,k}$, proposed in Brito et al. (2002) [4] for the problem of identification of Euclidean dimension. The second, $M_n$, is the sample average of squared degrees in the minimum spanning tree of the data, while the third statistic, $U_n(k)$, is based on counting the number of common neighbors among the $k$-nearest, for each pair of sample points $\{X_i, X_j\}$, $i < j \le n$. For the first and third of these statistics, central limit theorems are proved under general assumptions, for data living in an $m$-dimensional $C^1$ submanifold of $\mathbb{R}^d$, and in this setting, we establish the consistency of intrinsic dimension identification procedures based on $\bar{r}_{j,k}$ and $U_n(k)$. For $M_n$, asymptotic results are provided whenever data live in an affine subspace of Euclidean space. The graph theoretical methods proposed are compared, via simulations, with a host of recently proposed nearest neighbor alternatives. © 2013 Elsevier Inc. All rights reserved.
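The second and third statistics described in the abstract can be illustrated with a short pure-Python sketch. This is an illustration under stated assumptions, not the authors' implementation: the function names are hypothetical, the minimum spanning tree is built with a plain O(n^2) Prim's algorithm on Euclidean distances, and the common-neighbor statistic is returned as a raw sum over pairs because the abstract does not state the normalization the paper uses.

```python
import math
from itertools import combinations


def mst_degrees(points):
    """Vertex degrees in the Euclidean minimum spanning tree (Prim, O(n^2))."""
    n = len(points)
    degree = [0] * n
    if n < 2:
        return degree
    in_tree = [False] * n
    best = [math.inf] * n   # cheapest known edge from each vertex to the tree
    parent = [-1] * n       # tree endpoint of that cheapest edge
    best[0] = 0.0
    for _ in range(n):
        # Add the vertex outside the tree that is closest to the tree.
        u = min((i for i in range(n) if not in_tree[i]), key=lambda i: best[i])
        in_tree[u] = True
        if parent[u] >= 0:
            degree[u] += 1
            degree[parent[u]] += 1
        # Relax the cheapest-edge estimates of the remaining vertices.
        for v in range(n):
            if not in_tree[v]:
                d = math.dist(points[u], points[v])
                if d < best[v]:
                    best[v] = d
                    parent[v] = u
    return degree


def mean_squared_mst_degree(points):
    """Sample average of squared MST degrees (the quantity the abstract calls M_n)."""
    deg = mst_degrees(points)
    return sum(d * d for d in deg) / len(deg)


def knn_sets(points, k):
    """Index set of the k nearest neighbors of each point (brute force)."""
    n = len(points)
    sets = []
    for i in range(n):
        others = sorted(
            (j for j in range(n) if j != i),
            key=lambda j: math.dist(points[i], points[j]),
        )
        sets.append(set(others[:k]))
    return sets


def common_neighbor_count(points, k):
    """Total number of shared k-nearest neighbors over all pairs i < j.

    Assumption: raw, unnormalized count; the paper's exact scaling of the
    third statistic is not given in the abstract.
    """
    nbrs = knn_sets(points, k)
    return sum(
        len(nbrs[i] & nbrs[j]) for i, j in combinations(range(len(points)), 2)
    )
```

Both functions take a list of equal-length coordinate tuples. For points on a line the minimum spanning tree is a path, so every interior vertex has degree 2 and the two endpoints have degree 1, which gives a quick sanity check of the first function.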
【 License 】
Free
【 Preview 】
| Files | Size | Format | View |
|---|---|---|---|
| 10_1016_j_jmva_2012_12_007.pdf | 483KB | PDF | |