| Austrian Journal of Statistics | |
| Fast Approximate Complete-data k-nearest-neighbor Estimation | |
| article | |
| Alejandro Murua1  Nicolas Wicker1  | |
| [1] University of Lille | |
| 关键词: big data; dimension estimation; distance distribution; hash table; high dimensionality; symmetric k-nearest-neighbors.; | |
| DOI : 10.17713/ajs.v49i2.907 | |
| 学科分类:医学(综合) | |
| 来源: Austrian Statistical Society | |
PDF
|
|
【 摘 要 】
We introduce a fast method to estimate the complete-data set of k-nearest-neighbors.This is equivalent to finding an estimate of the k-nearest-neighbor graph of the data. The method relies on random normal projections. The k-nearest-neighbors are estimated by sorting points in a number of random lines. For very large datasets, the method is quasi-linear in the data size. As an application, we show that the intrinsic dimension of a manifold can be reliably estimated from the estimated set of k-nearest-neighbors in time about two orders of magnitude faster than when using the exact set of k-nearest-neighbors.
【 授权许可】
CC BY
【 预 览 】
| Files | Size | Format | View |
|---|---|---|---|
| RO202105240000027ZK.pdf | 261KB |
PDF