如何在高维数据中高效地找到k-最近邻?

更新时间：2023-11-27 23:17:34

关于 kd-trees 的***文章有一个链接到 ANN 库:

The Wikipedia article for kd-trees has a link to the ANN library:

ANN 是一个用 C++ 编写的库，它支持数据结构和精确算法和近似最近邻搜索任意高的维度.

ANN is a library written in C++, which supports data structures and algorithms for both exact and approximate nearest neighbor searching in arbitrarily high dimensions.

根据我们自己的经验，ANN对点执行非常有效大小从数千到数十万，并在尺寸高达 20.(对于显着更高的应用程序维度，结果相当参差不齐，但你还是可以试试.)

Based on our own experience, ANN performs quite efficiently for point sets ranging in size from thousands to hundreds of thousands, and in dimensions as high as 20. (For applications in significantly higher dimensions, the results are rather spotty, but you might try it anyway.)

就算法/数据结构而言:

As far as algorithm/data structures are concerned:

该库实现了许多不同的数据结构，基于kd-trees 和盒分解树，并雇用了几个不同的搜索策略.

The library implements a number of different data structures, based on kd-trees and box-decomposition trees, and employs a couple of different search strategies.

我会先直接尝试，如果这不能产生令人满意的结果，我会在应用 PCA/ICA 后将其与数据集一起使用(因为您最终不太可能获得足够多的 kd 维度)-要处理的树).

I'd try it first directly and if that doesn't produce satisfactory results I'd use it with the data set after applying PCA/ICA (since it's quite unlikely your going to end up with few enough dimensions for a kd-tree to handle).

上一篇 : ：如何在嵌入式Linux中高效地在VFAT分区上创建大文件下一篇 : 创建触发器错误:语法无效

如何在高维数据中高效地找到k-最近邻?

相关阅读

推荐文章