且构网

分享程序员开发的那些事...
且构网 - 分享程序员编程开发的那些事

如何在高维数据中高效地找到k-最近邻?

更新时间:2023-11-27 23:17:34

关于 kd-trees 的***文章有一个链接到 ANN 库:

The Wikipedia article for kd-trees has a link to the ANN library:

ANN 是一个用 C++ 编写的库,它支持数据结构和精确算法和近似最近邻搜索任意高的维度.

ANN is a library written in C++, which supports data structures and algorithms for both exact and approximate nearest neighbor searching in arbitrarily high dimensions.

根据我们自己的经验,ANN对点执行非常有效大小从数千到数十万,并在尺寸高达 20.(对于显着更高的应用程序维度,结果相当参差不齐,但你还是可以试试.)

Based on our own experience, ANN performs quite efficiently for point sets ranging in size from thousands to hundreds of thousands, and in dimensions as high as 20. (For applications in significantly higher dimensions, the results are rather spotty, but you might try it anyway.)

就算法/数据结构而言:

As far as algorithm/data structures are concerned:

该库实现了许多不同的数据结构,基于kd-trees 和 盒分解树,并雇用了几个不同的搜索策略.

The library implements a number of different data structures, based on kd-trees and box-decomposition trees, and employs a couple of different search strategies.

我会先直接尝试,如果这不能产生令人满意的结果,我会在应用 PCA/ICA 后将其与数据集一起使用(因为您最终不太可能获得足够多的 kd 维度)-要处理的树).

I'd try it first directly and if that doesn't produce satisfactory results I'd use it with the data set after applying PCA/ICA (since it's quite unlikely your going to end up with few enough dimensions for a kd-tree to handle).