How do I merge two datasets by closest distance in R?

Updated: 2022-06-17 23:50:53

If you want a hack in R, you can use R's outer function (keeping in mind that R is good at vectorization) to efficiently compute the distance of every location in A[, c("x", "y")] from every location in B[, c("x", "y")]. The result is a matrix of distances in which each row corresponds to a location in A and each column to a location in B, e.g.,

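# Locations to be matched (one row per place)
A <- read.table(header = TRUE, text = "
               Name x y
               city 50.3 4.2
               farm 14.8 8.6
               lake 18.7 9.8
               mountain 44 9.8")
# Measurement sites; three rows, matching the three columns of d
B <- read.table(header = TRUE, text = "
               Temp x y
               18 50.7 6.2
               17.3 20 11
               15 15 9")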
d <- sqrt(outer(A$x, B$x, "-")^2 + outer(A$y, B$y, "-")^2)
d

##          [,1]      [,2]       [,3]
## [1,]  2.039608 31.053663 35.6248509
## [2,] 35.980133  5.727128  0.4472136
## [3,] 32.201863  1.769181  3.7854986
## [4,]  7.605919 24.029981 29.0110324
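To make the layout of d concrete, here is a minimal sketch that rebuilds the matrix from plain coordinate vectors and spot-checks one entry against the Euclidean distance formula. The vector names (a_x, a_y, b_x, b_y) are illustrative only; the coordinates are copied from the example data.

```r
# Coordinates copied from the example: A has 4 locations, B has 3.
a_x <- c(50.3, 14.8, 18.7, 44)
a_y <- c(4.2, 8.6, 9.8, 9.8)
b_x <- c(50.7, 20, 15)
b_y <- c(6.2, 11, 9)

# outer(a_x, b_x, "-") is a 4 x 3 matrix whose [i, j] entry is a_x[i] - b_x[j].
dx <- outer(a_x, b_x, "-")
dy <- outer(a_y, b_y, "-")
d  <- sqrt(dx^2 + dy^2)

# Spot check: distance from the first location in A to the first in B,
# computed directly from the formula.
d11 <- sqrt((a_x[1] - b_x[1])^2 + (a_y[1] - b_y[1])^2)
isTRUE(all.equal(d[1, 1], d11))
```

Note that the full matrix has nrow(A) * nrow(B) entries, so this approach is best suited to datasets where that product fits comfortably in memory.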

Next, you can efficiently obtain the minimum distance in each row with the rowMins function from the matrixStats package

minD <- matrixStats::rowMins(d)

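Assuming each location in A has a unique closest location in B, obtain its index via a (row-wise) comparison of d to minD. The comparison d == minD yields a logical matrix with exactly one TRUE per row, and multiplying it by 1:ncol(d) converts that TRUE into its column number.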

ind <- (d == minD) %*% 1:ncol(d)
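If ties are possible, the matrix product above adds up the column numbers of all tied columns, which can give an invalid index. One possible alternative, sketched here on a small made-up distance matrix, is base R's which.min, which returns the position of the first minimum. This swaps in a different technique from the one in the answer, and "first match wins" is only one of several reasonable tie rules.

```r
# A small hypothetical distance matrix; row 2 has a tie between columns 1 and 3.
d <- matrix(c(2, 5, 9,
              1, 4, 1,
              7, 3, 8), nrow = 3, byrow = TRUE)

# which.min() returns the position of the first minimum in each row,
# so ties resolve to the lowest column number.
ind  <- apply(d, 1, which.min)

# Pick out the minimum distances by (row, column) matrix indexing.
minD <- d[cbind(seq_len(nrow(d)), ind)]

# For comparison: the matrix-product approach sums tied column numbers
# (1 + 3 = 4 for row 2), which is not a valid column of d.
tied <- as.vector((d == minD) %*% seq_len(ncol(d)))
```

Here ind is a plain integer vector, so it can be used directly to index the rows of B.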

If there are multiple equally distant locations in B, you will need some kind of rule for choosing among them anyway. Finally, just stack the data together.

C <- data.frame(Name = A$Name,
                Temp = B$Temp[ind],
                Distance = minD)
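The steps can also be bundled into a single function. The following is a rough sketch in base R only, under a few assumptions: nearest_join is a hypothetical helper name (not part of any package), the coordinate columns are named x and y in both data frames, and ties go to the first match via which.min rather than the matrix-product trick.

```r
# nearest_join() is a hypothetical helper, not a function from any package.
nearest_join <- function(A, B, coords = c("x", "y")) {
  # Pairwise Euclidean distances: rows index A, columns index B.
  d <- sqrt(outer(A[[coords[1]]], B[[coords[1]]], "-")^2 +
            outer(A[[coords[2]]], B[[coords[2]]], "-")^2)
  # Column of the nearest B location for each row of A (first minimum on ties).
  ind <- apply(d, 1, which.min)
  # Keep only the non-coordinate columns of B for the matched rows.
  extra <- B[ind, setdiff(names(B), coords), drop = FALSE]
  rownames(extra) <- NULL
  cbind(A, extra, Distance = d[cbind(seq_len(nrow(A)), ind)])
}

# Example data copied from the answer.
A <- data.frame(Name = c("city", "farm", "lake", "mountain"),
                x = c(50.3, 14.8, 18.7, 44),
                y = c(4.2, 8.6, 9.8, 9.8))
B <- data.frame(Temp = c(18, 17.3, 15),
                x = c(50.7, 20, 15),
                y = c(6.2, 11, 9))

C <- nearest_join(A, B)
C
```

Unlike the data.frame call above, this version keeps A's coordinates and carries over every non-coordinate column of B, which may or may not be what you want.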