Updated: 2023-02-18 14:57:16
I think the error occurs because the extra arguments are passed to every function in the match_fun list. An argument such as ignore_case, originally intended only for the string-distance match_fun, is also passed to the >= match_fun, which cannot accept it.
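To illustrate the failure mode described above, here is a minimal base R sketch. The helper call_with_extras and the function string_match are hypothetical names made up for this illustration; they only mimic the idea of forwarding the same extra arguments to every match function and are not taken from fuzzyjoin's source.

```r
# Hypothetical stand-in for forwarding the same extra arguments to each match function.
call_with_extras <- function(f, v1, v2, ...) f(v1, v2, ...)

# A match function written to accept ignore_case handles the extra argument.
string_match <- function(v1, v2, ignore_case = FALSE) {
  if (ignore_case) {
    v1 <- tolower(v1)
    v2 <- tolower(v2)
  }
  v1 == v2
}
string_ok <- call_with_extras(string_match, "Main St", "main st", ignore_case = TRUE)

# A comparison operator only takes its two operands, so the same extra
# argument raises an error, which is caught here and recorded as "error".
operator_result <- tryCatch(
  call_with_extras(`>=`, 2005, 2003, ignore_case = TRUE),
  error = function(e) "error"
)
```

The first call succeeds because string_match declares ignore_case; the second fails because the operator has no such parameter.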
The solution is to define my own match_fun with the parameters fixed inside the function body. See below, where I define match_fun_stringdist this way. I also used this approach in another question/answer: https://***.com/a/44383103/4663008.
# First, define match_fun_stringdist
# Code adapted from stringdist_join in https://github.com/dgrtwo/fuzzyjoin
match_fun_stringdist <- function(v1, v2) {
  # These parameters can't be passed in from fuzzy_join because the other
  # match_funs are incompatible with them, so they are fixed here.
  ignore_case <- FALSE
  method <- "dl"
  max_dist <- 99
  distance_col <- "dist"

  if (ignore_case) {
    v1 <- stringr::str_to_lower(v1)
    v2 <- stringr::str_to_lower(v2)
  }

  # Shortcut for Levenshtein-like methods: if the difference in
  # string length is greater than the maximum string distance, the
  # edit distance must be at least that large.
  # Length is much faster to compute than string distance.
  if (method %in% c("osa", "lv", "dl")) {
    length_diff <- abs(stringr::str_length(v1) - stringr::str_length(v2))
    include <- length_diff <= max_dist

    dists <- rep(NA, length(v1))
    dists[include] <- stringdist::stringdist(v1[include], v2[include], method = method)
  } else {
    # have to compute them all
    dists <- stringdist::stringdist(v1, v2, method = method)
  }

  # First column flags the matching pairs; the optional second column keeps the distance.
  ret <- tibble::tibble(include = (dists <= max_dist))
  if (!is.null(distance_col)) {
    ret[[distance_col]] <- dists
  }
  ret
}
and call fuzzy_join:
fuzzy_join(data1, data2,
           by = list(x = c("Address1", "AREACODE", "Year1"),
                     y = c("Address2", "AREA_CODE", "Year2")),
           match_fun = list(match_fun_stringdist, `==`, `<=`),
           mode = "left")
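If several joins need different fixed settings, one possible variation is a small factory that returns a two-argument match function with the settings baked in. The sketch below is only an illustration: make_string_match is a hypothetical name, and it uses plain Levenshtein distance from base R's utils::adist rather than the stringdist "dl" method used above, so the distances can differ. It returns a data frame with a logical include column and a dist column, mirroring the shape of match_fun_stringdist; whether it drops into fuzzy_join unchanged is an assumption I have not verified.

```r
# Hypothetical factory: returns a two-argument match function with fixed settings.
make_string_match <- function(max_dist, ignore_case = FALSE) {
  function(v1, v2) {
    if (ignore_case) {
      v1 <- tolower(v1)
      v2 <- tolower(v2)
    }
    # utils::adist returns a full distance matrix, so mapply is used to
    # compare the two vectors pair by pair (Levenshtein distance).
    dists <- mapply(function(a, b) as.numeric(utils::adist(a, b)), v1, v2,
                    USE.NAMES = FALSE)
    data.frame(include = dists <= max_dist, dist = dists)
  }
}

# Build one matcher and try it directly on two small vectors.
loose_match <- make_string_match(max_dist = 2, ignore_case = TRUE)
result <- loose_match(c("12 Main St", "5 Oak Ave"), c("12 main st.", "77 Elm Rd"))
```

Because the settings live in the closure, nothing extra has to be passed through fuzzy_join, which is the same idea as hard-coding them inside match_fun_stringdist.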