Updated: 2022-06-18 23:40:12
You can do this relatively easily with data.table:
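library(data.table)
# vinDB alternates make/year rows with VIN rows; split them into two vectors
vin.names <- vinDB[seq(1, nrow(vinDB), 2), ]
vin.vins <- vinDB[seq(2, nrow(vinDB), 2), ]
# each 4-line record in carFile has its VIN on the 2nd line
car.vins <- carFile[seq(2, nrow(carFile), 4), ]
# key the lookup table on the VIN column so it can be joined against
dt <- data.table(vin.names, vin.vins, key="vin.vins")
# join the car VINs to dt, then count the matches per make/year
dt[J(car.vins), list(NumTimesFound=.N), by=vin.names]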
#         vin.names NumTimesFound
#  1:     Ford 2014            15
#  2: Chrysler 1998            10
#  3:       GM 1998             9
#  4:     Ford 1998            11
#  5:   Toyota 2000            12
# ---
# 75:   Toyota 2007             7
# 76: Chrysler 1995             4
# 77:   Toyota 2010             5
# 78:   Toyota 2008             1
# 79:       GM 1997             5
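To see exactly what that call counts, here is the same join on a tiny made-up table, small enough to check by hand. The VINs "AAA", "BBB", "CCC" and the labels attached to them are invented for illustration, and this is only a sketch assuming the data.table package is installed:

```r
library(data.table)

# Made-up lookup table (illustration only): one row per VIN, keyed on the VIN
dt <- data.table(
  vin.names = c("Ford 2014", "GM 1998", "Ford 2014"),
  vin.vins  = c("AAA", "BBB", "CCC"),
  key = "vin.vins"
)

# Made-up VINs from the car file; "AAA" shows up twice
car.vins <- c("AAA", "BBB", "AAA", "CCC")

# Same call as above: join on the key, then count joined rows per vin.names
counts <- dt[J(car.vins), list(NumTimesFound = .N), by = vin.names]
print(counts)
```

Ford 2014 should come out as 3 (two hits on "AAA" plus one on "CCC", which also maps to Ford 2014) and GM 1998 as 1, so the count is per make/year label rather than per VIN.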
The main thing to understand is that with J(car.vins) we are creating a one-column data.table holding the vins to match (J is just shorthand for data.table, so long as you use it inside the square brackets of a data.table). By passing that data.table as the first argument inside dt[...], we join the list of vins to the list of cars: because we keyed dt by "vin.vins" in the prior step, each vin in J(car.vins) is matched against that key column. The last argument, by=vin.names, tells data.table to group the joined rows by vin.names, and the middle argument, list(NumTimesFound=.N), says we want the number of rows in each group (.N is a special data.table variable holding the row count of the current group).
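The two steps described above, the keyed join and the grouped .N count, can also be run one at a time so the intermediate joined table is visible. This is a sketch on invented data (the "V1"/"V2" VINs and their labels are placeholders), assuming data.table is installed; it also checks that spelling out data.table() in place of J() gives the same join:

```r
library(data.table)

# Placeholder keyed table standing in for dt (illustration only)
dt <- data.table(
  vin.names = c("Toyota 2000", "GM 1997"),
  vin.vins  = c("V1", "V2"),
  key = "vin.vins"
)
car.vins <- c("V2", "V1", "V2")

# Step 1: the join. J(car.vins) is a one-column table; since dt is keyed on
# vin.vins, each of its rows is looked up in that key column, in i's order.
joined <- dt[J(car.vins)]
print(joined)

# Inside [...], J() is shorthand for data.table(), so this is the same join
same <- dt[data.table(car.vins)]

# Step 2: group the joined rows by vin.names and count them with .N
step_by_step <- joined[, list(NumTimesFound = .N), by = vin.names]

# The one-call form used in the answer
one_call <- dt[J(car.vins), list(NumTimesFound = .N), by = vin.names]
print(one_call)
```

Here GM 1997 should be counted twice and Toyota 2000 once, whichever form is used.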
Also, I made some junk data to run this on. In the future, please provide data like this.
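set.seed(1)
makes <- c("Toyota", "Ford", "GM", "Chrysler")
years <- 1995:2014
# 500 random "make year" labels and 500 random 16-letter VINs
cars <- paste(sample(makes, 500, replace = TRUE), sample(years, 500, replace = TRUE))
vins <- unlist(replicate(500, paste0(sample(LETTERS, 16), collapse = "")))
# interleave: label 1, VIN 1, label 2, VIN 2, ...
vinDB <- data.frame(c(cars, vins)[order(rep(1:500, 2))])
# 1000 four-line records, each laid out as junk, VIN, junk, junk
carFile <-
  data.frame(
    c(rep("junk", 1000), sample(vins, 1000, replace = TRUE), rep("junk", 2000))[order(rep(1:1000, 4))]
  )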
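The interleaving trick in the junk data, and the seq() indexing at the top of the answer that undoes it, is easier to see on a handful of values. This uses base R only, and the toy vectors are invented for illustration:

```r
# Invented toy values (illustration only)
toy.names <- c("Ford 2014", "GM 1998", "Toyota 2000")
toy.vins  <- c("AAA", "BBB", "CCC")
n <- length(toy.names)

# rep(1:n, 2) is 1 2 3 1 2 3; order() leaves ties in their original order,
# giving 1 4 2 5 3 6, which alternates between the two halves of c(names, vins)
interleaved <- c(toy.names, toy.vins)[order(rep(1:n, 2))]
print(interleaved)

# Undo it the way the answer does: odd rows are names, even rows are VINs.
# Indexing rows of a one-column data.frame drops it to a plain vector.
toyDB <- data.frame(interleaved, stringsAsFactors = FALSE)
got.names <- toyDB[seq(1, nrow(toyDB), 2), ]
got.vins  <- toyDB[seq(2, nrow(toyDB), 2), ]

# Same idea with a stride of 4: each record is junk, VIN, junk, junk
toyCars <- data.frame(
  c(rep("junk", n), toy.vins, rep("junk", 2 * n))[order(rep(1:n, 4))],
  stringsAsFactors = FALSE
)
got.car.vins <- toyCars[seq(2, nrow(toyCars), 4), ]
print(got.car.vins)
```

The stride of 4 in the last step is what carFile[seq(2, nrow(carFile), 4), ] relies on; if the real file has a different record length, that step size is the thing to change.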