Updated: 2023-02-26 10:46:12
I reckon that data.table's philosophy entails fewer specially named functions for individual tasks than you'll find in the tidyverse, so some extra coding is required, like:
res = setDT(df)[
  CJ(person = person, observation_id = observation_id, unique=TRUE),
  on=.(person, observation_id)
]
After this, you still have to manually fill in values for the combinations that were missing. In recent versions of data.table, setnafill handles this efficiently and by reference:
setnafill(res, fill = 0, cols = 'value')
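To make the two steps above easy to try end to end, here is a minimal sketch on a small made-up table. The contents of df are an assumption for illustration (the question's data is not shown here): person 1 simply has no row for observation_id 2.

```r
library(data.table)

# Hypothetical input (not from the original question):
# person 1 has no row for observation_id 2
df <- data.frame(
  person         = c(1, 2, 2),
  observation_id = c(1, 1, 2),
  value          = c(1, 1, 1)
)

# Step 1: join onto every person/observation_id combination;
# combinations absent from df come back with value = NA
res <- setDT(df)[
  CJ(person = person, observation_id = observation_id, unique = TRUE),
  on = .(person, observation_id)
]

# Step 2: replace those NAs with 0, modifying res by reference
setnafill(res, fill = 0, cols = "value")

res
```

Note that setnafill only works on numeric columns, so a character or factor column would need a different fill step.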
See @Jealie's answer regarding a feature that will sidestep this.
Certainly, it's crazy that the column names have to be entered three times here. But on the other hand, one can write a wrapper:
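completeDT <- function(DT, cols, defs = NULL){
  # build every combination of the unique values in `cols`
  mDT = do.call(CJ, c(DT[, ..cols], list(unique=TRUE)))
  # join back so that missing combinations appear as rows with NA
  res = DT[mDT, on=names(mDT)]
  # replace NA in each column named in `defs` with its default
  if (length(defs))
    res[, names(defs) := Map(replace, .SD, lapply(.SD, is.na), defs), .SDcols=names(defs)]
  res[]
}

completeDT(setDT(df), cols = c("person", "observation_id"), defs = c(value = 0))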
   person observation_id value
1:      1              1     1
2:      1              2     0
3:      2              1     1
4:      2              2     1
As a quick way of avoiding typing the names three times for the first step, here's @thelatemail's idea:
vars <- c("person","observation_id")
df[do.call(CJ, c(mget(vars), unique=TRUE)), on=vars]
# or with magrittr...
c("person","observation_id") %>% df[do.call(CJ, c(mget(.), unique=TRUE)), on=.]
Update: you no longer need to enter the names twice in CJ, thanks to an improvement by @MichaelChirico and @MattDowle.
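As a rough sketch of what that update allows, the first step can be written with unnamed CJ arguments. This assumes a data.table release recent enough that CJ names its output columns after its arguments; on older releases the columns would not be named person and observation_id and the join would fail. The sample table is hypothetical.

```r
library(data.table)

# Hypothetical input: person 1 has no row for observation_id 2
df <- data.table(
  person         = c(1, 2, 2),
  observation_id = c(1, 1, 2),
  value          = c(1, 1, 1)
)

# CJ(person, observation_id) is assumed to auto-name its columns,
# so the names no longer need repeating as person = person, etc.
res <- df[CJ(person, observation_id, unique = TRUE),
          on = .(person, observation_id)]

# Fill the newly created NA by reference, as before
setnafill(res, fill = 0, cols = "value")

res
```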