且构网

R: data.table over subsets excluding a value

Updated: 2023-12-01 17:59:10

@Roland's answer will work for some functions (and when it does, it will be the best option), but not in general. Unfortunately, you can't apply the split-apply-combine strategy to the data as is, because each row has to contribute to every group except its own. You can, however, if you make the data larger. Let's start with a simpler example:

library(data.table)

dt = data.table(a = c(1,1,2,2,3,3), b = c(1:6), key = 'a')

# now let's extend this table the following way
# take the unique a's and construct all the combinations excluding one element
combinations = dt[, combn(unique(a), 2)]

# now combine this into a data.table with the excluded element as the index
# and merge it back into the original data.table
extension = rbindlist(apply(combinations, 2,
                  function(x) data.table(a = x, index = setdiff(unique(dt$a), x))))
setkey(extension, a)

dt.extended = extension[dt, allow.cartesian = TRUE]
dt.extended[order(index)]
#    a index b
# 1: 2     1 3
# 2: 2     1 4
# 3: 3     1 5
# 4: 3     1 6
# 5: 1     2 1
# 6: 1     2 2
# 7: 3     2 5
# 8: 3     2 6
# 9: 1     3 1
#10: 1     3 2
#11: 2     3 3
#12: 2     3 4

# Now we have everything we need:
dt.extended[, mean(b), by = list(a = index)]
#   a  V1
#1: 3 2.5
#2: 2 3.5
#3: 1 4.5
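
One way to sanity-check the extended-table result is to compute each leave-one-out mean directly by subsetting. The sketch below rebuilds the toy table and, for every unique value of a, averages b over the rows whose a is different. The names direct and v are introduced here for illustration only and are not part of the original answer.

```r
library(data.table)

dt = data.table(a = c(1,1,2,2,3,3), b = c(1:6), key = 'a')

# for each unique a, average b over all rows whose a is different
direct = rbindlist(lapply(unique(dt$a),
                   function(v) data.table(a = v, V1 = dt[a != v, mean(b)])))
direct
# expected means: 4.5 for a = 1, 3.5 for a = 2, 2.5 for a = 3
```

This direct version rescans the whole table once per unique value, whereas the extended table pays in memory to get everything from a single grouped aggregation.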

Going back to the original data d (and doing some operations slightly differently, to simplify the expressions):

extension = d[, {Carrier.uniq = unique(Carrier);
                 .SD[, rbindlist(combn(Carrier.uniq, length(Carrier.uniq)-1,
                          function(x) data.table(Carrier = x,
                                   index = setdiff(Carrier.uniq, x)),
                          simplify = FALSE))]}, by = Market]
setkey(extension, Market, Carrier)

extension[d, allow.cartesian = TRUE][, mean(Stops), by = list(Market, Carrier = index)]
#    Market   Carrier       V1
#1: IAH:SNA Southwest 1.000000
#2: IAH:SNA     Delta 3.000000
#3: MSP:CLE   JetBlue 2.000000
#4: MSP:CLE Southwest 1.500000
#5: MSP:CLE  American 1.666667
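
As noted at the top, some functions do not need the extended table at all. For an aggregate such as mean, which can be rebuilt from sums and counts, the leave-one-out value follows from per-group totals. The following is only a sketch of that kind of shortcut, shown on the toy table dt because d is not reproduced here; it is not a reproduction of @Roland's answer, and per.group and loo.mean are hypothetical names.

```r
library(data.table)

dt = data.table(a = c(1,1,2,2,3,3), b = c(1:6), key = 'a')

# per-group sum and row count
per.group = dt[, list(s = sum(b), n = .N), by = a]

# leave-one-out mean = (grand total - group total) / (total rows - group rows)
per.group[, loo.mean := (sum(s) - s) / (sum(n) - n)]
per.group[, list(a, loo.mean)]
# expected loo.mean: 4.5 for a = 1, 3.5 for a = 2, 2.5 for a = 3
```

This only works because mean decomposes into a sum and a count. A function like median has no such decomposition, which is where the extended-table approach is needed.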