R data.table: take the first row by group when a column is all missing

Updated: 2023-01-21 18:10:13

This seems to work:

unique( d_sample[order(is.na(Event))], by="ID" )

   ID Time Event
1:  2  110     1
2:  3  200     1
3:  1   10    NA
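
To try this without the original question at hand, here is a minimal sketch. It assumes that d_sample is the six-row (n = 6) case of the benchmark data built further down, which matches the output shown above. Rows with a non-missing Event are sorted to the front, and unique() then keeps the first row it sees for each ID.

```r
library(data.table)

# Assumed sample data: the n = 6 case of the benchmark construction
d_sample <- data.table(
  ID    = rep(1:3, each = 2),
  Time  = c(10, 15, 100, 110, 200, 220),
  Event = c(NA, NA, NA, 1, 1, NA)
)

# Rows with a non-missing Event sort first; ties keep their original order
sorted <- d_sample[order(is.na(Event))]

# unique(by = "ID") keeps the first row encountered for each ID
res <- unique(sorted, by = "ID")
print(res)
```

Note that this keeps exactly one row per ID, which is enough here because no ID in the sample has more than one non-missing Event.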

Alternately, d_sample[order(is.na(Event)), .SD[1L], by=ID].

Extending the OP's example, I also find similar timings for the two approaches:

library(data.table)

n = 12e4 # must be a multiple of 6
set.seed(1)
d_sample = data.table( ID = sort(rep(seq(1,n/2), 2)), 
                   Time = rep(c(10, 15, 100, 110, 200, 220), n/6), 
                   Event = rep(c(NA, NA, NA, 1, 1, NA), n/6) )

system.time(rf <- unique( d_sample[order(is.na(Event))], by="ID" ))
# 1.17
system.time(rf2 <- d_sample[order(is.na(Event)), .SD[1L], by=ID] )   
# 1.24
system.time(rt <- d_sample[, if(all(is.na(Event))) .SD[1] else .SD[!is.na(Event)], by=ID])    
# 10.42
system.time(rt2 <- 
    d_sample[ d_sample[, { w = which(is.na(Event)); .I[ if (length(w) == .N) 1L else -w ] }, by=ID]$V1 ] 
)
# .13

# verify that all approaches return the same rows
identical(rf,rf2) # TRUE
identical(rf,rt) # FALSE (same rows, different row order)
fsetequal(rf,rt) # TRUE
identical(rt,rt2) # TRUE

The variation on @thelatemail's solution, rt2, is the fastest by a wide margin.
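
The speed of rt2 most likely comes from computing only row numbers (.I) per group and then subsetting once, instead of building a .SD subset for every group. The sketch below splits that idea into two steps on an assumed six-row d_sample (the n = 6 case of the benchmark data). It swaps the -w index for a logical index, which is a variation rather than the benchmarked code: with -w, a group that has no missing Event gives a zero-length w, and negative indexing with a zero-length vector selects nothing, so such a group would be dropped.

```r
library(data.table)

# Assumed sample data: the n = 6 case of the benchmark construction
d_sample <- data.table(
  ID    = rep(1:3, each = 2),
  Time  = c(10, 15, 100, 110, 200, 220),
  Event = c(NA, NA, NA, 1, 1, NA)
)

# Step 1: per group, collect the global row numbers (.I) to keep
idx <- d_sample[, {
  na <- is.na(Event)
  # all missing: keep the first row; otherwise keep the non-missing rows
  .I[if (all(na)) 1L else !na]
}, by = ID]$V1

# Step 2: a single subset by row number
rt2 <- d_sample[idx]
print(rt2)
```

Unlike the unique() approach, this keeps every non-missing row of a group, and the result stays in the original row order.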