且构网

Sharing the ins and outs of software development

Remove rows with duplicated IDs in an R data.table, but add a new column with the concatenated dates from another column

Updated: 2023-12-01 11:35:46

I have a large data table of patient data. I want to delete rows where "id" is duplicated without losing the information in the "date" column.

id  date
01  2004-07-01
02  NA
03  2013-11-15
03  2005-03-15
04  NA
05  2011-07-01
05  2012-07-01

I could do this one of two ways -

  1. create a column that writes over the date column values to concatenate all the dates for that ID, i.e.:

    id  date_new
    01  2004-07-01
    02  NA
    03  2013-11-15; 2005-03-15
    04  NA
    05  2011-07-01; 2012-07-01
    

or

  2. create one new column for each additional date, i.e.:

    id  date_new    date_new2
    01  2004-07-01  NA
    02  NA          NA
    03  2013-11-15  2005-03-15
    04  NA          NA
    05  2011-07-01  2012-07-01
    

I have tried a few things, but they keep crashing my R session (I get the message R Session Aborted. R encountered a fatal error. The session was terminated.):

setkey(DT, "id")
unique_DT <- subset(unique(DT))

and:

DT[!duplicated(DT[, "id", with = FALSE])]

However, besides crashing R, neither of these solutions does what I want with the dates.

Any ideas? I am new to data table (and R generally) but I have the vague sense that I could solve this with := somehow.

Try this:

dt[, c(date_new = paste(date, collapse = "; "), .SD), by = id]
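
As far as I can tell, the one-liner above adds `date_new` but keeps every original row, because `.SD` still carries one row per date. If the goal is exactly one row per `id`, something like the sketch below may be closer to the two layouts in the question. It rebuilds the sample data with `date` stored as character (an assumption; a `Date` column should behave the same way through `paste`). The names `collapsed` and `wide` are made up for illustration, and the wide columns come out as `date_new1`, `date_new2` rather than `date_new`, `date_new2`.

```r
library(data.table)

# Sample data from the question; dates kept as character for simplicity
dt <- data.table(
  id   = c("01", "02", "03", "03", "04", "05", "05"),
  date = c("2004-07-01", NA, "2013-11-15", "2005-03-15", NA,
           "2011-07-01", "2012-07-01")
)

# Option 1: one row per id, dates joined with "; ".
# Groups with no dates at all stay NA instead of becoming the string "NA".
collapsed <- dt[, .(date_new = if (all(is.na(date))) NA_character_
                               else paste(date[!is.na(date)], collapse = "; ")),
                by = id]

# Option 2: one column per date within an id.
# rowid() numbers the rows inside each id; dcast() spreads them into columns.
wide <- dcast(dt, id ~ rowid(id, prefix = "date_new"), value.var = "date")

print(collapsed)
print(wide)
```

Neither form uses `unique()` or `duplicated()`; grouping with `by = id` (or casting on `id`) does the de-duplication as a side effect of the aggregation.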