Updated: 2023-12-01 11:35:46
I have a large data table of patient data. I want to delete rows where "id" is duplicated without losing the information in the "date" column.
id date
01 2004-07-01
02 NA
03 2013-11-15
03 2005-03-15
04 NA
05 2011-07-01
05 2012-07-01
I could do this one of two ways -
create a column that writes over the date column values to concatenate all the dates for that ID, i.e.:
id date_new
01 2004-07-01
02 NA
03 2013-11-15; 2005-03-15
04 NA
05 2011-07-01; 2012-07-01
or
create one new column for each additional date, i.e.:
id date_new date_new2
01 2004-07-01 NA
02 NA NA
03 2013-11-15 2005-03-15
04 NA NA
05 2011-07-01 2012-07-01
I have tried a few things, but they keep crashing my R session (I get the message "R Session Aborted. R encountered a fatal error. The session was terminated."):
setkey(DT, "id")
unique_DT <- subset(unique(DT))
and:
DT[!duplicated(DT[, "id", with = FALSE])]
However, besides crashing R, neither of these solutions does what I want with the dates.
Any ideas? I am new to data table (and R generally) but I have the vague sense that I could solve this with := somehow.
Try this:
dt[,c(date_new=paste(date,collapse="; "),.SD),by=id]
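Note that, as far as I can tell, carrying .SD along in j keeps every original row, so the duplicated ids are still there after that call. If the goal is exactly one row per id, something like the sketch below might be closer to the two layouts in the question. It rebuilds the sample data by hand; the names collapsed and wide are only illustrative, and the NA handling (keeping a real NA instead of the string "NA") is my assumption about what is wanted.

```r
library(data.table)

# Sample data from the question; id kept as character to preserve leading zeros
DT <- data.table(
  id   = c("01", "02", "03", "03", "04", "05", "05"),
  date = as.Date(c("2004-07-01", NA, "2013-11-15", "2005-03-15",
                   NA, "2011-07-01", "2012-07-01"))
)

# Layout 1: one row per id, all non-missing dates joined into one string.
# Groups with no dates get a real NA rather than the text "NA".
collapsed <- DT[, .(date_new = if (all(is.na(date))) NA_character_
                               else paste(date[!is.na(date)], collapse = "; ")),
                by = id]

# Layout 2: one row per id, one column per date.
# rowid() numbers the rows within each id; prefix builds the column names.
wide <- dcast(DT, id ~ rowid(id, prefix = "date_new"), value.var = "date")

print(collapsed)
print(wide)
```

In the wide layout the columns come out as date_new1 and date_new2 rather than date_new and date_new2; setnames() can rename them if the exact names matter. Neither form uses :=, because both reduce the number of rows, which assignment by reference does not do.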