Remove duplicate rows by ID in R data.table, but add a new column with concatenated dates from another column

Updated: 2023-12-01 11:35:46


I have a large data table of patient data. I want to delete rows where "id" is duplicated without losing the information in the "date" column.

id  date
01  2004-07-01
02  NA
03  2013-11-15
03  2005-03-15
04  NA
05  2011-07-01
05  2012-07-01

I could do this in one of two ways:

Create a column that replaces the date column, concatenating all the dates for that ID, i.e.:
```
id  date_new
01  2004-07-01
02  NA
03  2013-11-15; 2005-03-15
04  NA
05  2011-07-01; 2012-07-01
```

Create one new column for each additional date, i.e.:

id  date_new    date_new2
01  2004-07-01  NA
02  NA          NA
03  2013-11-15  2005-03-15
04  NA          NA
05  2011-07-01  2012-07-01

I have tried a few things, but they keep crashing my R session (I get the message "R Session Aborted. R encountered a fatal error. The session was terminated."):

setkey(DT, "id")
unique_DT <- subset(unique(DT))

and:

DT[!duplicated(DT[, "id", with = FALSE])]

However, besides crashing R, neither of these solutions does what I want with the dates.

Any ideas? I am new to data.table (and R generally), but I have the vague sense that I could solve this with := somehow.

Try this:

dt[, c(date_new = paste(date, collapse = "; "), .SD), by = id]
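
As far as I can tell, the one-liner above adds date_new next to the existing columns but still returns every original row, so the duplicated ids remain. If you want exactly one row per id, something along these lines might be closer to the two layouts in the question. This is only a sketch on toy data: it assumes id is stored as character and date as Date, that an id with no dates should get a real NA rather than the string "NA", and (for the wide layout) that no id has more than two dates, as in the example. The names collapsed and wide are made up for illustration.

```r
library(data.table)

# Toy data mirroring the question (column types are an assumption).
DT <- data.table(
  id   = c("01", "02", "03", "03", "04", "05", "05"),
  date = as.Date(c("2004-07-01", NA, "2013-11-15", "2005-03-15",
                   NA, "2011-07-01", "2012-07-01"))
)

# Layout 1: one row per id, non-missing dates joined with "; ".
# An id whose dates are all missing gets NA_character_.
collapsed <- DT[, .(date_new = if (all(is.na(date))) NA_character_
                               else paste(date[!is.na(date)], collapse = "; ")),
                by = id]

# Layout 2: one column per date. rowid(id) numbers the rows within each id,
# and dcast() spreads those numbers into columns.
wide <- dcast(DT, id ~ rowid(id), value.var = "date")

# Rename by position; this assumes at most two dates per id.
setnames(wide, c("id", "date_new", "date_new2"))

print(collapsed)
print(wide)
```

Neither version uses :=, because := adds or updates columns by reference and never drops rows; collapsing to one row per id needs a grouped j expression (or dcast) that returns a new table. I can't say whether this avoids the session crash, since nothing in the question shows what causes it.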
