R: Combining rows with duplicate IDs in a data frame with different column types

Updated: 2023-12-01 12:19:22

Edit

I just saw your edit about non-unique factor columns and selecting columns by type. The code below will work, but I will think about a cleaner way to do this and report back (I am sure there is a simple one). If you want to specify columns manually, as in the original example, and you have non-unique factors, just use unlist() with unique() in the same fashion as below. Alternatively, you could combine all factor levels on one line using paste() with collapse = "; " or something to that effect. If you want to change the column order of the final data.table, use setcolorder() on it.

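library(data.table)
setDT(df)

# Logical masks for selecting columns later (id is the grouping key, so exclude it)
num_cols <- sapply(df, is.numeric)
num_cols[names(num_cols) == "id"] <- FALSE
fac_cols <- sapply(df, is.factor)

# Join the per-id means of the numeric columns with the per-id unique factor values
df[, lapply(.SD, mean, na.rm = TRUE), by = id, .SDcols = num_cols][
  df[, lapply(.SD, function(i) unlist(unique(i[!is.na(i)]))), by = id, .SDcols = fac_cols], on = "id"]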

   id abst gier farbe
1:  1    1  2.5 keine
2:  2    0  0.0 keine
3:  3    0  0.0 keine
4:  4    3  3.0  rot2
5:  4    3  3.0   rot

How it works: it joins the numeric column summary

df[, lapply(.SD, mean, na.rm = TRUE), by = id, .SDcols = num_cols]

with the factor column summary

df[, lapply(.SD, function(i) unlist(unique(i[!is.na(i)]))), by = id, .SDcols = fac_cols]

Data for the edit

df <- data.frame(id    = c(1, 1, 1, 1, 1, 2, 2, 3, 3, 4, 4, 4, 4),
                 abst  = c(0, NA, 2, NA, NA, NA, 0, 0, NA, 2, NA, 3, 4),
                 farbe = as.factor(c("keine", NA, "keine", NA, NA, NA, "keine", "keine", NA, NA, "rot2", "rot", "rot")),
                 gier  = c(0, NA, 5, NA, NA, NA, 0, 0, NA, 1, NA, 6, 2))
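
The paste() alternative mentioned in the edit could be sketched roughly as follows. This is only an illustration under a few assumptions: the helper name collapse_levels and the intermediate names num_summary, fac_summary and res are made up for this example, and the columns are selected by name rather than with logical masks. It collapses all distinct non-NA factor values of an id into a single string, so each id stays on one row, and then uses setcolorder() to restore the original column order.

```r
library(data.table)

# Same sample data as above
df <- data.frame(id    = c(1, 1, 1, 1, 1, 2, 2, 3, 3, 4, 4, 4, 4),
                 abst  = c(0, NA, 2, NA, NA, NA, 0, 0, NA, 2, NA, 3, 4),
                 farbe = as.factor(c("keine", NA, "keine", NA, NA, NA, "keine", "keine", NA, NA, "rot2", "rot", "rot")),
                 gier  = c(0, NA, 5, NA, NA, NA, 0, 0, NA, 1, NA, 6, 2))
setDT(df)

# Column names by type; id is the grouping key, so it is excluded
num_cols <- setdiff(names(df)[sapply(df, is.numeric)], "id")
fac_cols <- names(df)[sapply(df, is.factor)]

# Hypothetical helper: distinct non-NA values of one group, joined by "; "
collapse_levels <- function(x) {
  paste(unique(as.character(x[!is.na(x)])), collapse = "; ")
}

num_summary <- df[, lapply(.SD, mean, na.rm = TRUE), by = id, .SDcols = num_cols]
fac_summary <- df[, lapply(.SD, collapse_levels), by = id, .SDcols = fac_cols]

# One row per id, because each factor column is now a single string per group
res <- num_summary[fac_summary, on = "id"]

# Put the columns back in the order of the input (id, abst, farbe, gier)
setcolorder(res, names(df))
res
```

With this variant id 4 no longer appears twice; its farbe value becomes the single string "rot2; rot". Note that the collapsed column is character, not factor.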

Original answer

Here is one of many data.table solutions. It orders the data.table by the factor column so that it can grab the top value while summarizing. I also converted the result back to a plain data.frame, but you do not have to do that if you do not want to. Hope this helps!

Also, this assumes that farbe will be the same for each id.

library(data.table)

setDT(df)

df <- df[order(farbe), .(abst = mean(abst, na.rm = TRUE),
                         farbe = farbe[1],
                         gier = mean(gier, na.rm = TRUE)), by = id]

setDF(df)
df
  id abst farbe gier
1  1    1 keine  2.5
2  2    0 keine  0.0
3  3    0 keine  0.0
4  4    3   rot  3.0
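
Since the original answer relies on farbe being the same within each id, it may be worth checking that assumption on the raw data before aggregating. The following is a minimal sketch, not part of the original answer; the names dt, n_farbe, n_distinct and violations are chosen for illustration only. It counts the distinct non-NA farbe values per id and lists the ids with more than one.

```r
library(data.table)

# Raw (unaggregated) id and farbe columns from the sample data
dt <- data.table(
  id    = c(1, 1, 1, 1, 1, 2, 2, 3, 3, 4, 4, 4, 4),
  farbe = as.factor(c("keine", NA, "keine", NA, NA, NA, "keine", "keine", NA, NA, "rot2", "rot", "rot"))
)

# Number of distinct non-NA farbe values for each id
n_farbe <- dt[, .(n_distinct = length(unique(farbe[!is.na(farbe)]))), by = id]

# ids that break the "one farbe per id" assumption
violations <- n_farbe[n_distinct > 1]
violations
```

For the sample data from the edit, id 4 is flagged because it contains both rot and rot2, which is why farbe[1] keeps only rot for that id in the output above.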