Remove columns of a data frame based on conditions in R

Updated: 2023-11-18 23:01:16

I think this is over-complicated. Condition 2 already implies the other conditions: if a column has at least two non-NA values, it obviously isn't all NA, and if it has at least two consecutive values, it obviously contains more than one value. So the three conditions collapse into a single one. Rather than running several functions per column, I prefer to run diff once per column and then vectorize the rest:

cond <- colSums(is.na(sapply(df, diff))) < nrow(df) - 1

This works because diff returns NA whenever either of two neighbouring values is NA, so a column with no two consecutive non-NA values yields nothing but NAs.
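To make the reasoning concrete, here is a minimal sketch on a made-up data frame (the name `toy` and its values are hypothetical, not the data from the question). It shows how a column whose non-NA values are never adjacent ends up with an all-NA diff, just like a column that is entirely NA.

```r
# Hypothetical toy data, not the data from the question
toy <- data.frame(
  A = c(1, 2, 3, 4),                          # fully populated
  B = c(NA, 5, NA, 6),                        # two values, never adjacent
  C = c(NA_real_, NA_real_, NA_real_, NA_real_),  # all NA
  D = c(NA, 7, 8, NA)                         # one adjacent pair
)

# diff() gives NA wherever either neighbour is NA;
# sapply() collects the per-column results into a matrix
d <- sapply(toy, diff)
print(d)

# Keep a column only if at least one of its differences is not NA
cond <- colSums(is.na(d)) < nrow(toy) - 1
print(cond)

print(toy[, cond, drop = FALSE])
```

Only A and D pass the check here; B is dropped even though it holds two values, because they are not consecutive.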

Then simply:

df[, cond, drop = FALSE]
#        A     E
# 1  0.018    NA
# 2  0.017    NA
# 3  0.019    NA
# 4  0.018    NA
# 5  0.018    NA
# 6  0.015 0.037
# 7  0.016 0.031
# 8  0.019 0.025
# 9  0.016 0.035
# 10 0.018 0.035
# 11 0.017 0.043
# 12 0.023 0.040
# 13 0.022 0.042

Per your edit, it seems you have a data.table object with a Date column, so the code needs some modifications:

cond <- df[, lapply(.SD, function(x) sum(is.na(diff(x)))) < .N - 1, .SDcols = -1] 
df[, c(TRUE, cond), with = FALSE]

Some explanations:

We want to ignore the first column in the calculation, so we specify .SDcols = -1 when operating on .SD (the Subset of Data in data.table).
.N is just the row count (similar to nrow(df)).
The next step is to subset by the condition. We must not forget to keep the first column too, so we start with c(TRUE, ...).
Finally, data.table uses non-standard evaluation by default, so to select columns the way you would in a data.frame you need to specify with = FALSE.
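The steps above can be tried on a small made-up table. This is only a sketch: the table `dt`, its column names, and its values are hypothetical, and it uses sapply instead of lapply inside j so that the intermediate result is a plain named vector that is easy to inspect.

```r
library(data.table)

# Hypothetical toy data with a Date column in first position
dt <- data.table(
  Date = as.Date("2023-01-01") + 0:3,
  A = c(1, 2, 3, 4),
  B = c(NA, 5, NA, 6),   # values are never adjacent
  D = c(NA, 7, 8, NA)    # one adjacent pair
)

# Count NA differences per column, skipping column 1 (Date)
na_diffs <- dt[, sapply(.SD, function(x) sum(is.na(diff(x)))), .SDcols = -1]

# A column passes if at least one difference is not NA
keep <- unname(na_diffs < nrow(dt) - 1)

# Always keep the Date column, then the columns that passed;
# with = FALSE makes j behave like data.frame column selection
res <- dt[, c(TRUE, keep), with = FALSE]
print(res)
```

Here res should contain Date, A and D, while dt itself is left unchanged.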

A better way, though, is to remove the columns by reference using := NULL:

cond <- c(FALSE, df[, lapply(.SD, function(x) sum(is.na(diff(x)))) == .N - 1, .SDcols = -1])
df[, which(cond) := NULL]
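As a variation, the same removal can be done by column name, with a guard for the case where nothing needs dropping. This is a sketch on hypothetical data (`dt`, `value_cols` and `drop_cols` are illustrative names), not the code from the answer.

```r
library(data.table)

# Hypothetical toy data with a Date column in first position
dt <- data.table(
  Date = as.Date("2023-01-01") + 0:3,
  A = c(1, 2, 3, 4),
  B = c(NA, 5, NA, 6)   # values are never adjacent
)

# Check every column except Date
value_cols <- setdiff(names(dt), "Date")
all_na_diff <- sapply(value_cols, function(col) all(is.na(diff(dt[[col]]))))
drop_cols <- value_cols[all_na_diff]

# := modifies dt in place, so no reassignment is needed;
# the parentheses make data.table treat drop_cols as a vector of names
if (length(drop_cols) > 0) {
  dt[, (drop_cols) := NULL]
}
print(names(dt))
```

Because the deletion happens by reference, dt loses column B without being copied, which matters mainly for large tables.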
