且构网

R - subset column based on condition on duplicate rows

Updated: 2023-02-02 22:19:42


I have a data frame with an ID column that contains repeated values, plus a site count for each record. I want to know how I can remove the duplicate ID records only when the Site_count record is more than 0.

Generate DF:

DF <- data.frame(
    ID = sample(100:300, 100, replace = TRUE),
    Site_count = sample(0:1, 100, replace = TRUE)
)

My attempt at the subset:

subset(DF[!duplicated(DF$ID), ], Site_count > 0)

But this removes every record with a site count of 0. I only want to remove a record when its ID has a duplicate record with a site count greater than 0.
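To make the problem concrete, here is a minimal sketch on a small hand-made data frame. The name `toy` and its values are invented for illustration and are not part of my real data:

```r
# Hypothetical toy data, invented for illustration
toy <- data.frame(
  ID         = c(1, 1, 2, 2, 3, 4, 4),
  Site_count = c(0, 0, 0, 1, 1, 1, 1)
)

# Step 1: keep only the first occurrence of each ID -> rows (1,0), (2,0), (3,1), (4,1)
first_rows <- toy[!duplicated(toy$ID), ]

# Step 2: subset() then drops every remaining row with a zero count
attempt <- subset(first_rows, Site_count > 0)
# only IDs 3 and 4 survive; IDs 1 and 2 disappear completely,
# even though ID 2 had a record with Site_count = 1
```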

The desired result would look something like this (notice that there are IDs with a site count of 0, but no ID appears both with 0 and with another site count):

ID    site count
--    ----------
1        0
2        1
3        1
4        0
5        5

The expected output is not very clear. Maybe this helps:

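# TRUE for every row whose ID has more than one zero-count record
indx <- with(DF, ave(!Site_count, ID, FUN = function(x) sum(x) > 1))
# drop the repeated occurrences of those IDs, keeping the first one
DF[!(duplicated(DF$ID) & indx), ]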
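
Since `DF` is generated randomly, it may be easier to follow the logic on a fixed input. The following is only a sketch; `toy` and its values are hypothetical and chosen for illustration:

```r
# Hypothetical toy data, invented for illustration (not from the question)
toy <- data.frame(
  ID         = c(1, 1, 2, 2, 3, 4, 4),
  Site_count = c(0, 0, 0, 1, 1, 1, 1)
)

# !Site_count is TRUE where the count is 0, so sum(x) counts zero rows per ID
indx <- with(toy, ave(!Site_count, ID, FUN = function(x) sum(x) > 1))
# only ID 1 has more than one zero row:
# TRUE TRUE FALSE FALSE FALSE FALSE FALSE

# duplicated() flags every occurrence of an ID after the first one
first_result <- toy[!(duplicated(toy$ID) & indx), ]
# the second (1, 0) row is dropped; the other six rows are kept
```

With this index, only IDs that have several zero-count records are deduplicated; IDs 2 and 4 keep both of their rows.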

Update

After re-reading the description, your expected answer could also be:

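# 1 for every row whose ID has at least one positive site count, otherwise 0
indx <- with(DF, ave(Site_count, ID, FUN = function(x) any(x > 0)))
# drop the repeated occurrences of those IDs, keeping the first one
DF[!(duplicated(DF$ID) & indx), ]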
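
Here is a small sketch of how this variant behaves on a fixed input. The `toy` data frame is hypothetical, and the sorting step at the end is an assumption about what you want (that the positive record should be the one that survives), not something stated in the question:

```r
# Hypothetical toy data, invented for illustration (not from the question)
toy <- data.frame(
  ID         = c(1, 1, 2, 2, 3, 4, 4),
  Site_count = c(0, 0, 0, 1, 1, 1, 1)
)

# 1 where the ID has any positive count, 0 otherwise: 0 0 1 1 1 1 1
indx <- with(toy, ave(Site_count, ID, FUN = function(x) any(x > 0)))
second_result <- toy[!(duplicated(toy$ID) & indx), ]
# rows kept: (1,0), (1,0), (2,0), (3,1), (4,1)
# for ID 2 the zero row survives, because it happens to come first

# Assumption: to keep the positive record instead, sort so that
# higher counts come first within each ID before filtering
sorted <- toy[order(toy$ID, -toy$Site_count), ]
indx_sorted <- with(sorted, ave(Site_count, ID, FUN = function(x) any(x > 0)))
sorted_result <- sorted[!(duplicated(sorted$ID) & indx_sorted), ]
# rows kept: (1,0), (1,0), (2,1), (3,1), (4,1)
```

Because `duplicated()` marks every occurrence after the first, which record is kept depends on row order; sorting beforehand is one way to control that.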