且构网

pandas groupby and filter

Updated: 2023-12-04 11:10:04

I think groupby is not necessary here. If you only need the rows where V is 0, use boolean indexing:

print(df[df.V == 0])
    C  ID  V  YEAR
0   0   1  0  2011
3  33   2  0  2013
5  55   3  0  2014
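A minimal self-contained sketch of the boolean-indexing approach. The answer never shows how `df` was built, so the frame below is an assumption, rebuilt from the printed output; `mask` and `rows_with_zero` are illustrative names.

```python
import pandas as pd

# Assumed sample data, rebuilt from the printed output (the original df is not shown)
df = pd.DataFrame({
    'ID':   [1, 1, 2, 2, 3, 3],
    'YEAR': [2011, 2012, 2012, 2013, 2013, 2014],
    'V':    [0, 1, 1, 0, 1, 0],
    'C':    [0, 11, 22, 33, 44, 55],
})

# One vectorized comparison yields a boolean Series aligned with df's index
mask = df['V'] == 0

# Boolean indexing keeps only the rows where the mask is True; no groupby involved
rows_with_zero = df[mask]
print(rows_with_zero)
```

Because the mask is evaluated row by row, each row is kept or dropped on its own; nothing about the other rows of its ID or YEAR matters.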

But if you need to return whole groups in which at least one value of column V equals 0, add any(), because the function passed to filter must return a single True or False per group, which then keeps or drops all rows of that group:

print(df.groupby(['ID']).filter(lambda x: (x['V'] == 0).any())) 
    C  ID  V  YEAR
0   0   1  0  2011
1  11   1  1  2012
2  22   2  1  2012
3  33   2  0  2013
4  44   3  1  2013
5  55   3  0  2014
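A sketch of why `.any()` is required, using the same assumed sample frame. `has_zero` is a hypothetical helper name. The second call shows that returning the per-row boolean Series instead of one bool makes `filter` raise `TypeError` (for groups with more than one row, as every ID group has here).

```python
import pandas as pd

# Assumed sample data, rebuilt from the printed output
df = pd.DataFrame({
    'ID':   [1, 1, 2, 2, 3, 3],
    'YEAR': [2011, 2012, 2012, 2013, 2013, 2014],
    'V':    [0, 1, 1, 0, 1, 0],
    'C':    [0, 11, 22, 33, 44, 55],
})

def has_zero(group):
    # Hypothetical helper: collapse the per-row comparison to ONE bool per group
    return (group['V'] == 0).any()

# filter keeps or drops each ID group as a whole, based on has_zero
kept = df.groupby('ID').filter(has_zero)
print(kept)

# Returning the per-row Series (no .any()) is not a valid filter result
try:
    df.groupby('ID').filter(lambda g: g['V'] == 0)
    raised_type_error = False
except TypeError:
    raised_type_error = True
print(raised_type_error)
```

Every ID group contains at least one V == 0, so all six rows survive, which is why grouping by ID is a weak test of the filter.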

A better test is to group by a different column, YEAR. Now the 2012 rows are filtered out, because no row in that group has V == 0:

print(df.groupby(['YEAR']).filter(lambda x: (x['V'] == 0).any())) 
    C  ID  V  YEAR
0   0   1  0  2011
3  33   2  0  2013
4  44   3  1  2013
5  55   3  0  2014
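To see the per-group decision that drives this result, the sketch below evaluates the same predicate once per YEAR with `agg`, then runs the filter. The sample frame is again an assumption rebuilt from the printed output, and `verdict` is an illustrative name.

```python
import pandas as pd

# Assumed sample data, rebuilt from the printed output
df = pd.DataFrame({
    'ID':   [1, 1, 2, 2, 3, 3],
    'YEAR': [2011, 2012, 2012, 2013, 2013, 2014],
    'V':    [0, 1, 1, 0, 1, 0],
    'C':    [0, 11, 22, 33, 44, 55],
})

# One bool per YEAR, mirroring the predicate passed to filter
verdict = df.groupby('YEAR')['V'].agg(lambda s: (s == 0).any())
print(verdict)

# filter drops every row of the groups whose decision is False (here only 2012)
kept = df.groupby('YEAR').filter(lambda g: (g['V'] == 0).any())
print(kept)
```

Row 4 (V == 1) is kept because its group, 2013, also contains row 3 with V == 0.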

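If performance is important, use GroupBy.transform with boolean indexing. transform('any') runs the built-in per-group reduction and broadcasts the result back to every row, instead of calling a Python lambda once per group: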

print(df[(df['V'] == 0).groupby(df['YEAR']).transform('any')]) 
   ID  YEAR  V   C
0   1  2011  0   0
3   2  2013  0  33
4   3  2013  1  44
5   3  2014  0  55
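A sketch checking that the transform-based mask selects exactly the same rows as the `filter` call used earlier in this answer, on the same assumed sample frame (`fast` and `slow` are illustrative names).

```python
import pandas as pd

# Assumed sample data, rebuilt from the printed output
df = pd.DataFrame({
    'ID':   [1, 1, 2, 2, 3, 3],
    'YEAR': [2011, 2012, 2012, 2013, 2013, 2014],
    'V':    [0, 1, 1, 0, 1, 0],
    'C':    [0, 11, 22, 33, 44, 55],
})

# Compare once, reduce per YEAR with the built-in 'any', then broadcast
# the per-year answer back to every row so it can be used as a mask
mask = (df['V'] == 0).groupby(df['YEAR']).transform('any')
fast = df[mask]

# The filter version, which calls the lambda once per group
slow = df.groupby('YEAR').filter(lambda g: (g['V'] == 0).any())

print(fast.equals(slow))
```

On six rows the speed difference is negligible; the advantage should grow with the number of groups, so time both variants (for example with `timeit`) on your real data before relying on it.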

Details: the boolean mask produced by transform, with one value per original row:

print((df['V'] == 0).groupby(df['YEAR']).transform('any')) 
0     True
1    False
2    False
3     True
4     True
5     True
Name: V, dtype: bool
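To make the broadcasting step explicit, the sketch below computes the per-year answer with a plain `groupby(...).any()` (one value per YEAR) and then maps it back onto the rows by hand. This only illustrates what `transform('any')` returns, not how pandas implements it internally; the sample frame and the names `per_year` and `broadcast` are assumptions.

```python
import pandas as pd

# Assumed sample data, rebuilt from the printed output
df = pd.DataFrame({
    'ID':   [1, 1, 2, 2, 3, 3],
    'YEAR': [2011, 2012, 2012, 2013, 2013, 2014],
    'V':    [0, 1, 1, 0, 1, 0],
    'C':    [0, 11, 22, 33, 44, 55],
})

is_zero = df['V'] == 0

# Reduced result: one bool per YEAR (4 values)
per_year = is_zero.groupby(df['YEAR']).any()

# transform returns the same per-year answer, aligned to the 6 original rows
mask = is_zero.groupby(df['YEAR']).transform('any')

# Hand-rolled equivalent: look up each row's YEAR in per_year
broadcast = df['YEAR'].map(per_year)

print(per_year)
print(mask)
```

Because the mask has the same index as `df`, it can be passed straight to `df[...]`, which the reduced four-value `per_year` Series could not.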