且构网

Pandas and groupby: counting the number of matches between two different columns

Updated: 2022-02-27 22:38:29

Use isin to compare the columns, group by the claim and event columns and aggregate with sum, then cast to int and call reset_index to turn the MultiIndex levels back into columns:

a = (df['material1'].isin(df['material2']))
df = a.groupby([df['claim'], df['event']]).sum().astype(int).reset_index(name='matches')
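
For illustration, here is a minimal runnable sketch of the isin + groupby approach above. The sample DataFrame is hypothetical (the question's original data is not shown on this page); only the column names claim, event, material1 and material2 come from the answer.

```python
import pandas as pd

# Hypothetical sample data, invented for this sketch
df = pd.DataFrame({
    'claim':     ['A', 'A', 'A', 'A', 'B'],
    'event':     ['X', 'X', 'X', 'Y', 'Z'],
    'material1': ['m1', 'm2', 'm3', 'm4', 'm9'],
    'material2': ['m2', 'm3', 'm1', 'm4', 'm8'],
})

# True where the material1 value occurs anywhere in the material2 column
a = df['material1'].isin(df['material2'])

# Sum the booleans per (claim, event) pair; each True counts as 1
out = (a.groupby([df['claim'], df['event']])
        .sum()
        .astype(int)
        .reset_index(name='matches'))

print(out)
```

With this sample data the matches column comes out as 3, 1 and 0 for the groups (A, X), (A, Y) and (B, Z). Note that isin here tests membership against the whole material2 column, not only against the rows of the same group.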

Solution that assigns the result to a new column first:

df['matches'] = df['material1'].isin(df['material2']).astype(int)
df = df.groupby(['claim', 'event'])['matches'].sum().reset_index()

Solution by @Wen, thank you:

df['matches'] = df['material1'].isin(df['material2']).astype(int)
df = df.groupby(['claim', 'event'], as_index=False)['matches'].sum()

I think this version should be slower on larger DataFrames, since apply calls the lambda once per group:

df = (df.groupby(['claim', 'event'])
        .apply(lambda x: x['material1'].isin(x['material2']).astype(int).sum())
        .reset_index(name='matches'))


print(df)
  claim event  matches
0     A     X        3
1     A     Y        1
2     B     Z        0
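
One point worth checking before choosing between the variants: the isin-then-groupby solutions compare material1 against the entire material2 column, while the groupby/apply version compares only within each group. The sketch below, on hypothetical data where the two differ, shows the effect. Selecting the two material columns before apply is an addition of this sketch, made so the lambda only receives the columns it needs.

```python
import pandas as pd

# Hypothetical data: 'm5' is in material1 of group (B, Z),
# but appears in material2 only inside group (A, X).
df = pd.DataFrame({
    'claim':     ['A', 'A', 'B'],
    'event':     ['X', 'X', 'Z'],
    'material1': ['m1', 'm2', 'm5'],
    'material2': ['m2', 'm5', 'm8'],
})

keys = ['claim', 'event']

# Global membership: material1 is checked against the whole material2 column
global_matches = (df['material1'].isin(df['material2'])
                  .groupby([df['claim'], df['event']])
                  .sum()
                  .astype(int)
                  .reset_index(name='matches'))

# Per-group membership: material1 is checked only against material2 of the same group
group_matches = (df.groupby(keys)[['material1', 'material2']]
                 .apply(lambda g: int(g['material1'].isin(g['material2']).sum()))
                 .reset_index(name='matches'))

print(global_matches)
print(group_matches)
```

Here group (B, Z) gets 1 match with the global check and 0 with the per-group check, so the two approaches are interchangeable only when values never cross group boundaries.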