Updated: 2022-02-27 22:38:29
Use isin to compare the columns, group by the claim and event columns with an aggregate sum, cast the result to int, and call reset_index to turn the MultiIndex levels back into columns:
a = (df['material1'].isin(df['material2']))
df = a.groupby([df['claim'], df['event']]).sum().astype(int).reset_index(name='matches')
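For anyone who wants to run this end to end, here is a minimal sketch. The input frame is not shown in the answer, so the sample data below is hypothetical and was chosen only so that the three groups are easy to check by hand; the names mask and result are also just illustrative.

```python
import pandas as pd

# Hypothetical sample data; the original input frame is not shown.
df = pd.DataFrame({
    "claim":     ["A", "A", "A", "A", "A", "B", "B"],
    "event":     ["X", "X", "X", "Y", "Y", "Z", "Z"],
    "material1": ["m1", "m2", "m3", "m4", "m5", "m6", "m8"],
    "material2": ["m2", "m3", "m1", "m4", "m9", "m7", "m0"],
})

# True where a material1 value occurs anywhere in the material2 column.
mask = df["material1"].isin(df["material2"])

# Group the boolean mask by the two key columns and count the True values.
result = (
    mask.groupby([df["claim"], df["event"]])
        .sum()
        .astype(int)
        .reset_index(name="matches")
)
print(result)
```

With this sample data the groups (A, X), (A, Y) and (B, Z) get 3, 1 and 0 matches. Grouping the mask by the key Series, rather than by column names, is what lets the boolean Series be aggregated without first adding it to df.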
Solution that assigns the result to a new column:
df['matches'] = df['material1'].isin(df['material2']).astype(int)
df = df.groupby(['claim', 'event'])['matches'].sum().reset_index()
Solution suggested by @Wen, thank you:
df['matches'] = df['material1'].isin(df['material2']).astype(int)
df = df.groupby(['claim', 'event'], as_index=False)['matches'].sum()
I think this one should be slower on larger DataFrames:
df = (df.groupby(['claim', 'event'])
        .apply(lambda x: x['material1'].isin(x['material2']).astype(int).sum())
        .reset_index(name='matches'))
print(df)
  claim event  matches
0     A     X        3
1     A     Y        1
2     B     Z        0
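One thing that may be worth checking before picking a variant: as far as I can tell, the groupby plus apply version tests membership inside each (claim, event) group, while the isin-first versions test against the whole material2 column. The sketch below uses hypothetical data in which the value m9 appears in material2 only in a different group, so the two approaches give different counts. The explicit column selection before apply is my own addition, not part of the answer.

```python
import pandas as pd

# Hypothetical data: "m9" appears in material2 only in a different group.
df = pd.DataFrame({
    "claim":     ["A", "A", "B"],
    "event":     ["X", "X", "Z"],
    "material1": ["m1", "m9", "m2"],
    "material2": ["m1", "m3", "m9"],
})

keys = ["claim", "event"]

# Whole-column membership test, then a grouped sum.
global_counts = (
    df["material1"].isin(df["material2"])
      .groupby([df[k] for k in keys])
      .sum()
      .astype(int)
)

# Membership tested separately inside each (claim, event) group.
per_group_counts = (
    df.groupby(keys)[["material1", "material2"]]
      .apply(lambda g: int(g["material1"].isin(g["material2"]).sum()))
)

print(global_counts)
print(per_group_counts)
```

Here group (A, X) gets 2 matches with the whole-column test but only 1 with the per-group test. If matches should only count within the same claim and event, the apply version is the one that does that, at the speed cost mentioned above.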