Updated: 2023-01-22 07:42:40
First, I think working with lists in pandas is not a good idea.
This solution works if you convert the lists to a helper column of tuples, then use sort_values with drop_duplicates:
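# Tuples are hashable, so they can serve as the key for drop_duplicates
df['new'] = df['pair'].apply(tuple)
# Highest score first, then keep the first (best) row per pair
df = df.sort_values('score', ascending=False).drop_duplicates('new')
print(df)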
     pair   score     new
0  [A, A]  1.0000  (A, A)
1  [A, F]  0.9990  (A, F)
5  [A, H]  0.9990  (A, H)
2  [A, G]  0.9985  (A, G)
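For anyone who wants to run the tuple approach end to end, here is a minimal sketch. The sample data is hypothetical, picked only so the result resembles the output above. The kind='mergesort' argument and the final drop of the helper column are my additions, not part of the original snippet.

```python
import pandas as pd

# Hypothetical sample data, chosen only to resemble the output shown above.
df = pd.DataFrame({
    'pair': [['A', 'A'], ['A', 'F'], ['A', 'G'],
             ['A', 'G'], ['A', 'H'], ['A', 'H']],
    'score': [1.0, 0.999, 0.9985, 0.9975, 0.9975, 0.999],
})

# Tuples are hashable, so they can be used as a deduplication key.
df['new'] = df['pair'].apply(tuple)

# Sort so the highest score comes first, keep the first row per pair,
# then drop the helper column again. A stable sort keeps tied scores
# in their original relative order.
best = (
    df.sort_values('score', ascending=False, kind='mergesort')
      .drop_duplicates('new')
      .drop(columns='new')
)
print(best)
```

Sorting before drop_duplicates is what makes the default keep='first' select the highest score for each pair.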
Or add 2 new columns:
# Split each pair into columns 'a' and 'b'; index=df.index keeps rows aligned
df[['a', 'b']] = pd.DataFrame(df['pair'].tolist(), index=df.index)
df = df.sort_values('score', ascending=False).drop_duplicates(['a', 'b'])
print(df)
     pair   score  a  b
0  [A, A]  1.0000  A  A
1  [A, F]  0.9990  A  F
5  [A, H]  0.9990  A  H
2  [A, G]  0.9985  A  G
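A runnable sketch of the two-column variant, again with hypothetical data. The non-default index labels (10 to 15) are an assumption, used only to show why passing index=df.index can matter: without it, the helper frame gets a fresh 0..n-1 index and the assignment may align to NaN.

```python
import pandas as pd

# Hypothetical data with a non-default index, e.g. left over from filtering.
df = pd.DataFrame(
    {
        'pair': [['A', 'A'], ['A', 'F'], ['A', 'G'],
                 ['A', 'G'], ['A', 'H'], ['A', 'H']],
        'score': [1.0, 0.999, 0.9985, 0.9975, 0.9975, 0.999],
    },
    index=[10, 11, 12, 13, 14, 15],
)

# Expand each two-element list into columns 'a' and 'b', reusing df's index
# so the new columns line up with the existing rows.
df[['a', 'b']] = pd.DataFrame(df['pair'].tolist(), index=df.index)

# Highest score first, then keep the first row for each (a, b) combination.
best = (
    df.sort_values('score', ascending=False, kind='mergesort')
      .drop_duplicates(['a', 'b'])
)
print(best)
```

This variant assumes every list has exactly two elements; the tuple helper column is the more forgiving choice if lengths vary.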