How do I remove rows with duplicate column values from a pandas DataFrame?

Updated: 2023-08-28 16:10:58

Use drop_duplicates with subset set to the list of columns to check for duplicates, and keep='first' to keep the first row of each group of duplicates.

If the DataFrame is:

import pandas as pd

df = pd.DataFrame({'Column1': ["'cat'", "'toy'", "'cat'"],
                   'Column2': ["'bat'", "'flower'", "'bat'"],
                   'Column3': ["'xyz'", "'abc'", "'lmn'"]})
print(df)

Output:

  Column1   Column2 Column3
0   'cat'     'bat'   'xyz'
1   'toy'  'flower'   'abc'
2   'cat'     'bat'   'lmn'

Then:

result_df = df.drop_duplicates(subset=['Column1', 'Column2'], keep='first')
print(result_df)

Output:

  Column1   Column2 Column3
0   'cat'     'bat'   'xyz'
1   'toy'  'flower'   'abc'
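
The answer only shows keep='first'. As a rough sketch of the other documented values of keep, the snippet below rebuilds the same DataFrame and tries keep='last' and keep=False, and also uses DataFrame.duplicated to show which rows are flagged. The variable names (cols, last_df, unique_df, dup_mask) are illustrative and not part of the original answer.

```python
import pandas as pd

# Same data as in the answer above.
df = pd.DataFrame({'Column1': ["'cat'", "'toy'", "'cat'"],
                   'Column2': ["'bat'", "'flower'", "'bat'"],
                   'Column3': ["'xyz'", "'abc'", "'lmn'"]})

# Columns that define what counts as a duplicate (illustrative name).
cols = ['Column1', 'Column2']

# keep='last' keeps the last row of each duplicate group instead of the first.
last_df = df.drop_duplicates(subset=cols, keep='last')

# keep=False drops every row that has a duplicate, leaving only unique rows.
unique_df = df.drop_duplicates(subset=cols, keep=False)

# duplicated() returns the boolean mask of rows that drop_duplicates
# would remove with the same subset and keep arguments.
dup_mask = df.duplicated(subset=cols, keep='first')

print(last_df)
print(unique_df)
print(dup_mask.tolist())
```

Note that the original index labels are preserved in every result; chaining .reset_index(drop=True) is one way to renumber the rows if that matters.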