Compare two DataFrames and get the differences

Updated: 2023-01-30 10:10:39


I have two dataframes. Examples:

df1:
Date       Fruit  Num  Color 
2013-11-24 Banana 22.1 Yellow
2013-11-24 Orange  8.6 Orange
2013-11-24 Apple   7.6 Green
2013-11-24 Celery 10.2 Green

df2:
Date       Fruit  Num  Color 
2013-11-24 Banana 22.1 Yellow
2013-11-24 Orange  8.6 Orange
2013-11-24 Apple   7.6 Green
2013-11-24 Celery 10.2 Green
2013-11-25 Apple  22.1 Red
2013-11-25 Orange  8.6 Orange

Each dataframe has the Date as an index. Both dataframes have the same structure.

What I want to do is compare these two dataframes and find which rows are in df2 but not in df1. I want to compare the date (index) and the first column (Banana, Apple, etc.) to see whether each pair that exists in df2 also exists in df1.

I have tried the following:

For the first approach I get this error: "Exception: Can only compare identically-labeled DataFrame objects". I have tried removing the Date as index but get the same error.

On the third approach, I get the assert to return False but cannot figure out how to actually see the different rows.

Any pointers would be welcome

The df1 != df2 approach works only for dataframes with identical row and column labels. Internally, both axes of the two dataframes are compared with the _indexed_same method, and an exception is raised if any difference is found, even a difference in column or index order.
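To make the failure concrete, here is a minimal sketch with two small made-up frames (the data is illustrative only, not the asker's). It assumes a reasonably recent pandas, where the error is raised as a ValueError; the message quoted in the question suggests that older versions raised a plain Exception instead.

```python
import pandas as pd

# Illustrative frames: same columns, but df2 has one extra row,
# so the row labels of the two frames differ.
df1 = pd.DataFrame({"Fruit": ["Banana", "Orange"], "Num": [22.1, 8.6]})
df2 = pd.DataFrame({"Fruit": ["Banana", "Orange", "Apple"], "Num": [22.1, 8.6, 7.6]})

try:
    df1 != df2  # labels do not match, so pandas refuses to compare
    raised = False
except ValueError as exc:  # assumption: recent pandas raises ValueError here
    raised = True
    print(exc)

# With identical labels the elementwise comparison works and
# returns a boolean DataFrame of the same shape.
same_labels = df1 != df2.iloc[:2]
print(same_labels)
```

Trimming df2 to the first two rows gives both frames the same index and columns, which is why the second comparison succeeds.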

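If I understand you correctly, you are not looking for changed values but for the symmetric difference, that is, the rows that appear in only one of the two dataframes. One approach is to concatenate the dataframes. Note that reset_index(drop=True) discards the existing index, so if Date is the index, turn it into a regular column first with reset_index():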

>>> df = pd.concat([df1, df2])
>>> df = df.reset_index(drop=True)

Group by all columns:

>>> df_gpby = df.groupby(list(df.columns))

Get the index of the records that occur only once:

>>> idx = [x[0] for x in df_gpby.groups.values() if len(x) == 1]

Filter:

>>> df.reindex(idx)
         Date   Fruit   Num   Color
9  2013-11-25  Orange   8.6  Orange
8  2013-11-25   Apple  22.1     Red
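The steps above can be put together into one script. This is a minimal sketch rather than the answerer's exact code: it assumes Date is an ordinary column instead of the index, rebuilds the sample data from the question, and sorts the surviving labels so the row order is deterministic. The last part goes beyond the answer and swaps in a different technique, a left merge with indicator=True (an anti-join), to keep only the rows of df2 whose Date and Fruit pair is missing from df1, which is what the question literally asks for.

```python
import pandas as pd

columns = ["Date", "Fruit", "Num", "Color"]
df1 = pd.DataFrame(
    [
        ["2013-11-24", "Banana", 22.1, "Yellow"],
        ["2013-11-24", "Orange", 8.6, "Orange"],
        ["2013-11-24", "Apple", 7.6, "Green"],
        ["2013-11-24", "Celery", 10.2, "Green"],
    ],
    columns=columns,
)
df2 = pd.DataFrame(
    [
        ["2013-11-24", "Banana", 22.1, "Yellow"],
        ["2013-11-24", "Orange", 8.6, "Orange"],
        ["2013-11-24", "Apple", 7.6, "Green"],
        ["2013-11-24", "Celery", 10.2, "Green"],
        ["2013-11-25", "Apple", 22.1, "Red"],
        ["2013-11-25", "Orange", 8.6, "Orange"],
    ],
    columns=columns,
)

# Symmetric difference, following the answer's steps:
# stack both frames, group identical rows, keep rows seen exactly once.
df = pd.concat([df1, df2]).reset_index(drop=True)
groups = df.groupby(list(df.columns)).groups
idx = sorted(labels[0] for labels in groups.values() if len(labels) == 1)
sym_diff = df.reindex(idx)
print(sym_diff)

# Alternative (not from the answer): rows of df2 whose (Date, Fruit)
# pair does not occur in df1, via a left merge with an indicator column.
keys = ["Date", "Fruit"]
merged = df2.merge(df1[keys].drop_duplicates(), on=keys, how="left", indicator=True)
only_in_df2 = merged.loc[merged["_merge"] == "left_only"].drop(columns="_merge")
print(only_in_df2)
```

Two caveats on the groupby version: a row that is duplicated inside a single dataframe counts as non-unique and is dropped from the result, and by default groupby skips rows that have NaN in any grouping column.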
