Updated: 2022-11-27 19:20:31
In pandas, is inplace = True considered harmful?
Yes, it is. Not just harmful; quite harmful. This GitHub issue proposes deprecating the inplace argument API-wide sometime in the near future. In a nutshell, here is everything wrong with the inplace argument:
- inplace, contrary to what the name implies, often does not prevent copies from being created, and (almost) never offers any performance benefits
- inplace does not work with method chaining
- inplace can lead to the dreaded SettingWithCopyWarning when called on a DataFrame column, and may sometimes fail to update the column in place
The pain points above are all common pitfalls for beginners, so removing this option would greatly simplify the API.
Let's look at each of these points in more depth.
Performance
It is a common misconception that using inplace=True will lead to more efficient or optimized code. In general, there are no performance benefits to using inplace=True. Most in-place and out-of-place versions of a method create a copy of the data anyway, with the in-place version automatically assigning the copy back. The copy cannot be avoided.
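To make the equivalence concrete, here is a minimal sketch using dropna as a stand-in (the column name and data are made up for illustration). It only shows that the two spellings end in the same result; it does not measure or prove anything about the copies pandas makes internally.

```python
import pandas as pd

# Illustrative data; the column name "a" is an assumption for this sketch.
df = pd.DataFrame({"a": [1.0, None, 3.0]})

# Out-of-place: returns a new DataFrame and leaves df untouched.
cleaned = df.dropna()

# In-place: the same operation, with the result written back onto the object.
in_place = df.copy()
in_place.dropna(inplace=True)

# Both routes end with identical contents.
same_result = cleaned.equals(in_place)
```

Since the outcome is the same either way, plain reassignment (df = df.dropna()) is usually the clearer spelling.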
Method Chaining
inplace=True also hinders method chaining. Compare
result = df.some_function1().reset_index().some_function2()
with
temp = df.some_function1()
temp.reset_index(inplace=True)
result = temp.some_function2()
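The snippets above use placeholder method names. A runnable sketch of the same contrast, with sort_values and rename standing in for some_function1 and some_function2 (an assumption for illustration only), might look like this:

```python
import pandas as pd

df = pd.DataFrame({"a": [3, 2, 1], "b": ["x", "y", "z"]})

# Chained: every step returns a new DataFrame, so the calls compose.
chained = (
    df.sort_values("a")
      .reset_index(drop=True)
      .rename(columns={"b": "label"})
)

# With inplace=True the chain has to be broken around the in-place step,
# which needs a named temporary.
temp = df.sort_values("a")
temp.reset_index(drop=True, inplace=True)
stepwise = temp.rename(columns={"b": "label"})
```

Both versions produce the same frame; the chained form simply avoids the intermediate variable.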
Unintended Pitfalls
One final caveat to keep in mind is that calling inplace=True can trigger the SettingWithCopyWarning:
import pandas as pd
df = pd.DataFrame({'a': [3, 2, 1], 'b': ['x', 'y', 'z']})
df2 = df[df['a'] > 1]
df2['b'].replace({'x': 'abc'}, inplace=True)
# SettingWithCopyWarning:
# A value is trying to be set on a copy of a slice from a DataFrame
This can cause unexpected behavior.
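One way to sidestep this pitfall, sketched here as a suggestion rather than as the original answer's prescription, is to take an explicit copy of the filtered frame and assign the result back to the column instead of passing inplace=True:

```python
import pandas as pd

df = pd.DataFrame({'a': [3, 2, 1], 'b': ['x', 'y', 'z']})

# An explicit copy makes df2 an independent DataFrame rather than a slice of df.
df2 = df[df['a'] > 1].copy()

# Assign the replaced column back instead of mutating it in place.
df2['b'] = df2['b'].replace({'x': 'abc'})
```

Here df2 is updated as intended and df is left unchanged.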