且构网

In pandas, is inplace = True considered harmful?

Updated: 2022-11-27 19:20:31

Yes, it is. Not just harmful. Quite harmful. This GitHub issue proposes deprecating the inplace argument API-wide sometime in the near future. In a nutshell, here's everything wrong with the inplace argument:

  • inplace, contrary to what the name implies, often does not prevent copies from being created, and (almost) never offers any performance benefits
  • inplace does not work with method chaining
  • inplace can lead to the dreaded SettingWithCopyWarning when called on a DataFrame column, and may sometimes fail to update the column in-place

The pain points above are all common pitfalls for beginners, so removing this option would greatly simplify the API.

Let's take a look at the points above in more depth.

Performance
It is a common misconception that using inplace=True will lead to more efficient or optimized code. In general, there are no performance benefits to using inplace=True. Most in-place and out-of-place versions of a method create a copy of the data anyway, with the in-place version automatically assigning the copy back. The copy cannot be avoided.
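
One way to observe this is a minimal sketch using `sort_values` as a representative method (our choice, not one named above), assuming a reasonably recent pandas; internals can vary between releases. It checks that the in-place and out-of-place calls give the same result, and that the in-place call swapped in newly allocated data instead of sorting the original buffer.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"a": [3, 1, 2], "b": [30, 10, 20]})

# Out-of-place: returns a new sorted DataFrame and leaves df untouched.
out_of_place = df.sort_values("a")

# In-place: keep a handle on column "a"'s underlying buffer before sorting.
before = df["a"].to_numpy()
df.sort_values("a", inplace=True)
after = df["a"].to_numpy()

# Both versions produce the same data and index...
print(df.equals(out_of_place))  # True

# ...but the in-place call did not reorder the old buffer; it replaced it
# with a freshly allocated, sorted copy.
print(np.shares_memory(before, after))  # False
print(before.tolist())  # [3, 1, 2]
```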

Method Chaining
inplace=True hinders method chaining. Contrast

result = df.some_function1().reset_index().some_function2()

with

temp = df.some_function1()
temp.reset_index(inplace=True)
result = temp.some_function2()
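
A runnable version of the same contrast, as a sketch in which `sort_values` and `rename` stand in for the hypothetical `some_function1` and `some_function2`. The root cause is that methods called with inplace=True return None, so there is nothing for the next call in the chain to operate on.

```python
import pandas as pd

df = pd.DataFrame({"a": [3, 1, 2], "b": ["x", "y", "z"]})

# Chained: every step returns a new object, so the calls compose left to right.
chained = (
    df.sort_values("a")
    .reset_index(drop=True)
    .rename(columns={"b": "label"})
)

# With inplace=True the middle step returns None, so the chain must be
# broken up around a temporary variable.
temp = df.sort_values("a")
temp.reset_index(drop=True, inplace=True)
step_by_step = temp.rename(columns={"b": "label"})

print(chained.equals(step_by_step))  # True

# Why the chain breaks: the in-place call returns None, not a DataFrame.
print(df.copy().reset_index(drop=True, inplace=True))  # None
```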

Unintended Pitfalls
A final caveat to keep in mind is that calling inplace=True can trigger the SettingWithCopyWarning:

import pandas as pd

df = pd.DataFrame({'a': [3, 2, 1], 'b': ['x', 'y', 'z']})

df2 = df[df['a'] > 1]
df2['b'].replace({'x': 'abc'}, inplace=True)
# SettingWithCopyWarning: 
# A value is trying to be set on a copy of a slice from a DataFrame

This can cause unexpected behavior.
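
A common way around the pitfall in the warning example, sketched with standard pandas idioms (`.copy()` and `.loc`) rather than anything prescribed above: be explicit about which object you mean to modify, and assign the result back instead of calling `replace(..., inplace=True)` on a column selection. The variable `mask` is a hypothetical name introduced for readability.

```python
import pandas as pd

df = pd.DataFrame({'a': [3, 2, 1], 'b': ['x', 'y', 'z']})

# Option 1: edit a filtered subset. Taking an explicit .copy() makes df2
# independent of df, and assigning the result back avoids inplace=True.
df2 = df[df['a'] > 1].copy()
df2['b'] = df2['b'].replace({'x': 'abc'})

# Option 2: edit the original frame. Write through .loc on df itself.
mask = df['a'] > 1
df.loc[mask, 'b'] = df.loc[mask, 'b'].replace({'x': 'abc'})

print(df2['b'].tolist())  # ['abc', 'y']
print(df['b'].tolist())   # ['abc', 'y', 'z']
```

Which option fits depends on whether the filtered rows are meant to be a separate working copy or a view onto `df` that should write back to it.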