且构网

Preserving NaN in pandas DataFrame inequalities

Updated: 2022-01-02 23:25:22

You can do it like this:

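import numpy as np
new_df = (df >= threshold).astype(float)  # 1.0 / 0.0 instead of True / False
new_df[df.isnull()] = np.nan              # put NaN back where df had missing values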

But this differs from what you get with the apply method. Here the mask has float dtype and contains NaN, 0.0 and 1.0. With the apply solution you get object dtype containing NaN, False and True.
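To make the dtype difference concrete, here is a minimal sketch that builds both results side by side. The sample data, the threshold value, and the helper `compare_keep_nan` are hypothetical, and the apply-based version is only one guess at what the question's apply solution may have looked like.

```python
import numpy as np
import pandas as pd

# Hypothetical example data and threshold (not from the original post).
df = pd.DataFrame({"a": [1.0, np.nan, 3.0], "b": [np.nan, 0.5, 2.0]})
threshold = 1.0

# Mask-based approach: cast the boolean result to float, then restore NaN.
float_mask = (df >= threshold).astype(float)
float_mask[df.isnull()] = np.nan

# One possible apply-based version (an assumption about the question's code):
# compare element by element and pass NaN through unchanged.
def compare_keep_nan(x):
    return np.nan if pd.isnull(x) else x >= threshold

object_mask = df.apply(lambda col: col.map(compare_keep_nan))

print(float_mask.dtypes)   # float columns holding NaN, 0.0 and 1.0
print(object_mask.dtypes)  # object columns holding NaN, False and True
```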

Neither is suitable for use as a mask, because you might not get the result you want. IEEE 754 specifies that an ordered comparison involving NaN (such as >=) must yield False, and the apply method implicitly violates that by returning NaN!
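A small sketch of one way this can go wrong, using hypothetical values: the comparison itself yields False for NaN, but once NaN is stored in a float "mask", converting that mask back to booleans silently treats NaN as True.

```python
import numpy as np
import pandas as pd

# Under IEEE 754, an ordered comparison against NaN evaluates to False.
nan_compare = np.nan >= 1.0

# A float "mask" that still carries a NaN (hypothetical values).
float_mask = pd.Series([1.0, np.nan, 0.0])

# Casting to bool turns NaN into True, because NaN is a non-zero float.
as_bool = float_mask.astype(bool)

print(nan_compare)
print(as_bool.tolist())
```

So a row with a missing value would be selected by the converted mask even though the comparison on that row never succeeded.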

The best option is to keep track of the NaNs separately; df.isnull() is quite fast when bottleneck is installed.
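As a rough illustration of tracking the NaNs separately, the sketch below keeps two plain boolean frames and combines them only when needed. The data, the threshold, and the names `mask`, `missing`, `below` and `summary` are all hypothetical.

```python
import numpy as np
import pandas as pd

# Hypothetical example data and threshold (not from the original post).
df = pd.DataFrame({"a": [1.0, np.nan, 3.0], "b": [np.nan, 0.5, 2.0]})
threshold = 1.0

mask = df >= threshold   # bool dtype; NaN cells compare as False
missing = df.isnull()    # bool dtype; True where the value is NaN

# Cells known to be below the threshold, excluding missing values.
below = ~mask & ~missing

# Example use: count each category per column.
summary = pd.DataFrame({
    "above": mask.sum(),
    "below": below.sum(),
    "missing": missing.sum(),
})
print(summary)
```

Both frames stay boolean, so either can be used directly for indexing, and the three categories (above, below, missing) remain distinguishable without storing NaN inside the mask.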