Speed difference between bracket notation and dot notation for accessing columns in pandas

Updated: 2022-12-28 17:39:07

df['CID'] delegates to NDFrame.__getitem__, and it makes it obvious that you are performing an indexing operation.

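On the other hand, df.CID delegates to NDFrame.__getattr__, which has to do some additional heavy lifting, mainly to determine whether 'CID' is an attribute, a method, or a column you are accessing through attribute notation (a convenience, but not recommended for production code). Python only calls __getattr__ after its normal attribute lookup has already failed, so dot access pays for that failed lookup and the extra checks before it reaches the column.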
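
One way to see the cost for yourself is a quick micro-benchmark with the standard-library timeit module. This is a minimal sketch: the small frame with a "CID" column is an assumption made up for illustration, and the absolute numbers depend on your pandas version and hardware. Because dot access does extra lookup work before reaching the same column, expect it to be somewhat slower per access, but measure rather than assume.

```python
import timeit

import pandas as pd

# Assumption: a made-up frame with a "CID" column, mirroring the name in the text.
df = pd.DataFrame({"CID": range(1_000), "value": range(1_000)})

# Both spellings return the same column.
assert df["CID"].equals(df.CID)

n = 10_000
# Time each access style n times; timeit returns total seconds as a float.
bracket = timeit.timeit(lambda: df["CID"], number=n)
dot = timeit.timeit(lambda: df.CID, number=n)

print(f"df['CID']: {bracket / n * 1e6:.2f} microseconds per access")
print(f"df.CID:    {dot / n * 1e6:.2f} microseconds per access")
```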

Now, why is it not recommended? Consider:

import pandas as pd

df = pd.DataFrame({'A': [1, 2, 3]})
df.A

0    1
1    2
2    3
Name: A, dtype: int64

Referring to column "A" as df.A causes no problems, because "A" does not conflict with any attribute or method name in pandas. However, consider the pop method (just as an example):

df.pop
# <bound method NDFrame.pop of ...>

df.pop is a bound method of df. Now suppose that, for whatever reason, I want to create a column called "pop":

df['pop'] = [4, 5, 6]
df
   A  pop
0  1    4
1  2    5
2  3    6

Good, but:

df.pop
# <bound method NDFrame.pop of ...>

I cannot use attribute notation to access this column; df.pop still resolves to the method. However:
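
The method wins because of Python's own lookup order: __getattr__ is consulted only when normal attribute lookup fails, and pop is found on the class first. The class below is a hypothetical, heavily simplified model of that behavior (ToyFrame and its methods are made up for illustration and use plain lists instead of Series; pandas' real implementation performs more checks).

```python
class ToyFrame:
    """Hypothetical, simplified stand-in for a DataFrame; not pandas' real code."""

    def __init__(self, columns):
        # Column name -> list of values.
        self._columns = {name: list(values) for name, values in columns.items()}

    def __getitem__(self, key):
        # Bracket access: look the name up in the columns directly.
        return self._columns[key]

    def __setitem__(self, key, values):
        self._columns[key] = list(values)

    def __getattr__(self, name):
        # Only reached after normal attribute lookup (the instance and its
        # class hierarchy) has failed to find `name`.
        columns = self.__dict__.get("_columns", {})
        if name in columns:
            return self[name]  # same path as bracket access
        raise AttributeError(name)

    def pop(self, key):
        # A real method: normal lookup finds it on the class, so
        # __getattr__ never runs for the name "pop".
        return self._columns.pop(key)


frame = ToyFrame({"A": [1, 2, 3]})
frame["pop"] = [4, 5, 6]

print(frame.A)       # column reached via __getattr__: [1, 2, 3]
print(frame.pop)     # the bound method, not the column
print(frame["pop"])  # the column: [4, 5, 6]
```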

df['pop']

0    4
1    5
2    6
Name: pop, dtype: int64

Bracket notation still works. That is why it is the better choice.
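
If you are unsure whether a column name collides with something pandas already defines, one quick check is to look the name up on the DataFrame class itself. This is a sketch that assumes string column names and only catches collisions with class-level attributes and methods; bracket notation sidesteps the question entirely.

```python
import pandas as pd

df = pd.DataFrame({"A": [1, 2, 3]})
df["pop"] = [4, 5, 6]

# Column names that match an attribute or method defined on the DataFrame
# class cannot be reached with dot notation (assumes string column names).
shadowed = [name for name in df.columns if hasattr(pd.DataFrame, name)]
print(shadowed)  # ['pop']

# Bracket notation works for every column, shadowed or not.
for name in df.columns:
    print(name, df[name].tolist())
```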