且构网

Performance difference between the global namespace and the local namespace

Updated: 2023-11-19 23:00:16

The difference does indeed greatly depend on what "do stuff" actually does and mainly on how many times it accesses names that are defined/used. Granted that the code is similar, there is a fundamental difference between these two cases:

  • In functions, names are loaded and stored with the LOAD_FAST/STORE_FAST bytecodes.
  • In the top-level scope (i.e., the module), the same operations are performed with LOAD_NAME/STORE_NAME, which are slower.
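
The distinction in the list above can be checked without reading a full disassembly. What follows is a minimal sketch, assuming a Python 3 interpreter. The LOOP string and the opnames helper are illustrative names, not part of the original answer. Exact opcode names vary between Python versions (newer releases may emit specialised members of the *_FAST family), so the sketch only looks at name fragments:

```python
import dis

# Illustrative source: the same loop, compiled here at module level.
LOOP = "b = 20\nfor i in range(10): z = 10 * b\n"


def main():
    # The same loop, this time inside a function body.
    b = 20
    for i in range(10): z = 10 * b
    return z


def opnames(code_or_func):
    """Return the set of opcode names used by a function or code object."""
    return {ins.opname for ins in dis.get_instructions(code_or_func)}


module_ops = opnames(compile(LOOP, "<module-level>", "exec"))
function_ops = opnames(main)

# Module-level code accesses its names through the *_NAME opcodes ...
print(sorted(op for op in module_ops if op.endswith("_NAME")))
# ... while the function body uses opcodes from the *_FAST family.
print(sorted(op for op in function_ops if "FAST" in op))
```

The function still needs one non-local lookup, for range, which is why only the names assigned inside the function benefit from the fast path.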

This can be seen in the following cases. I'll use a for loop to make sure that lookups for the defined variables are performed many times.

Function and LOAD_FAST/STORE_FAST:

We define a simple function that does some really silly things:

def main():
    b = 20
    for i in range(1000000): z = 10 * b 
    return z

The output generated by dis.dis:

import dis

dis.dis(main)
# [/snipped output/]

             18 GET_ITER
        >>   19 FOR_ITER                16 (to 38)
             22 STORE_FAST               1 (i)
             25 LOAD_CONST               3 (10)
             28 LOAD_FAST                0 (b)
             31 BINARY_MULTIPLY
             32 STORE_FAST               2 (z)
             35 JUMP_ABSOLUTE           19
        >>   38 POP_BLOCK

# [/snipped output/]

The thing to note here is the LOAD_FAST/STORE_FAST commands at offsets 28 and 32. These are used, respectively, to access the b name used in the BINARY_MULTIPLY operation and to store the z name. As their bytecode names imply, they are the fast versions of the LOAD_*/STORE_* family.

Modules and LOAD_NAME/STORE_NAME:

Now, let's look at the output of dis for our module version of the previous function:

# compile the module source (main.py holds the loop at top level) into a code object
with open('main.py', 'r') as f:
    m = compile(f.read(), "main", "exec")

dis.dis(m)
# [/snipped output/]

             18 GET_ITER
        >>   19 FOR_ITER                16 (to 38)
             22 STORE_NAME               2 (i)
             25 LOAD_NAME                3 (z)
             28 LOAD_NAME                0 (b)
             31 BINARY_MULTIPLY
             32 STORE_NAME               3 (z)
             35 JUMP_ABSOLUTE           19
        >>   38 POP_BLOCK

# [/snipped output/]

Over here we have multiple calls to LOAD_NAME/STORE_NAME, which, as mentioned previously, are more sluggish commands to execute.

In this case, there is going to be a clear difference in execution time, mainly because Python must evaluate LOAD_NAME/STORE_NAME and LOAD_FAST/STORE_FAST multiple times (due to the for loop I added) and, as a result, the overhead introduced each time the code for each byte code is executed will accumulate.

Timing the execution 'as a module':

import time

start_time = time.time()
b = 20
for i in range(1000000): z = 10 * b
print(z)
print("Time: ", time.time() - start_time)
200
Time:  0.15162253379821777

Timing the execution as a function:

start_time = time.time()
print(main())
print("Time: ", time.time() - start_time)
200
Time:  0.08665871620178223 

If you time the loop with a smaller range (for example, for i in range(1000)), you'll notice that the 'module' version is faster. This happens because the overhead of calling the function main() is larger than the overhead introduced by the *_FAST vs *_NAME differences. So the outcome largely depends on the amount of work that is done.
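
To try both loop sizes without editing a script by hand, the comparison can be parameterised. The following is only a sketch under stated assumptions: the helper names time_module_style and time_function_style and the source templates are made up for illustration, and the module-level case is emulated by running compiled source with exec in a fresh namespace. The numbers depend on the machine and the Python version, so none are shown here:

```python
import timeit

# Illustrative templates: the same loop at module level and inside a function.
MODULE_SRC = """
b = 20
for i in range({n}):
    z = 10 * b
"""

FUNC_SRC = """
def main():
    b = 20
    for i in range({n}):
        z = 10 * b
    return z
"""


def time_module_style(n, repeat=3):
    """Best-of-`repeat` time for running the loop as top-level code."""
    code = compile(MODULE_SRC.format(n=n), "<module-style>", "exec")
    # A fresh dict per run plays the role of the module namespace.
    return min(timeit.repeat(lambda: exec(code, {}), number=1, repeat=repeat))


def time_function_style(n, repeat=3):
    """Best-of-`repeat` time for calling main(), call overhead included."""
    namespace = {}
    exec(compile(FUNC_SRC.format(n=n), "<function-style>", "exec"), namespace)
    return min(timeit.repeat(namespace["main"], number=1, repeat=repeat))


if __name__ == "__main__":
    for n in (1000, 1000000):
        print(n, "module:", time_module_style(n), "function:", time_function_style(n))
```

Taking the minimum over a few repeats reduces noise from other processes. It does not remove it, so small differences at low iteration counts should be read with care.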

So, the real culprit here, and the reason why this difference is evident, is the for loop. You generally have no reason to ever put an intensive loop like that one at the top level of your script. Move it into a function and avoid using global variables; local name access inside functions is designed to be more efficient.
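
One common way to follow this advice is to keep the work in a function and leave only the call at module level. This is a sketch of that layout, reusing the main function from above; the __main__ guard is a general Python convention, not something the original answer prescribes:

```python
def main():
    # b, i and z are locals here, so the loop uses the *_FAST opcodes.
    b = 20
    for i in range(1000000):
        z = 10 * b
    return z


if __name__ == "__main__":
    # Only the call and the print run at module level.
    print(main())
```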

You can take a look at the code executed for each of these bytecodes. I'll link the source for the 3.5 version of Python here, even though I'm pretty sure 2.7 doesn't differ much. Bytecode evaluation is done in Python/ceval.c, specifically in the function PyEval_EvalFrameEx.

As you'll see, the *_FAST bytecodes simply get the value stored/loaded using a fastlocals local symbol table contained inside frame objects.
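
The effect of this design is visible from Python as well. This is a rough illustration, not the C implementation: the compiler numbers a function's locals ahead of time, and the numeric argument shown next to LOAD_FAST/STORE_FAST in the dis output is an index into that list. The module_code name below is made up for the example:

```python
def main():
    b = 20
    for i in range(1000000): z = 10 * b
    return z


# Locals are numbered at compile time; the argument of LOAD_FAST/STORE_FAST
# is an index into this tuple (and into the frame's fast-locals storage).
print(main.__code__.co_varnames)   # ('b', 'i', 'z')

# Module-level code has no such slots: its names are listed in co_names and
# are looked up by name in dictionaries at run time.
module_code = compile("b = 20\nfor i in range(1000000): z = 10 * b\n",
                      "<module-level>", "exec")
print(module_code.co_varnames)     # ()
print(module_code.co_names)
```

The indices 0 (b), 1 (i) and 2 (z) in the function's disassembly above correspond to positions in co_varnames.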