Updated: 2023-10-27 09:09:04
The first function creates a list of pointers to the same object (NOT a list of bytes); join then does one memory allocation and COUNT calls to memcpy.
You can make the first function several times faster (5x in my test) by dropping the temporary list and using itertools.repeat:
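import itertools
import time

def bytes_list_test_opt():
    tStart = time.perf_counter()  # time.clock() was removed in Python 3.8
    bs = b''.join(itertools.repeat(MSG, COUNT))  # no temporary list
    print('byte list opt time:', time.perf_counter() - tStart)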
or, in this particular case, simply use the * operator of bytes objects, which does exactly that:
bs = MSG*COUNT
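As a quick sanity check, the three approaches discussed so far should produce identical bytes. The sketch below assumes placeholder values for MSG and COUNT (the real ones are defined in the question), and the variable names are made up for illustration:

```python
import itertools

# Placeholder values; the question defines its own MSG and COUNT.
MSG = b"test message"
COUNT = 1000

via_list = b"".join([MSG] * COUNT)                    # temporary list of references
via_repeat = b"".join(itertools.repeat(MSG, COUNT))   # no temporary list
via_mul = MSG * COUNT                                 # bytes repetition operator

same = via_list == via_repeat == via_mul
print(same, len(via_mul))
```

All three build the same object, so the choice between them is purely about speed and readability.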
The second function repeatedly iterates over MSG, stores data byte by byte, and has to repeatedly reallocate memory as the bytearray grows.
You can speed it up by replacing the per-byte calls with extend:
def bytearray_test_opt():
    tStart = time.perf_counter()  # time.clock() was removed in Python 3.8
    ba = bytearray()
    for i in range(COUNT):
        ba.extend(MSG)  # append the whole message at once
    print('array opt time:', time.perf_counter() - tStart)
After this modification, the second function is only about 15% slower than the first in my test, and that gap comes from the additional reallocations.
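One rough way to see those reallocations is to watch sys.getsizeof change while the bytearray grows. This is only a sketch: it relies on CPython reporting the allocated buffer in the object's size, which is an implementation detail, and MSG and COUNT are placeholder values here.

```python
import sys

# Placeholder values; the question defines its own MSG and COUNT.
MSG = b"x" * 16
COUNT = 1000

ba = bytearray()
last_size = sys.getsizeof(ba)
resizes = 0
for _ in range(COUNT):
    ba.extend(MSG)
    size = sys.getsizeof(ba)
    if size != last_size:
        # The reported size changed, so the buffer was reallocated.
        resizes += 1
        last_size = size

print(resizes, "size changes over", COUNT, "extend calls")
```

If the bytearray over-allocates as it grows, the number of size changes stays well below the number of extend calls, which would be consistent with the small overhead measured above.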
The third function uses bytearray's slice assignment, which accepts any iterable and seems to do the same byte-by-byte iteration without recognizing that it could just memcpy the bytes into place. This looks like a defect in the standard library that could be fixed.
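For reference, here is a small sketch of what slice assignment on a bytearray accepts. It only illustrates the semantics (bytes-like objects and iterables of integers are both allowed, and the slice may change length); it says nothing about how fast either path is.

```python
ba = bytearray(6)            # six zero bytes

ba[0:3] = b"abc"             # assign from a bytes object
ba[3:6] = [100, 101, 102]    # assign from an iterable of ints (0..255)
filled = bytes(ba)
print(filled)

ba[0:3] = b"xy"              # a shorter replacement shrinks the bytearray
print(ba, len(ba))
```

Because the right-hand side can be an arbitrary iterable, the implementation needs a generic path; the observation above is that the bytes case does not appear to get a faster one.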
As you can see from the previous optimization, allocations take very little time compared to byte-by-byte copying, so preallocating has no visible impact here. You could save some time by doing fewer calculations, but that won't help much either:
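def initialized_bytearray_test_opt():
    tStart = time.perf_counter()  # time.clock() was removed in Python 3.8
    L = len(MSG)  # compute the message length once
    ba = bytearray(L*COUNT)  # preallocate the full buffer
    ofs = 0
    for i in range(COUNT):
        ba[ofs : ofs+L] = MSG
        ofs += L  # advance the offset instead of multiplying each time
    print('initialized array opt time:', time.perf_counter() - tStart)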
Final timings on my machine:
byte list time: 0.004823000000000001
byte list opt time: 0.0008649999999999977
array time: 0.043324
array opt time: 0.005505999999999997
initialized array time: 0.05936899999999999
initialized array opt time: 0.040164000000000005
P.S. Use the timeit module for measurements like this; it provides much higher accuracy.
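A minimal sketch of such a measurement with timeit might look like the following. MSG and COUNT are placeholder values, and the function names are made up for this example rather than taken from the question.

```python
import itertools
import timeit

# Placeholder values; the question defines its own MSG and COUNT.
MSG = b"test message"
COUNT = 1000
NUMBER = 100  # calls per timing run

def join_repeat():
    return b"".join(itertools.repeat(MSG, COUNT))

def multiply():
    return MSG * COUNT

def extend_loop():
    ba = bytearray()
    for _ in range(COUNT):
        ba.extend(MSG)
    return ba

results = {}
for func in (join_repeat, multiply, extend_loop):
    # repeat() returns one total per run; the minimum is the least noisy.
    best = min(timeit.repeat(func, number=NUMBER, repeat=5))
    results[func.__name__] = best
    print(f"{func.__name__}: {best / NUMBER:.3e} s per call")
```

Taking the minimum over several runs reduces the influence of other processes on the machine, which a single start/stop timer cannot do.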