且构网

分享程序员开发的那些事...
且构网 - 分享程序员编程开发的那些事

list()比列表理解使用更多的内存

更新时间:2023-01-15 20:34:33

我认为您正在看到过度分配模式,这是

I think you're seeing over-allocation patterns this is a sample from the source:

/* This over-allocates proportional to the list size, making room
 * for additional growth.  The over-allocation is mild, but is
 * enough to give linear-time amortized behavior over a long
 * sequence of appends() in the presence of a poorly-performing
 * system realloc().
 * The growth pattern is:  0, 4, 8, 16, 25, 35, 46, 58, 72, 88, ...
 */

new_allocated = (newsize >> 3) + (newsize < 9 ? 3 : 6);


打印长度为0-88的列表理解的大小,您可以看到模式匹配项:


Printing the sizes of list comprehensions of lengths 0-88 you can see the pattern matches:

# create comprehensions for sizes 0-88
comprehensions = [sys.getsizeof([1 for _ in range(l)]) for l in range(90)]

# only take those that resulted in growth compared to previous length
steps = zip(comprehensions, comprehensions[1:])
growths = [x for x in list(enumerate(steps)) if x[1][0] != x[1][1]]

# print the results:
for growth in growths:
    print(growth)

结果(格式为(list length, (old total size, new total size))):

(0, (64, 96)) 
(4, (96, 128))
(8, (128, 192))
(16, (192, 264))
(25, (264, 344))
(35, (344, 432))
(46, (432, 528))
(58, (528, 640))
(72, (640, 768))
(88, (768, 912))


出于性能原因而进行了超额分配,从而允许列表增长而不会每次增长都分配更多的内存(更好的摊销效果).


The over-allocation is done for performance reasons allowing lists to grow without allocating more memory with every growth (better amortized performance).

使用列表理解的可能原因之一是列表理解不能确定性地计算生成列表的大小,而list()可以.这意味着在使用过度分配填充列表的过程中,理解力将不断增长,直到最终填充列表.

A probable reason for the difference with using list comprehension, is that list comprehension can not deterministically calculate the size of the generated list, but list() can. This means comprehensions will continuously grow the list as it fills it using over-allocation until finally filling it.

一旦完成,很可能不会增加未分配的未分配节点的过度分配缓冲区(实际上,在大多数情况下,这样做不会破坏过度分配的目的).

It is possible that is will not grow the over-allocation buffer with unused allocated nodes once its done (in fact, in most cases it wont, that would defeat the over-allocation purpose).

list()可以添加一些缓冲区,无论列表大小如何,因为它事先知道最终的列表大小.

list(), however, can add some buffer no matter the list size since it knows the final list size in advance.

另一个从源头获得的支持证据是,我们看到了LIST_APPEND 的> list comprehensions表示list.resize的用法,这反过来表示在不知道要填充多少预分配缓冲区的情况下使用了预分配缓冲区.这与您看到的行为一致.

Another backing evidence, also from the source, is that we see list comprehensions invoking LIST_APPEND, which indicates usage of list.resize, which in turn indicates consuming the pre-allocation buffer without knowing how much of it will be filled. This is consistent with the behavior you're seeing.

最后,list()将根据列表大小预分配更多节点

To conclude, list() will pre-allocate more nodes as a function of the list size

>>> sys.getsizeof(list([1,2,3]))
60
>>> sys.getsizeof(list([1,2,3,4]))
64

列表推导不知道列表的大小,因此随着列表的增长,它会使用追加操作,从而耗尽了预分配缓冲区:

List comprehension does not know the list size so it uses append operations as it grows, depleting the pre-allocation buffer:

# one item before filling pre-allocation buffer completely
>>> sys.getsizeof([i for i in [1,2,3]]) 
52
# fills pre-allocation buffer completely
# note that size did not change, we still have buffered unused nodes
>>> sys.getsizeof([i for i in [1,2,3,4]]) 
52
# grows pre-allocation buffer
>>> sys.getsizeof([i for i in [1,2,3,4,5]])
68