且构网

分享程序员开发的那些事...
且构网 - 分享程序员编程开发的那些事

Python 如何处理大于 64 位无符号整数限制的数字?

更新时间:2023-02-11 16:50:35

Integers as Digit-Arrays

Python 3 (CPython) 整数不是本地机器整数.从逻辑上讲,每个整数都由它的 符号 和一个以 1073741824(30 位)或 32768(15 位)[*] 为基数的绝对数组成——后者是一个可变大小的无符号整数数组.为了存储更大的数字,需要一个额外的数字".被添加到数组中.

Integers as Digit-Arrays

Python 3 (CPython) integers are not native machine integers. Logically, each integer consists of its sign and an absolute number in base 1073741824 (30-bit) or 32768 (15-bit) [*] - the latter being a variable-size array of unsigned integers. To store larger numbers, an additional "digit" is added to the array.

>>> sys.getsizeof(0)          # largest  0-digit number
24
>>> sys.getsizeof(1)          # smallest 1-digit number
28
>>> sys.getsizeof(2**30 - 1)  # largest  1-digit number
28
>>> sys.getsizeof(2**30)      # smallest 2-digit number
32
>>> sys.getsizeof(2**60 - 1)  # largest  2-digit number
32

粗略地说,这与写出十进制数时添加数字的机制相同——使用 1 位到 9,2 位到 99,依此类推.同样,只要计算机有内存加一个数字"就可以了.可以定义更大尺寸的 Python 整数.

Loosely speaking, this is the same mechanism as one adds digits when writing out decimal numbers – using 1-digit is enough up to 9, 2-digit up to 99, and so on. Likewise, as long as the computer has memory to "add a digit" a Python integer of larger size can be defined.

[*] 数字是30/15 位而不是 32 位/16 位,因为这更适合某些算法. 例如,long_pow() 需要一个可被 5 整除的大小.

[*] The digits are 30-bit/15-bit instead of 32-bit/16-bit because this better fits some algorithms. For example, long_pow() requires a size divisible by 5.

实际上,整数也是对象——这意味着它们保存了元数据,例如类型和引用计数——这也占用了空间.在 CPython 中,int 包含 的:

Practically, integers are also objects – meaning they hold metadata such as type and reference count - which also takes up space. In CPython, an int consists of:

  • Py_ssize_t的引用计数器
  • 指向PyTypeObject*
  • 类型的指针
  • Py_ssize_t
  • digit[]
  • 的可变数字数组
  • reference counter of Py_ssize_t
  • pointer to the type of PyTypeObject*
  • digit count of Py_ssize_t
  • variable array of digits of digit[]

其中前三个是每个的结构可变大小的对象.符号在数字计数内编码.

where the first three are the structure of every variable size object. The sign is encoded inside the digit count.

在 64 位机器上,Py_ssize_tPyTypeObject* 都是 8 字节大小——给出0 位整数"0 大小为 3*8 字节或 24 字节.

On a 64-bit machine, both Py_ssize_t and PyTypeObject* are 8 byte in size – giving the "0-digit integer" 0 a size of 3*8 bytes or 24 bytes.

>>> sys.getsizeof(0)          # largest  0-digit number
24

那么什么是sys.maxsize?

sys.maxsize的意思不是最大整数大小,而是最大容器大小:

So what is sys.maxsize?

The meaning of sys.maxsize is not the maximum integer size, but the maximum container size:

>>> len(range(sys.maxsize))    # this is fine
9223372036854775807
>>> len(range(sys.maxsize+1))  # this is one too much
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
OverflowError: range() result has too many items

这是sys.maxsize 表示 Py_ssize_t 的最大值,CPython 运行时用来表示和寻址内存的类型.虽然这看起来像是一个任意限制,但它实际上远远超过计算机可以解决的问题.