Updated: 2023-02-27 10:50:53
The simplest standard trick to get correctly aligned memory is to allocate a bit more than needed and skip the first few bytes if the alignment is wrong. If I remember correctly, NumPy arrays will always be 8-byte aligned, and FFTW requires 16-byte alignment to perform best. So you would simply allocate 8 bytes more than needed, and skip the first 8 bytes if necessary.
Edit: This is rather easy to implement. The pointer to the data is available as an integer in the ctypes.data attribute of a NumPy array. Using the shifted block can be achieved by slicing, viewing as a different data type, and reshaping. None of these steps copies the data; they all reuse the same buffer.
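As a small illustration of the no-copy claim, here is a minimal sketch using a tiny 32-byte buffer. The sizes and the names block and grid are arbitrary choices for the demonstration, not part of the recipe itself.

```python
import numpy as np

buf = np.zeros(32, dtype=np.uint8)          # raw byte buffer
block = buf[8:24].view(np.float64)          # skip 8 bytes, reinterpret 16 bytes as 2 doubles
grid = block.reshape(2, 1)                  # reshaping a contiguous array gives another view

print(np.shares_memory(buf, grid))          # do buf and grid use the same memory?
print(grid.ctypes.data - buf.ctypes.data)   # how far the data pointer moved, in bytes

grid[0, 0] = 1.0                            # write through the view...
print(buf[8:16].any())                      # ...and look at bytes 8..15 of the original buffer
```

Writing through grid shows up in buf because slicing, view() and reshape() only create new array objects pointing at the same memory.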
To allocate a 16-byte aligned 1000x1000 array of 64-bit floating point numbers, we could use this code (it over-allocates by a full 16 bytes, so it does not depend on the 8-byte assumption above):
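import numpy

m = n = 1000
dtype = numpy.dtype(numpy.float64)
nbytes = m * n * dtype.itemsize
# Over-allocate by 16 bytes so that an aligned start always fits in the buffer.
buf = numpy.empty(nbytes + 16, dtype=numpy.uint8)
# Number of bytes to skip to reach the next 16-byte boundary (0 if already aligned).
start_index = -buf.ctypes.data % 16
# Slice, reinterpret as float64, and reshape; none of these steps copies data.
a = buf[start_index:start_index + nbytes].view(dtype).reshape(m, n)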
Now, a is an array with the desired properties, as can be verified by checking that a.ctypes.data % 16 is indeed 0.
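If this is needed in more than one place, the same steps can be wrapped in a function. The following is only a sketch: aligned_empty is a hypothetical helper name, not a NumPy function, and it assumes that alignment is a positive multiple of the dtype's own alignment (true for float64 with 16).

```python
import numpy as np

def aligned_empty(shape, dtype=np.float64, alignment=16):
    """Hypothetical helper: uninitialized array whose data pointer is a multiple of `alignment`."""
    dtype = np.dtype(dtype)
    nbytes = int(np.prod(shape)) * dtype.itemsize
    # Over-allocate by `alignment` bytes so an aligned start always fits.
    buf = np.empty(nbytes + alignment, dtype=np.uint8)
    # Bytes to skip to reach the next multiple of `alignment` (0 if already aligned).
    offset = -buf.ctypes.data % alignment
    # The returned view keeps `buf` alive through its base chain.
    return buf[offset:offset + nbytes].view(dtype).reshape(shape)

a = aligned_empty((1000, 1000))
assert a.ctypes.data % 16 == 0
```

The result is a view into a slightly larger byte buffer, so it costs at most alignment extra bytes per array.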