Memory alignment for fast FFT in Python using shared arrays

Updated: 2023-02-27 10:50:53

The simplest standard trick to get correctly aligned memory is to allocate a bit more than needed and skip the first few bytes if the alignment is wrong. If I remember correctly, NumPy arrays will always be 8-byte aligned, and FFTW requires 16-byte alignment to perform best. So you would simply allocate 8 bytes more than needed, and skip the first 8 bytes if necessary.
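The number of bytes to skip follows directly from the address of the allocation. A minimal sketch of that arithmetic, where the function name `bytes_to_skip` and the example addresses are made up for illustration:

```python
def bytes_to_skip(address: int, alignment: int = 16) -> int:
    """Return the offset that moves `address` up to the next multiple of `alignment`."""
    # Python's % always returns a non-negative result for a positive modulus,
    # so this is 0 when the address is already aligned.
    return -address % alignment


print(bytes_to_skip(0x1000))  # address that is already 16-byte aligned
print(bytes_to_skip(0x1008))  # address that is only 8-byte aligned
```

Because the result is always between 0 and `alignment - 1`, over-allocating by `alignment` bytes is enough even if the 8-byte assumption about NumPy does not hold.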

Edit: This is rather easy to implement. The pointer to the data is available as an integer in the ctypes.data attribute of a NumPy array. Using the shifted block can be achieved by slicing, viewing as a different data type and reshaping. None of these operations copies the data; they all reuse the same buffer.
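To see that slicing, viewing and reshaping really share one buffer, here is a small sketch. The buffer size and the 8-byte offset are arbitrary values chosen for the demonstration:

```python
import numpy as np

# A small raw byte buffer, zero-filled so the effect of a write is visible.
buf = np.zeros(64, dtype=np.uint8)

# Skip 8 bytes, reinterpret the next 48 bytes as six float64 values,
# then reshape them into a 2x3 array.
view = buf[8:56].view(np.float64).reshape(2, 3)

# The view starts exactly 8 bytes into the original buffer.
offset = view.ctypes.data - buf.ctypes.data

# Writing through the view changes the underlying bytes of buf.
view[0, 0] = 1.0
shared = np.shares_memory(buf, view)
```

Here `offset` should come out as 8 and `shared` as `True`, which is what makes the over-allocate-and-skip trick free of any copying cost.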

To allocate a 16-byte aligned 1000x1000 array of 64-bit floating point numbers, we could use this code:

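import numpy

m = n = 1000
dtype = numpy.dtype(numpy.float64)
nbytes = m * n * dtype.itemsize
# Over-allocate by 16 bytes so an aligned start always fits in the buffer.
buf = numpy.empty(nbytes + 16, dtype=numpy.uint8)
# Number of bytes to skip to reach the next 16-byte boundary (0 to 15).
start_index = -buf.ctypes.data % 16
# Slice, reinterpret as float64 and reshape; none of these copies the data.
a = buf[start_index:start_index + nbytes].view(dtype).reshape(m, n)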

Now, a is an array with the desired properties, as can be verified by checking that a.ctypes.data % 16 is indeed 0.
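The same steps can be wrapped in a reusable helper. This is only a sketch: the name `aligned_empty` and its parameters are hypothetical and not part of NumPy, and it assumes `alignment` is a positive integer.

```python
import numpy as np


def aligned_empty(shape, dtype=np.float64, alignment=16):
    """Return an uninitialized array whose data pointer is a multiple of `alignment`.

    Hypothetical helper (not a NumPy function) generalizing the recipe above.
    """
    dtype = np.dtype(dtype)
    nbytes = int(np.prod(shape)) * dtype.itemsize
    # Over-allocate so that an aligned start always fits inside the buffer.
    buf = np.empty(nbytes + alignment, dtype=np.uint8)
    start = -buf.ctypes.data % alignment
    # The returned view keeps a reference chain back to buf, so the memory stays alive.
    return buf[start:start + nbytes].view(dtype).reshape(shape)


a = aligned_empty((1000, 1000))
b = aligned_empty((3, 5), dtype=np.complex128, alignment=64)
```

Checking `a.ctypes.data % 16` and `b.ctypes.data % 64` should give 0 in both cases.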