使用cufft实现实数到复数FFT

更新时间：2023-02-27 09:24:01

（似乎这应该是一个重复的问题，但我无法找到重复的问题。）

使用就地转换时，需要对输入数据进行不同的组织（填充）。在2D情况下，这一点尤其明显，因为必须填充每一行数据。

在非就地R2C转换中，输入数据是实值且具有大小高度*宽度（例如，R = 4，C = 4的情况）：

  XXXX 
 XXXX 
 XXXX 
 XXXX

以上数据将恰好占据 16 * sizeof（cufftReal）（假设 float 输入数据，维度R = 4，C = 4），线性地，无间隙地存储在内存中。但是，当我们切换到就地转换时，输入缓冲区的大小会更改。大小的这种变化会对数据安排产生影响。具体来说，输入缓冲区的大小为 R *（C / 2 + 1）* sizeof（cufftComplex）。对于R = 4，C = 4的示例情况，即 12 * sizeof（cufftComplex）或 24 * sizeof（cufftReal），但它仍被组织为4行数据。因此，每行的长度为6（如果用 cufftReal 度量）或3（如果用 cufftComplex 度量）。将其视为 cufftReal ，然后在创建输入数据时，我们必须像这样进行组织：

  XXXXPP 
 XXXXPP 
 XXXXPP 
 XXXXPP

P 位置是填充数据，而不是您的输入数据。如果我们在内存中线性查看，则看起来像这样：

  XXXXPPXXXXPPXXXXPPXXX XPP

这是对CUFFT的期望/要求（我相信对于FFTW也是一样）。但是，由于您没有更改数据存储方式，因此提供的数据如下：

  XXXXXXXXXXXXXXXXPPPPP PPP

这两种模式的差异是造成结果输出差异的原因。有多种解决方法。我将选择使用 DudaMemc 在原位填充设备输入缓冲区，这将为我们提供所需的模式。这可能不是***/最快的方法，具体取决于您的应用程序需求。

您也没有将正确大小的结果数据从设备复制回主机。 / p>

以下是一个固定示例：

  $ cat t1589.cu 
 #include< cufft.h> 
 #include< iostream> 
 #include< cstdlib> 

 struct mat3d {
 int _width; 
 int _height; 
 cufftReal * _pData; 
}; 


 void fftCuda2d（mat3d * scene）
 {
 cufftResult resultStatus; 
 cudaError_t cuda_status; 

 cufftHandle plan_forward; 

 resultStatus = cufftPlan2d（& plan_forward，scene-> _height，scene-> _width，CUFFT_R2C）; 
 std :: cout<< 提前制定计划：<< （int）resultStatus<< std :: endl; 

 cufftComplex * d_fft，* d_scene，* h_fft; 

 size_t size_fft =（int（scene-> _width / 2）+1）* scene-> _height; 

 cudaMalloc（（void **）& d_scene，sizeof（cufftComplex）* size_fft）; 
 cudaMalloc（（void **）& d_fft，sizeof（cufftComplex）* size_fft）; 


 h_fft =（cufftComplex *）malloc（sizeof（cufftComplex）* size_fft）; 

 #ifdef USE_IP 
 cuda_status = cudaMemcpy2D（d_scene，（（（scene-> _width / 2）+1）* sizeof（cufftComplex），scene-> _pData，（scene-> ; _width）* sizeof（cufftReal），sizeof（cufftReal）* scene-> _width，scene-> _height，cudaMemcpyHostToDevice）; 
 resultStatus = cufftExecR2C（plan_forward，（cufftReal *）d_scene，d_scene）; 
 cuda_status = cudaMemcpy（h_fft，d_scene，sizeof（cufftComplex）* size_fft，cudaMemcpyDeviceToHost）; 
 #else 
 cuda_status = cudaMemcpy（d_scene，scene-> _pData，sizeof（cufftReal）* scene-> _height * scene-> _width，cudaMemcpyHostToDevice）; 
 resultStatus = cufftExecR2C（plan_forward，（cufftReal *）d_scene，d_fft）; 
 cuda_status = cudaMemcpy（h_fft，d_fft，sizeof（cufftComplex）* size_fft，cudaMemcpyDeviceToHost）; 
 #endif 
 std :: cout<<  exec：<< （int）resultStatus<< std :: endl; 

 for（int i = 0; i< size_fft; i ++）
 std :: cout< h_fft [i] .x<< << h_fft [i] .y<< ，; 
 std :: cout<< std :: endl; 
} 
 const int dim = 4; 
 int main（）{

 mat3d myScene; 
 myScene._pData = new cufftReal [dim * dim]; 
 myScene._width =昏暗; 
 myScene._height =昏暗; 
 for（int i = 0; i  fftCuda2d（& myScene）; 
 std :: cout<< cudaGetErrorString（cudaGetLastError（））<< std :: endl; 
} 
 $ nvcc -lineinfo -o t1589 t1589.cu -lcufft 
 t1589.cu（15）：警告：设置了变量 cuda_status，但从未使用过

 $ ./t1589 
向前创建计划：0 
 exec：0 
 9.71338 0，-0.153554 1.45243,0.171302 0,0.878097 0.533959,0.424595 -0.834714,0.858133 -0.393671，-0.205139 0， -0.131513 -0.494514，-0.165712 0,0.878097 -0.533959,0.0888268 1.49303,0.858133 0.393671，
没有错误
 $ nvcc -lineinfo -o t1589 t1589.cu -lcufft -DUSE_IP 
 t1589.cu （15）：警告：已设置变量 cuda_status，但从未使用过

 $ ./t1589 
向前创建计划：0 
 exec：0 
 9.71338 0， -0.153554 1.45243,0.171302 0,0.878097 0.533959,0.424595 -0.834714,0.858133 -0.393671，-0.205139 0，-0.131513 -0.494514，-0.165712 0,0.878097 -0.533959,0.0888268 1.49303,0.858133 0.393671，
没有错误
 $

I am trying to perform an inplace real to complex FFT with cufft. I am aware of the similar question How to perform a Real to Complex Transformation with cuFFT. However I have issues trying to reproduce the same method.

If I do an out of place transformation, there is no problem, but as soon as I do it in place, I do not have the correct values in the FFT (Checked with python, using binary files in between). I do not have errors, but just non correct values.

Here is my code:

void fftCuda2d(mat3d* scene)
{
    cufftResult resultStatus;
    cudaError_t cuda_status;

    cufftHandle plan_forward;

    resultStatus = cufftPlan2d(&plan_forward, scene->_height, scene->_width, CUFFT_R2C);
    cout << "Creating plan forward: " << _cudaGetErrorEnum(resultStatus) << endl;

    cufftComplex *d_fft, *d_scene, *h_fft;

    size_t size_fft = (int(scene->_width/2)+1)*scene->_height;

    cudaMalloc((void**)&d_scene, sizeof(cufftComplex)*size_fft);
    cudaMalloc((void**)&d_fft, sizeof(cufftComplex)*size_fft);


    h_fft = (cufftComplex*) malloc(sizeof(cufftComplex)*size_fft);

    cuda_status = cudaMemcpy(d_scene, scene->_pData, sizeof(cufftReal) * scene->_height * scene->_width, cudaMemcpyHostToDevice);

    resultStatus = cufftExecR2C(plan_forward, (cufftReal*) d_scene, d_scene);

    cuda_status = cudaMemcpy(h_fft, d_scene, sizeof(cufftReal)*scene->_height*scene->_width, cudaMemcpyDeviceToHost);

    FILE* *pFileTemp;

    pFileTemp = fopen("temp.bin", "wb");

    check = fwrite(h_fft, sizeof(cufftComplex), sizeFft, pFileTemp);

}

If I use resultStatus = cufftExecR2C(plan_forward, (cufftReal*) d_scene, d_fft); and save the output of d_fft I have the correct result. So you see any mistake of mine here?

P.S Mat3d is a struct where _width and _height contain the size of the matrix and pData is the pointer to the data but there is no issue with that.

(It seems like this should be a duplicate question but I was not able to locate the duplicate.)

Your input data needs to be organized differently (padded) when using an in-place transform. This is particularly noticeable in the 2D case, because each row of data must be padded.

In the non-inplace R2C transform, the input data is real-valued and of size height*width (for an example R=4, C=4 case):

X X X X
X X X X
X X X X
X X X X

The above data would occupy exactly 16*sizeof(cufftReal) (assuming float input data, dimension R = 4, C = 4), and it would be organized that way in memory, linearly, with no gaps. However, when we switch to an in-place transform, the size of the input buffer changes. And this change in size has ramifications for data arrangement. Specifically, the sizeof the input buffer is R*(C/2 + 1)*sizeof(cufftComplex). For the R=4, C=4 example case, that is 12*sizeof(cufftComplex) or 24*sizeof(cufftReal), but it is still organized as 4 rows of data. Each row, therefore, is of length 6 (if measured in cufftReal) or 3 (if measured in cufftComplex). Considering it as cufftReal, then when we create our input data, we must organize it like this:

X X X X P P
X X X X P P
X X X X P P
X X X X P P

where the P locations are "padding" data, not your input data. If we view this linearly in memory, it looks like:

X X X X P P X X X X P P X X X X P P X X X X P P

That is the expectation/requirement of CUFFT (and I believe it is the same for FFTW). However since you made no changes to the way you deposited your data, you provided data that looks like this:

X X X X X X X X X X X X X X X X P P P P P P P P

and the difference in those 2 patterns is what accounts for the difference in the result output. There are a variety of ways to fix this. I'll choose to demonstrate using cudaMemcpy2D to populate the device input buffer in the in-place case, which will give us the desired pattern. This may not be the best/fastest way, depending on your application needs.

You were also not copying the correct size of the result data from device back to host.

Here is a fixed example:

$ cat t1589.cu
#include <cufft.h>
#include <iostream>
#include <cstdlib>

struct mat3d{
  int _width;
  int _height;
  cufftReal *_pData;
};


void fftCuda2d(mat3d* scene)
{
    cufftResult resultStatus;
    cudaError_t cuda_status;

    cufftHandle plan_forward;

    resultStatus = cufftPlan2d(&plan_forward, scene->_height, scene->_width, CUFFT_R2C);
    std::cout << "Creating plan forward: " <<  (int)resultStatus << std::endl;

    cufftComplex *d_fft, *d_scene, *h_fft;

    size_t size_fft = (int(scene->_width/2)+1)*scene->_height;

    cudaMalloc((void**)&d_scene, sizeof(cufftComplex)*size_fft);
    cudaMalloc((void**)&d_fft, sizeof(cufftComplex)*size_fft);


    h_fft = (cufftComplex*) malloc(sizeof(cufftComplex)*size_fft);

#ifdef USE_IP
    cuda_status = cudaMemcpy2D(d_scene, ((scene->_width/2)+1)*sizeof(cufftComplex), scene->_pData, (scene->_width)*sizeof(cufftReal), sizeof(cufftReal) * scene->_width, scene->_height, cudaMemcpyHostToDevice);
    resultStatus = cufftExecR2C(plan_forward, (cufftReal*) d_scene, d_scene);
    cuda_status = cudaMemcpy(h_fft, d_scene, sizeof(cufftComplex)*size_fft, cudaMemcpyDeviceToHost);
#else
    cuda_status = cudaMemcpy(d_scene, scene->_pData, sizeof(cufftReal) * scene->_height * scene->_width, cudaMemcpyHostToDevice);
    resultStatus = cufftExecR2C(plan_forward, (cufftReal*) d_scene, d_fft);
    cuda_status = cudaMemcpy(h_fft, d_fft, sizeof(cufftComplex)*size_fft, cudaMemcpyDeviceToHost);
#endif
    std::cout << "exec: " <<  (int)resultStatus << std::endl;

    for (int i = 0; i < size_fft; i++)
      std::cout << h_fft[i].x << " " << h_fft[i].y << ",";
    std::cout << std::endl;
}
const int dim = 4;
int main(){

  mat3d myScene;
  myScene._pData  = new cufftReal[dim*dim];
  myScene._width  = dim;
  myScene._height = dim;
  for (int i = 0; i < dim*dim; i++) myScene._pData[i] = rand()/(float)RAND_MAX;
  fftCuda2d(&myScene);
  std::cout << cudaGetErrorString(cudaGetLastError()) << std::endl;
}
$ nvcc -lineinfo -o t1589 t1589.cu -lcufft
t1589.cu(15): warning: variable "cuda_status" was set but never used

$ ./t1589
Creating plan forward: 0
exec: 0
9.71338 0,-0.153554 1.45243,0.171302 0,0.878097 0.533959,0.424595 -0.834714,0.858133 -0.393671,-0.205139 0,-0.131513 -0.494514,-0.165712 0,0.878097 -0.533959,0.0888268 1.49303,0.858133 0.393671,
no error
$ nvcc -lineinfo -o t1589 t1589.cu -lcufft -DUSE_IP
t1589.cu(15): warning: variable "cuda_status" was set but never used

$ ./t1589
Creating plan forward: 0
exec: 0
9.71338 0,-0.153554 1.45243,0.171302 0,0.878097 0.533959,0.424595 -0.834714,0.858133 -0.393671,-0.205139 0,-0.131513 -0.494514,-0.165712 0,0.878097 -0.533959,0.0888268 1.49303,0.858133 0.393671,
no error
$

上一篇 : ：非"NaN"指数 pandas 的价值观下一篇 : Amazon Pinpoint和Ionic - 当应用程序处于后台时，推送通知不起作用

使用cufft实现实数到复数FFT

相关阅读

技术问答最新文章