Updated: 2021-11-10 23:31:41
The solution has already been given in another answer: https://***.com/a/19208070/678093
For your example, this means:
Allocate input as cufftComplex:
cufftComplex *deviceInputData;
gpuErrchk(cudaMalloc((void**)&deviceInputData, DATASIZE * sizeof(cufftComplex)));
gpuErrchk(cudaMemcpy(deviceInputData, hostInputData, DATASIZE * sizeof(cufftReal), cudaMemcpyHostToDevice));
Transform in place:
cufftStatus = cufftExecR2C(handle, (cufftReal *)deviceInputData, deviceInputData);
gpuErrchk(cudaMemcpy(hostOutputData, deviceInputData, (DATASIZE / 2 + 1) * sizeof(cufftComplex), cudaMemcpyDeviceToHost));
btw: MATLAB also provides a GPU-accelerated version of fft(), which may be useful to you as well: http://de.mathworks.com/help/distcomp/run-built-in-functions-on-a-gpu.html#btjw5gk