Accessing a global memory pointer from a CUDA kernel

We don't use cudaMalloc and cudaMemcpy on __device__ variables.

Read the documentation for __device__ variables, where it states the API calls to be used:

 cudaMemcpyToSymbol();
 cudaMemcpyFromSymbol();
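
For instance, here is a minimal sketch (the variable name devValue and the whole snippet are illustrative, not part of the original question) of moving a value between the host and a __device__ variable with the symbol-copy API instead of cudaMemcpy:

#include <cuda_runtime.h>
#include <cstdio>

__device__ int devValue;   // illustrative __device__ variable

int main()
{
    int h_in = 42, h_out = 0;
    // write the host value into the __device__ variable
    cudaMemcpyToSymbol(devValue, &h_in, sizeof(int));
    // read it back to the host
    cudaMemcpyFromSymbol(&h_out, devValue, sizeof(int));
    printf("%d\n", h_out);   // prints 42
    return 0;
}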

If you want to allocate a device array dynamically with cudaMalloc, but store the returned pointer in a __device__ variable, you'll have to do something like this:

__device__ int* cData;   // file-scope __device__ variable that will hold the device pointer

void Init()
{
    int* data = new int[SIZE];
    int* d_data;
    cudaError_t cudaStatus;
    cudaStatus = cudaMalloc(&d_data, SIZE * sizeof(int));
    for (int i = 0; i < SIZE; i++)
        data[i] = i;

    cudaStatus = cudaMemcpy(d_data, data, SIZE * sizeof(int), cudaMemcpyHostToDevice);
    // copy the value of the device pointer itself into the __device__ variable cData
    cudaMemcpyToSymbol(cData, &d_data, sizeof(int *));
    delete[] data;   // allocated with new[], so free with delete[]
}
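
With the pointer stored in cData, device code can use the allocation without receiving it as a kernel argument. A minimal sketch (the kernel name and scale factor are illustrative, not from the original question):

// Illustrative kernel: dereferences the device pointer held in cData.
__global__ void ScaleKernel(int factor)
{
    int idx = blockIdx.x * blockDim.x + threadIdx.x;
    if (idx < SIZE)
        cData[idx] *= factor;
}

Note that Init() as written keeps no host-side copy of d_data, so to free the allocation later you would either retain d_data or copy the pointer back with cudaMemcpyFromSymbol before calling cudaFree.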

When I compile your code as-is, I get the following compiler warning from CUDA 6 nvcc:

t411.cu(15): warning: a __device__ variable "cData" cannot be directly read in a host function

These warnings should not be ignored.

If SIZE is known at compile-time, as it is in your example, you can also do something like this:

__device__ int cData[SIZE];

void Init()
{
    int* data = new int[SIZE];
    cudaError_t cudaStatus;
    for (int i = 0; i < SIZE; i++)
        data[i] = i;
    // copy the host array straight into the statically sized __device__ array
    cudaStatus = cudaMemcpyToSymbol(cData, data, SIZE * sizeof(int));
    delete[] data;   // allocated with new[], so free with delete[]
}
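
For completeness, the companion call cudaMemcpyFromSymbol mentioned above can copy the __device__ array back to the host, for example to verify that Init() worked. A small sketch (the Check() function is illustrative, not part of the original answer):

#include <cassert>

void Check()
{
    int* host = new int[SIZE];
    // copy the contents of the __device__ array back to host memory
    cudaMemcpyFromSymbol(host, cData, SIZE * sizeof(int));
    for (int i = 0; i < SIZE; i++)
        assert(host[i] == i);
    delete[] host;
}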