且构网

CUDA constant memory value is incorrect

Updated: 2023-11-08 10:26:16

This line:

__constant__ int numElements;

has compilation-unit scope. That means that if you compile it into one module and also into another module, each module gets its own, separate instance of numElements in __constant__ memory. A value written to one module's instance (for example with cudaMemcpyToSymbol from host code in one .cu file) is therefore not visible to a kernel compiled into the other module, which reads its own copy.
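Within a single compilation unit there is only one instance, so host writes and kernel reads agree without any device linking. The single-file sketch below assumes a CUDA-capable GPU and a toolkit that supports it; the names readConstant, check and roundTrip are hypothetical helpers, not part of the original code. It writes numElements from the host, has a kernel copy the value it actually sees into global memory, and compares the two on the host. Splitting the same pattern across two .cu files built without -rdc=true is one way to observe the mismatch described above.

```cuda
#include <cstdio>
#include <cstdlib>
#include <cuda_runtime.h>

// Defined once in this translation unit, so host and device share it.
__constant__ int numElements;

// Copies the constant into global memory so the host can see what the kernel read.
__global__ void readConstant(int *out)
{
    *out = numElements;
}

// Hypothetical helper: prints the error and exits if a runtime call failed.
static void check(cudaError_t err, const char *what)
{
    if (err != cudaSuccess) {
        std::fprintf(stderr, "%s failed: %s\n", what, cudaGetErrorString(err));
        std::exit(1);
    }
}

// Hypothetical helper: writes value into numElements, returns what the kernel read back.
int roundTrip(int value)
{
    int *dOut = nullptr;
    int seen = 0;
    check(cudaMemcpyToSymbol(numElements, &value, sizeof(int)), "cudaMemcpyToSymbol");
    check(cudaMalloc(&dOut, sizeof(int)), "cudaMalloc");
    readConstant<<<1, 1>>>(dOut);
    check(cudaGetLastError(), "kernel launch");
    // cudaMemcpy on the default stream waits for the kernel to finish first.
    check(cudaMemcpy(&seen, dOut, sizeof(int), cudaMemcpyDeviceToHost), "cudaMemcpy");
    check(cudaFree(dOut), "cudaFree");
    return seen;
}

int main()
{
    const int N = 100;
    const int seen = roundTrip(N);
    std::printf("host wrote %d, kernel saw %d\n", N, seen);
    return seen == N ? 0 : 1;  // non-zero exit status signals a mismatch
}
```

A non-zero exit status, or a "kernel saw" value different from the one written, means the kernel is reading a different instance of the symbol than the one the host wrote.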

The solution is to use separate compilation and linking: compile with relocatable device code (-rdc=true) so that the two modules are device-linked together, at which point the device linker resolves the symbol to a single instance shared by both modules:

nvcc -arch=sm_20 -rdc=true -o test common.cu test.cu

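Example (-arch=sm_20 targets Fermi, which CUDA 9.0 and later no longer support; with a current toolkit, substitute an architecture your GPU and toolkit support, e.g. -arch=sm_70):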

$ cat common.cuh
#ifndef COMMON_CU
// Declaration only; the single definition is in common.cu
extern __constant__ int numElements;
#endif
__global__
void kernelFunction();
$ cat common.cu
#define COMMON_CU
#include "common.cuh"
#include <stdio.h>

// The one definition of numElements, shared by all device-linked modules
__constant__ int numElements;
__global__
void kernelFunction()
{
   printf("NumElements = %d\n", numElements);
}
$ cat test.cu
#define TEST_CU
#include "common.cuh"

int main()
{
   int N = 100;
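   // Host-side write to the device-linked numElements read by kernelFunction
   cudaMemcpyToSymbol(numElements,&N,sizeof(int));
   kernelFunction<<<1,1>>>();
   // Wait for the kernel so its printf output is flushed
   cudaDeviceSynchronize();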
   return 0;
}

$ nvcc -arch=sm_20 -rdc=true -o test common.cu test.cu
$ ./test
NumElements = 100
$