且构网 - Sharing the things programmers run into in development

State of memory data after a CUDA exception

Updated: 2023-02-27 15:12:38






The CUDA documentation is not clear on how memory data changes after a CUDA application throws an exception.

For example, when a kernel launch (dynamic) encounters an exception (e.g. Warp Out-of-range Address), the current kernel launch is stopped. After this point, will data (e.g. __device__ variables) on the device still be kept, or is it discarded along with the exception?

A concrete example would be like this:

  1. The CPU launches a kernel
  2. The kernel updates the value of __device__ variableA to 5 and then crashes
  3. The CPU memcpys the value of variableA from device to host. What value does the CPU get in this case, 5 or something else?

Can someone show the rationale behind this?

The behavior is undefined in the event of a CUDA error which corrupts the CUDA context.

This type of error is evident because it is "sticky", meaning once it occurs, every single CUDA API call will return that error, until the context is destroyed.

Non-sticky errors are cleared automatically after they are returned by a cuda API call (with the exception of cudaPeekAtLastError). Any "crashed kernel" type error (invalid access, unspecified launch failure, etc.) will be a sticky error. In your example, step 3 would (always) return an API error on the result of the cudaMemcpy call to transfer variableA from device to host, so the results of the cudaMemcpy operation are undefined and unreliable -- it is as if the cudaMemcpy operation also failed in some unspecified way.
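The sticky behavior described above can be demonstrated with a small sketch. This is an illustrative program, not code from the question: the kernel name, the deliberate out-of-bounds index, and the follow-up cudaMalloc probe are all made up for the demonstration.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

__device__ int variableA;

// Stores a value, then makes an out-of-range store, which crashes the
// kernel and corrupts the CUDA context (a "sticky" error).
__global__ void crashKernel(int *bogus) {
    variableA = 5;
    bogus[1 << 30] = 42;  // invalid address -> kernel crash
}

int main() {
    crashKernel<<<1, 1>>>(nullptr);
    cudaDeviceSynchronize();  // the asynchronous crash surfaces here

    int hostA = 0;
    // Step 3 of the question: this copy reports an error, so the value
    // landed in hostA is undefined -- it may or may not be 5.
    cudaError_t err = cudaMemcpyFromSymbol(&hostA, variableA, sizeof(int));
    printf("cudaMemcpyFromSymbol: %s\n", cudaGetErrorString(err));

    // Sticky: every subsequent API call keeps returning the error
    // until the context is destroyed.
    int *p = nullptr;
    err = cudaMalloc(&p, 1024);
    printf("cudaMalloc after crash: %s\n", cudaGetErrorString(err));
    return 0;
}
```

On a typical setup both printf lines report an error string rather than "no error", illustrating that the error persists across unrelated API calls.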

Since the behavior of a corrupted CUDA context is undefined, there is no definition for the contents of any allocations, or in general the state of the machine after such an error.

An example of a non-sticky error might be an attempt to cudaMalloc more data than is available in device memory. Such an operation will return an out-of-memory error, but that error will be cleared after being returned, and subsequent (valid) cuda API calls can complete successfully, without returning an error. A non-sticky error does not corrupt the CUDA context, and the behavior of the cuda context is exactly the same as if the invalid operation had never been requested.
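The non-sticky case can be sketched the same way, assuming a deliberately oversized allocation request (the sizes here are arbitrary):

```cuda
#include <cstdio>
#include <cuda_runtime.h>

int main() {
    int *p = nullptr;

    // Deliberately request far more memory than any device has.
    cudaError_t err = cudaMalloc(&p, (size_t)1 << 60);
    printf("huge cudaMalloc: %s\n", cudaGetErrorString(err));  // out of memory

    // Non-sticky: the error was cleared when it was returned, so a
    // reasonable allocation now succeeds and the context is unharmed.
    err = cudaMalloc(&p, 1024);
    printf("small cudaMalloc: %s\n", cudaGetErrorString(err));

    cudaFree(p);
    return 0;
}
```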

This distinction between sticky and non-sticky errors is called out in many of the documented error code descriptions, for example:

non-sticky, non-cuda-context-corrupting:

cudaErrorMemoryAllocation = 2: The API call failed because it was unable to allocate enough memory to perform the requested operation.

sticky, cuda-context-corrupting:

cudaErrorMisalignedAddress = 74: The device encountered a load or store instruction on a memory address which is not aligned. The context cannot be used, so it must be destroyed (and a new one should be created). All existing device memory allocations from this context are invalid and must be reconstructed if the program is to continue using CUDA.
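In the runtime API, "destroy the context and create a new one" usually means resetting the device's primary context. A minimal sketch of that recovery path, assuming the process wants to keep using CUDA after a sticky error (the helper name is made up):

```cuda
#include <cuda_runtime.h>

// After a sticky error, cudaDeviceReset() tears down the primary
// context. All prior device allocations and __device__ variable
// contents are gone and must be re-created afterwards.
void recoverFromStickyError() {
    cudaDeviceReset();  // destroys the corrupted primary context

    // ...re-allocate device memory and re-upload any needed data here;
    // nothing from before the reset can be reused...
}
```

Note that this only restores the API to a usable state; it cannot recover any device data, which is exactly why the error description says existing allocations "must be reconstructed".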