Updated: 2022-05-28 23:57:50
To compute a final sum from the partial results of your blocks, I would suggest the following approach:
I assume each block has a substantial amount of work to do on its own, which is what would justify using CUDA in the first place.
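A minimal sketch of the usual two-stage pattern, assuming plain float input data (your globalError and currentErrors are not shown, so they are not used here). The names blockPartialSum, sumOnDevice and kBlockSize are hypothetical. Each block reduces only its own slice in shared memory and writes exactly one partial result; the host then adds up the per-block partials.

```cuda
#include <cuda_runtime.h>
#include <vector>

// Hypothetical block size; must be a power of two for the tree reduction below.
constexpr int kBlockSize = 256;

// Stage 1: each block sums only its own share of `data` and writes ONE
// partial result to partials[blockIdx.x]. No block re-sums the whole array.
__global__ void blockPartialSum(const float* data, int n, float* partials) {
    __shared__ float cache[kBlockSize];
    int tid = threadIdx.x;
    float acc = 0.0f;
    // Grid-stride loop: each thread accumulates a strided subset of the input.
    for (int i = blockIdx.x * blockDim.x + tid; i < n; i += blockDim.x * gridDim.x)
        acc += data[i];
    cache[tid] = acc;
    __syncthreads();
    // Tree reduction in shared memory: halve the number of active threads each step.
    for (int s = blockDim.x / 2; s > 0; s >>= 1) {
        if (tid < s) cache[tid] += cache[tid + s];
        __syncthreads();
    }
    if (tid == 0) partials[blockIdx.x] = cache[0];
}

// Stage 2 (host side): copy the per-block partials back and add them up.
float sumOnDevice(const std::vector<float>& host, int numBlocks) {
    int n = static_cast<int>(host.size());
    float *dData = nullptr, *dPartials = nullptr;
    cudaMalloc(&dData, n * sizeof(float));
    cudaMalloc(&dPartials, numBlocks * sizeof(float));
    cudaMemcpy(dData, host.data(), n * sizeof(float), cudaMemcpyHostToDevice);
    blockPartialSum<<<numBlocks, kBlockSize>>>(dData, n, dPartials);
    std::vector<float> partials(numBlocks);
    // A blocking cudaMemcpy on the default stream waits for the kernel to finish.
    cudaMemcpy(partials.data(), dPartials, numBlocks * sizeof(float),
               cudaMemcpyDeviceToHost);
    cudaFree(dData);
    cudaFree(dPartials);
    float total = 0.0f;
    for (float p : partials) total += p;
    return total;
}
```

The final stage could just as well be a second kernel launched with a single block; summing on the host is simply the easiest option when the number of blocks is small.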
As for your current code, I think there may be something wrong with your kernel. It seems to me that every block sums all of the data and then returns that final result as if it were a partial result.
The loop you presented does not really make sense. For each block, only one value of i actually does anything, so the code is equivalent to simply writing:
currentErrors[threadIdx.x]=0;
currentErrors[threadIdx.x]+=globalError(mynet,myoutput);
save for some unpredictable scheduling differences.
Remember that blocks are not executed in sync. Each block can run before, during or after any other block.
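Because of that, no block should wait for or read another block's partial result inside the same kernel launch. One common alternative to a second pass, sketched here under the assumption of float data and a GPU with float atomicAdd support, is to let each block add its own partial sum into a single global accumulator with atomicAdd. The names sumWithAtomic, sumAtomic and kThreads are hypothetical.

```cuda
#include <cuda_runtime.h>
#include <vector>

constexpr int kThreads = 256;  // assumed block size (power of two)

// Each block reduces its own slice, then thread 0 adds that block's partial
// sum to *total. Blocks may run in any order or concurrently; atomicAdd makes
// the accumulation safe without any block depending on another.
__global__ void sumWithAtomic(const float* data, int n, float* total) {
    __shared__ float cache[kThreads];
    int tid = threadIdx.x;
    int i = blockIdx.x * blockDim.x + tid;
    cache[tid] = (i < n) ? data[i] : 0.0f;  // pad the last block with zeros
    __syncthreads();
    for (int s = blockDim.x / 2; s > 0; s >>= 1) {
        if (tid < s) cache[tid] += cache[tid + s];
        __syncthreads();
    }
    if (tid == 0) atomicAdd(total, cache[0]);
}

float sumAtomic(const std::vector<float>& host) {
    int n = static_cast<int>(host.size());
    int blocks = (n + kThreads - 1) / kThreads;
    float *dData = nullptr, *dTotal = nullptr;
    cudaMalloc(&dData, n * sizeof(float));
    cudaMalloc(&dTotal, sizeof(float));
    cudaMemcpy(dData, host.data(), n * sizeof(float), cudaMemcpyHostToDevice);
    cudaMemset(dTotal, 0, sizeof(float));  // accumulator must start at zero
    if (blocks > 0) sumWithAtomic<<<blocks, kThreads>>>(dData, n, dTotal);
    float result = 0.0f;
    cudaMemcpy(&result, dTotal, sizeof(float), cudaMemcpyDeviceToHost);
    cudaFree(dData);
    cudaFree(dTotal);
    return result;
}
```

Note that the order in which blocks perform their atomicAdd is not fixed, so for general floating-point data the last bits of the result can differ between runs.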
Also: