Non-square matrix multiplication in CUDA

Updated: 2021-10-13 21:33:47

I think the easiest thing to do would be to just pad the blocks on the end with zeros:

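// Round the tile count up so the last, partially filled tile is visited too.
for(int m=0; m < (uWM+blocksize-1)/blocksize; ++m){
    colM = m*blocksize+tx;
    rowN = m*blocksize+ty;
    // Indices are zero-based, so an index equal to the dimension is already out of range.
    // Each tile is padded on its own: zeroing MS because colN is out of range would
    // corrupt values that other threads in the same row still need.
    // rowM is checked against uWN, which assumes M has uWN rows.
    MS[ty][tx] = (rowM < uWN && colM < uWM) ? M[rowM*uWM+colM] : 0.;
    NS[ty][tx] = (rowN < uWM && colN < uWN) ? N[colN + uWN*rowN] : 0.;
    // ... __syncthreads() and the tile product continue as in the original kernel
}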

Give or take a few details. (That NS line should reference N, not M, right?)
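For reference, here is a minimal sketch of what a complete zero-padded tiled kernel could look like. It is not the asker's kernel: the names `matmulPadded`, `multiplyOnDevice`, `TILE` and the explicit row count `hM` are assumptions made for illustration, and CUDA error checking is left out to keep it short.

```cuda
#include <cuda_runtime.h>
#include <cassert>
#include <vector>

#define TILE 16  // hypothetical tile width, playing the role of blocksize

// P (hM x wN) = M (hM x wM) * N (wM x wN), all stored row-major.
__global__ void matmulPadded(const float* M, const float* N, float* P,
                             int hM, int wM, int wN) {
    __shared__ float MS[TILE][TILE];
    __shared__ float NS[TILE][TILE];

    int tx = threadIdx.x, ty = threadIdx.y;
    int rowM = blockIdx.y * TILE + ty;  // row of M and of P
    int colN = blockIdx.x * TILE + tx;  // column of N and of P
    float acc = 0.0f;

    // Round up so the last, partially filled tile is processed as well.
    int numTiles = (wM + TILE - 1) / TILE;
    for (int m = 0; m < numTiles; ++m) {
        int colM = m * TILE + tx;
        int rowN = m * TILE + ty;
        // Pad each tile independently: out-of-range elements become zero
        // and therefore contribute nothing to the dot product.
        MS[ty][tx] = (rowM < hM && colM < wM) ? M[rowM * wM + colM] : 0.0f;
        NS[ty][tx] = (rowN < wM && colN < wN) ? N[rowN * wN + colN] : 0.0f;
        __syncthreads();  // both tiles fully loaded before anyone reads them

        for (int k = 0; k < TILE; ++k)
            acc += MS[ty][k] * NS[k][tx];
        __syncthreads();  // everyone done reading before the next load
    }

    // Threads that fall outside P only helped with loading; they write nothing.
    if (rowM < hM && colN < wN)
        P[rowM * wN + colN] = acc;
}

// Hypothetical host wrapper: copies inputs over, launches, copies P back.
std::vector<float> multiplyOnDevice(const std::vector<float>& M,
                                    const std::vector<float>& N,
                                    int hM, int wM, int wN) {
    assert(M.size() == static_cast<size_t>(hM) * wM);
    assert(N.size() == static_cast<size_t>(wM) * wN);
    std::vector<float> P(static_cast<size_t>(hM) * wN);

    float *dM, *dN, *dP;
    cudaMalloc(&dM, M.size() * sizeof(float));
    cudaMalloc(&dN, N.size() * sizeof(float));
    cudaMalloc(&dP, P.size() * sizeof(float));
    cudaMemcpy(dM, M.data(), M.size() * sizeof(float), cudaMemcpyHostToDevice);
    cudaMemcpy(dN, N.data(), N.size() * sizeof(float), cudaMemcpyHostToDevice);

    dim3 block(TILE, TILE);
    dim3 grid((wN + TILE - 1) / TILE, (hM + TILE - 1) / TILE);  // round up
    matmulPadded<<<grid, block>>>(dM, dN, dP, hM, wM, wN);

    // This blocking copy also waits for the kernel to finish.
    cudaMemcpy(P.data(), dP, P.size() * sizeof(float), cudaMemcpyDeviceToHost);
    cudaFree(dM); cudaFree(dN); cudaFree(dP);
    return P;
}
```

Calling `multiplyOnDevice(M, N, 2, 3, 2)` with a 2x3 and a 3x2 matrix exercises the padding path, since neither dimension is a multiple of `TILE`. Note that the final store is guarded as well as the loads; without that guard, the edge threads would write past the end of P.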

But, since I seem to be the only one here advocating using existing tuned libraries when possible -- why not use CUBLAS or MAGMA instead of rolling your own? They're fast, and tested by hundreds of users.
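To give an idea of the library route, here is a rough sketch using cuBLAS's `cublasSgemm`. The wrapper name `multiplyWithCublas` and the row-major layout are assumptions for illustration. cuBLAS works in column-major order, so the sketch computes the transposed product by swapping the operands, which yields the row-major result without any explicit transpose. It needs to be linked with `-lcublas`.

```cuda
#include <cublas_v2.h>
#include <cuda_runtime.h>
#include <cassert>
#include <vector>

// Hypothetical wrapper: P (hM x wN) = M (hM x wM) * N (wM x wN), row-major.
std::vector<float> multiplyWithCublas(const std::vector<float>& M,
                                      const std::vector<float>& N,
                                      int hM, int wM, int wN) {
    assert(M.size() == static_cast<size_t>(hM) * wM);
    assert(N.size() == static_cast<size_t>(wM) * wN);
    std::vector<float> P(static_cast<size_t>(hM) * wN);

    float *dM, *dN, *dP;
    cudaMalloc(&dM, M.size() * sizeof(float));
    cudaMalloc(&dN, N.size() * sizeof(float));
    cudaMalloc(&dP, P.size() * sizeof(float));
    cudaMemcpy(dM, M.data(), M.size() * sizeof(float), cudaMemcpyHostToDevice);
    cudaMemcpy(dN, N.data(), N.size() * sizeof(float), cudaMemcpyHostToDevice);

    cublasHandle_t handle;
    cublasCreate(&handle);

    // A row-major buffer read as column-major is the transpose of the matrix.
    // So we ask cuBLAS for P^T = N^T * M^T: operands swapped, no transpose
    // flags, and the leading dimensions are the row-major row lengths.
    const float alpha = 1.0f, beta = 0.0f;
    cublasStatus_t status = cublasSgemm(handle, CUBLAS_OP_N, CUBLAS_OP_N,
                                        wN, hM, wM,   // m, n, k of P^T
                                        &alpha,
                                        dN, wN,       // N^T is wN x wM
                                        dM, wM,       // M^T is wM x hM
                                        &beta,
                                        dP, wN);      // P^T is wN x hM
    assert(status == CUBLAS_STATUS_SUCCESS);

    // This blocking copy also waits for the GEMM to finish.
    cudaMemcpy(P.data(), dP, P.size() * sizeof(float), cudaMemcpyDeviceToHost);

    cublasDestroy(handle);
    cudaFree(dM); cudaFree(dN); cudaFree(dP);
    return P;
}
```

No padding or bounds logic is needed here, because the library handles arbitrary dimensions itself. In real code the handle would normally be created once and reused rather than created per call.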
