且构网

分享程序员开发的那些事...
且构网 - 分享程序员编程开发的那些事

缓存友好的方法来将两个矩阵相乘

更新时间:2022-10-31 20:50:06

您要查找的单词是 th撞.在Google 产生更多结果.

The word you are looking for is thrashing. Searching for thrashing matrix multiplication in Google yields more results.

用于c = a * b的标准乘法算法看起来像

A standard multiplication algorithm for c = a*b would look like

void multiply(double[,] a, double[,] b, double[,] c)
{
    for (int i = 0; i < n; i++)
        for (int j = 0; j < n; j++)
            for (int k = 0; k < n; k++)
                C[i, j] += a[i, k] * b[k, j]; 
}

基本上,快速大步浏览内存会对性能产生不利影响. B [ k ,j]中 k 的访问模式正是如此.因此,我们可以重新安排这些操作,以使最内部的循环仅对矩阵的第二个访问索引进行操作,而不是在内存中跳来跳去:

Basically, navigating the memory fastly in large steps is detrimental to performance. The access pattern for k in B[k, j] is doing exactly that. So instead of jumping around in the memory, we may rearrange the operations such that the most inner loops operate only on the second access index of the matrices:

void multiply(double[,] a, double[,] B, double[,] c)
{  
   for (i = 0; i < n; i++)
   {  
      double t = a[i, 0];
      for (int j = 0; j < n; j++)
         c[i, j] = t * b[0, j];

      for (int k = 1; k < n; k++)
      {
         double s = 0;
         for (int j = 0; j < n; j++ )
            s += a[i, k] * b[k, j];
         c[i, j] = s;
      }
   }
}

这是该页面上给出的示例.但是,另一种选择是将B [k,*]的内容事先复制到一个数组中,并在内部循环计算中使用该数组.即使涉及到复制数据,这种方法通常比其他方法快得多.即使这似乎违反直觉,也请随时尝试一下.

This was the example given on that page. However, another option is to copy the contents of B[k, *] into an array beforehand and use this array in the inner loop calculations. This approach is usually much faster than the alternatives, even if it involves copying data around. Even if this might seem counter-intuitive, please feel free to try for yourself.

void multiply(double[,] a, double[,] b, double[,] c)
{
    double[] Bcolj = new double[n];
    for (int j = 0; j < n; j++)
    {
        for (int k = 0; k < n; k++)
            Bcolj[k] = b[k, j];

        for (int i = 0; i < n; i++)
        {
            double s = 0;
            for (int k = 0; k < n; k++)
                s += a[i,k] * Bcolj[k];
            c[j, i] = s;
        }
   }
}