且构网

OpenMP slower with multiple threads, can't figure out why

Updated: 2023-02-12 14:37:46

First, your parallel region is restarted on each iteration of the outer loop, which adds considerable overhead to every time step.

Second, half of the threads would just sit there doing nothing, since your chunk size is twice as big as it should be: it is nx/nthreads, while the parallel loop has only nx/2 iterations, hence there are (nx/2)/(nx/nthreads) = nthreads/2 chunks in total. Besides, what you have tried to achieve simply replicates the behaviour of schedule(static).
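To make the chunk arithmetic concrete, here is a minimal sketch in C. The helper names chunk_count and busy_threads are made up for illustration, and the sketch assumes the original loop used a chunked static schedule, where chunks are handed out to threads round-robin.

```c
#include <assert.h>

/* Hypothetical helper: number of chunks a loop of `iters` iterations is
   split into when each chunk holds `chunk` iterations (the last chunk
   may be shorter). */
int chunk_count(int iters, int chunk)
{
    assert(iters >= 0 && chunk > 0);
    return (iters + chunk - 1) / chunk;   /* ceiling division */
}

/* Hypothetical helper: threads that receive at least one chunk, assuming
   chunks are dealt out round-robin as with schedule(static, chunk). */
int busy_threads(int iters, int chunk, int nthreads)
{
    int chunks = chunk_count(iters, chunk);
    return chunks < nthreads ? chunks : nthreads;
}
```

For example, with nx = 1024 and 8 threads, the loop has nx/2 = 512 iterations and the chunk size is nx/8 = 128, so there are only 4 chunks and 4 of the 8 threads get no work. Halving the chunk size to 64 yields 8 chunks, one per thread.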

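#pragma omp parallel
for (int t = 0; t < n; t++) {
   // Worksharing loop; the implicit barrier at its end ensures T_c is complete
   #pragma omp for schedule(static)
   for (int i = 1; i < nx/2+1; i++) {
      for (int j = 1; j < nx-1; j++) {
         T_c[i][j] = 0.25*(T_p[i-1][j]+T_p[i+1][j]+T_p[i][j-1]+T_p[i][j+1]);
         T_c[nx-i-1][j] = T_c[i][j];   // mirror into the lower half
      }
   }
   // One thread does the copy; the others wait at the barrier that ends single
   #pragma omp single
   copyT(T_p, T_c, nx);
}
print2file(T_c, nx, file);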

If you modify copyT to also share its loop among the threads (with an orphaned omp for rather than a nested parallel for, since it is called from inside the parallel region), then the single construct should be removed. You do not need default(shared), as this is the default. You do not need to declare the loop variable of a parallel loop private: even if this variable comes from an outer scope (and hence is implicitly shared in the region), OpenMP automatically makes it private. Simply declare all loop variables in the loop controls and it works automagically with the default sharing rules applied.
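As a rough sketch of what a worksharing version of copyT could look like: the signature below (destination first, arrays as arrays of row pointers to double) is an assumption, since the original copyT is not shown. Both loop variables are declared in the loop controls, so no private clause is needed.

```c
/* Assumed signature: copies the nx-by-nx array src into dst.
   The real copyT from the question may differ. */
void copyT(double **dst, double **src, int nx)
{
    /* Orphaned worksharing construct: when called from inside a parallel
       region, the rows are divided among that region's threads; when
       called outside one, the loop simply runs on the calling thread. */
    #pragma omp for schedule(static)
    for (int i = 0; i < nx; i++) {
        for (int j = 0; j < nx; j++) {   /* j is declared here, hence private */
            dst[i][j] = src[i][j];
        }
    }
}
```

With this version every thread of the team has to call copyT, which is why the single construct in front of the call must go. The implicit barrier at the end of the omp for keeps the threads from starting the next time step before the copy is finished.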

Second and a half, there is (probably) an error in your inner loop. The second assignment statement should read:

T_c[nx-i-1][j] = T_c[i][j];

(or T_c[nx-i][j] if you do not keep a halo on the lower side). Otherwise, when i equals 1, you would be accessing T_c[nx][...], which is outside the bounds of T_c.
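The index bound can be checked mechanically. The following is a small sketch, assuming rows 0 and nx-1 are halo rows so that valid interior rows are 1 through nx-2; the function names are invented for this illustration.

```c
#include <stdbool.h>

/* Row that mirrors row i when rows 0 and nx-1 are halo rows (assumption). */
int mirror_row(int nx, int i)
{
    return nx - i - 1;
}

/* True if every mirrored row touched by the loop i = 1 .. nx/2 is a
   valid interior row, i.e. lies in 1 .. nx-2. */
bool mirror_in_bounds(int nx)
{
    for (int i = 1; i < nx / 2 + 1; i++) {
        int m = mirror_row(nx, i);
        if (m < 1 || m > nx - 2) {
            return false;
        }
    }
    return true;
}
```

For nx = 8, row 1 mirrors to row 6, the last interior row, and every mirrored index stays inside the array.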

Third, a general hint: instead of copying one array into the other, use pointers to those arrays and just swap the two pointers at the end of each iteration.
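A minimal sketch of the pointer-swap idea follows. It makes several assumptions that are not in the original code: the grid is stored row-major in flat arrays, both arrays start with identical boundary values, the function name jacobi_swap is made up, and the symmetry trick from the loop above is left out to keep the example short.

```c
/* Hypothetical helper: runs n Jacobi sweeps on an nx-by-nx grid stored
   row-major in the flat arrays a (initial values) and b (scratch).
   Both arrays must hold the same boundary values on entry.
   Returns whichever of a or b holds the final result. */
double *jacobi_swap(double *a, double *b, int nx, int n)
{
    double *prev = a;   /* values read in the current sweep */
    double *cur  = b;   /* values written in the current sweep */

    /* Each thread gets its own copies of the two pointers. */
    #pragma omp parallel firstprivate(prev, cur)
    for (int t = 0; t < n; t++) {
        #pragma omp for schedule(static)
        for (int i = 1; i < nx - 1; i++) {
            for (int j = 1; j < nx - 1; j++) {
                cur[i*nx + j] = 0.25 * (prev[(i-1)*nx + j] + prev[(i+1)*nx + j]
                                      + prev[i*nx + j-1] + prev[i*nx + j+1]);
            }
        }
        /* The implicit barrier above guarantees the sweep is complete, so
           every thread can swap its private pointers without further sync. */
        double *tmp = prev;
        prev = cur;
        cur = tmp;
    }

    /* The private swaps are not visible here: odd n ends in b, even n in a. */
    return (n % 2 == 0) ? a : b;
}
```

Because the pointers are private to each thread, no single construct and no extra barrier are needed for the swap, and the cost of copying the whole array on every time step disappears.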