且构网


Variable in OpenCL kernel "for-loop" reduces performance

Updated: 2022-05-25 22:04:42


Yes, the most likely cause of the performance degradation is that the compiler can't unroll the loop when its trip count is not known at compile time. There are a few things you could try to improve the situation.


You could define the parameter as a preprocessor macro passed via your program build options. This is a common trick used to build values that are only known at runtime into kernels as compile-time constants. For example:

clBuildProgram(program, 1, &device, "-Dnum_loops=50000", NULL, NULL);


You could construct the build options dynamically using sprintf to make this more flexible. Clearly this will only be worth it if you don't need to change the parameter often, so that the overhead of recompilation doesn't become a problem.
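As a rough sketch of the dynamic approach, the helper below formats the option string at runtime. It uses snprintf rather than plain sprintf so that an undersized buffer is reported instead of overrun. The helper name make_build_options and the buffer size in the usage note are illustrative assumptions, not part of the OpenCL API.

```c
#include <stdio.h>
#include <stddef.h>

/* Hypothetical helper (not an OpenCL function): writes the option
 * "-Dnum_loops=<value>" into buf.
 * Returns 0 on success, -1 if formatting failed or buf is too small. */
static int make_build_options(char *buf, size_t buf_size, int num_loops)
{
    int written = snprintf(buf, buf_size, "-Dnum_loops=%d", num_loops);

    /* snprintf returns the length it wanted to write; a value that does
     * not fit in buf (including the terminator) means truncation. */
    if (written < 0 || (size_t)written >= buf_size)
        return -1;
    return 0;
}
```

With a buffer such as char options[64], a successful call would be followed by clBuildProgram(program, 1, &device, options, NULL, NULL). Note that if the kernel currently takes num_loops as an argument, that argument has to be removed from the kernel signature, since the macro would otherwise replace the parameter name with a number.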


You could investigate whether your OpenCL platform uses any pragmas that can give the compiler hints about loop-unrolling. For example, some OpenCL compilers recognise #pragma unroll (or similar). OpenCL 2.0 has an attribute for this: __attribute__((opencl_unroll_hint)).
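To show where such a hint would sit, here is a small sketch of kernel source held as a host-side string, the form in which it would be passed to clCreateProgramWithSource. The kernel name accumulate, its arguments, and the unroll factor 4 are made up for illustration. The attribute needs an OpenCL C 2.0 compiler (for example, building with -cl-std=CL2.0), and the compiler remains free to ignore the hint.

```c
/* OpenCL C kernel source embedded in host code. The kernel itself is
 * illustrative only; the point is the placement of the attribute,
 * which must come immediately before the loop it applies to. */
static const char *kernel_source =
    "__kernel void accumulate(__global const float *in,\n"
    "                         __global float *out,\n"
    "                         int num_loops)\n"
    "{\n"
    "    float acc = 0.0f;\n"
    "    /* Ask the compiler to unroll by a factor of 4. */\n"
    "    __attribute__((opencl_unroll_hint(4)))\n"
    "    for (int kk = 0; kk < num_loops; kk++)\n"
    "        acc += in[kk];\n"
    "    out[get_global_id(0)] = acc;\n"
    "}\n";
```

On platforms that only support the pragma form, the attribute line would be replaced by whatever spelling the vendor documents, such as #pragma unroll.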


You could manually unroll the loop. How this would look depends on what assumptions you can make about the num_loops parameter. For example, if you know (or can ensure) that it will always be a multiple of 4, you could do something like this:

for (int kk = 0; kk < num_loops;)
{
  <... more code here ...>
  kk++;
  <... more code here ...>
  kk++;
  <... more code here ...>
  kk++;
  <... more code here ...>
  kk++;
}


Even if you can't make such assumptions, you should still be able to perform manual unrolling, but it may require some extra work (for example, to finish any remaining iterations).
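One way the remainder handling might look is sketched below in plain C, so it can be tried outside a kernel. The function sum_unrolled and its accumulate-an-array body are stand-ins for the real loop body, which the question does not show. The same two-loop structure should carry over to OpenCL C unchanged.

```c
/* Manual 4x unrolling for an arbitrary num_loops.
 * The body (acc += data[kk]) is a placeholder for the real work. */
static float sum_unrolled(const float *data, int num_loops)
{
    float acc = 0.0f;
    int kk = 0;

    /* Main loop: four iterations per pass, while a full group of
     * four still fits. */
    for (; kk + 4 <= num_loops; kk += 4) {
        acc += data[kk];
        acc += data[kk + 1];
        acc += data[kk + 2];
        acc += data[kk + 3];
    }

    /* Remainder loop: the 0 to 3 iterations left over. */
    for (; kk < num_loops; kk++)
        acc += data[kk];

    return acc;
}
```

When num_loops is a multiple of 4 the remainder loop never runs, so this reduces to the simpler version above.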