Is branch divergence really so bad?

Updated: 2023-10-06 13:25:10

You're assuming (at least it's the example you give and the only reference you make) that the only way to avoid branch divergence is to allow all threads to execute all the code.

In that case I agree there's not much difference.

But avoiding branch divergence probably has more to do with algorithm re-structuring at a higher level than just the addition or removal of some if statements and making code "safe" to execute in all threads.

I'll offer up one example. Suppose I know that odd threads will need to handle the blue component of a pixel and even threads will need to handle the green component:

#define N 2 // number of pixel components
#define BLUE 0
#define GREEN 1
// pixel order: px0BL px0GR px1BL px1GR ...


if (threadIdx.x & 1)  foo(pixel(N*threadIdx.x+BLUE));
else                  bar(pixel(N*threadIdx.x+GREEN));

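This means that alternate threads within the same warp take different paths, foo or bar. A warp executes one instruction stream at a time, so it runs both paths one after the other with the inactive threads masked off each time. So now my warp takes roughly twice as long to execute.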
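To make that cost concrete, here is a minimal host-side sketch in plain C++ (not device code) that counts how many warps see both outcomes of a branch condition. The names `count_divergent_warps` and `takes_foo_interleaved` are hypothetical helpers invented for this illustration, and the warp size of 32 is an assumption that matches current NVIDIA hardware.

```cpp
#include <functional>

constexpr int kWarpSize = 32;  // assumption: 32 threads per warp

// Branch condition from the example above: true for odd-numbered threads.
bool takes_foo_interleaved(int tid) { return (tid & 1) != 0; }

// Hypothetical helper: counts warps in which some threads take the branch
// and others do not, i.e. warps that would have to execute both paths.
int count_divergent_warps(int num_threads,
                          const std::function<bool(int)>& takes_branch) {
    int divergent = 0;
    for (int base = 0; base < num_threads; base += kWarpSize) {
        bool any_taken = false;
        bool any_not_taken = false;
        for (int tid = base; tid < base + kWarpSize && tid < num_threads; ++tid) {
            if (takes_branch(tid)) any_taken = true;
            else                   any_not_taken = true;
        }
        if (any_taken && any_not_taken) ++divergent;
    }
    return divergent;
}
```

Calling `count_divergent_warps(64, takes_foo_interleaved)` reports that both warps of a 64-thread block diverge, which is the situation described above.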

However, if I rearrange my pixel data so that the color components are contiguous perhaps in chunks of 32 pixels: BL0 BL1 BL2 ... GR0 GR1 GR2 ...

I can write similar code:

if (threadIdx.x & 32)  foo(pixel(threadIdx.x));
else                   bar(pixel(threadIdx.x));

It still looks like I have the possibility for divergence. But since the branch condition changes only on warp boundaries (assuming a warp size of 32), a given warp executes either the if path or the else path, so no actual divergence occurs.
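As a quick sanity check of that claim, the following host-side C++ sketch (again a model, not device code) evaluates the `threadIdx.x & 32` condition for every thread and records which path each warp takes. The helper `warp_paths` is hypothetical, and the warp size of 32 is an assumption.

```cpp
#include <string>

constexpr int kWarpSize = 32;  // assumption: 32 threads per warp

// Hypothetical helper: returns one character per warp:
//   'f' if every thread takes the if path (foo),
//   'b' if every thread takes the else path (bar),
//   'D' if the warp is split between the two paths.
std::string warp_paths(int num_threads) {
    std::string paths;
    for (int base = 0; base < num_threads; base += kWarpSize) {
        bool any_foo = false;
        bool any_bar = false;
        for (int tid = base; tid < base + kWarpSize && tid < num_threads; ++tid) {
            if (tid & 32) any_foo = true;  // same test as the device code
            else          any_bar = true;
        }
        paths += (any_foo && any_bar) ? 'D' : (any_foo ? 'f' : 'b');
    }
    return paths;
}
```

For a 128-thread block, `warp_paths(128)` returns `"bfbf"`: the warps alternate between bar and foo, and no warp is marked `'D'`.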

This is a trivial example, and probably stupid, but it illustrates that there may be ways to work around warp divergence that don't involve running all the code of all the divergent paths.
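The data rearrangement this example relies on could be done on the host before launching the kernel. The sketch below is one possible way to do it, assuming two components per pixel, `int` component values, and a pixel count that is a whole multiple of 32; `to_chunked` and `kChunk` are hypothetical names, not part of any library.

```cpp
#include <cstddef>
#include <stdexcept>
#include <vector>

constexpr std::size_t kChunk = 32;  // pixels per chunk; assumed equal to the warp size

// Converts px0BL px0GR px1BL px1GR ... into chunks of
// BL0 ... BL31 GR0 ... GR31, BL32 ... BL63 GR32 ... GR63, and so on.
std::vector<int> to_chunked(const std::vector<int>& interleaved) {
    const std::size_t num_pixels = interleaved.size() / 2;
    if (interleaved.size() % 2 != 0 || num_pixels % kChunk != 0) {
        throw std::invalid_argument("expected a whole number of 32-pixel chunks");
    }
    std::vector<int> chunked(interleaved.size());
    for (std::size_t p = 0; p < num_pixels; ++p) {
        const std::size_t chunk = p / kChunk;          // which 32-pixel chunk
        const std::size_t lane  = p % kChunk;          // position inside the chunk
        const std::size_t base  = chunk * 2 * kChunk;  // start of this chunk's data
        chunked[base + lane]          = interleaved[2 * p];      // blue
        chunked[base + kChunk + lane] = interleaved[2 * p + 1];  // green
    }
    return chunked;
}
```

With this layout, each run of 32 consecutive elements holds a single color component, so a warp reading consecutive elements only ever needs one of the two code paths.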