且构网

Optimizing SIMD histogram calculation

Updated: 2021-07-23 21:33:49

Like Jester, I'm surprised that your SIMD code showed any significant improvement. Did you compile the C code with optimization turned on?

The one additional suggestion I can make is to unroll your Packetloop loop. This is a fairly simple optimization, and it reduces the number of instructions per "iteration" to just two:

pextrb  ebx, xmm0, 0            ; ebx = byte 0 of xmm0, zero-extended
inc     dword [ebx * 4 + Hist]  ; bump the 32-bit bin for that value
pextrb  ebx, xmm0, 1
inc     dword [ebx * 4 + Hist]
pextrb  ebx, xmm0, 2
inc     dword [ebx * 4 + Hist]
...
pextrb  ebx, xmm0, 15           ; last of the 16 bytes
inc     dword [ebx * 4 + Hist]

If you're using NASM, you can use the %rep directive to save some typing:

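; Expands to the same 16 pextrb/inc pairs as the hand-unrolled version.
; For 64-bit code, use rbx in place of ebx.
%assign pixel 0
%rep 16
    pextrb  ebx, xmm0, pixel
    inc     dword [ebx * 4 + Hist]
%assign pixel pixel + 1
%endrep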
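
To check that an unrolled version still produces the right counts, it can help to have a plain C model of the same logic to compare against. The following is only a rough sketch, not the original poster's code: the names hist_simple, hist_unrolled16 and HIST_STEP are made up for illustration, and it assumes 8-bit pixels, 256 bins of 32 bits each, and an input length that is a multiple of 16 (one XMM register per block).

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Straightforward reference: one bin update per pixel. */
void hist_simple(const uint8_t *pixels, size_t n, uint32_t hist[256])
{
    for (size_t i = 0; i < n; i++)
        hist[pixels[i]]++;
}

/* One "extract byte, increment bin" step, modelling a pextrb/inc pair.
 * Hypothetical helper macro; relies on local names block and hist. */
#define HIST_STEP(k) hist[block[(k)]]++

/* Unrolled variant: handles one 16-byte block per outer iteration,
 * with no inner loop counter. Assumes n is a multiple of 16. */
void hist_unrolled16(const uint8_t *pixels, size_t n, uint32_t hist[256])
{
    assert(n % 16 == 0);
    for (size_t i = 0; i < n; i += 16) {
        const uint8_t *block = pixels + i;
        HIST_STEP(0);  HIST_STEP(1);  HIST_STEP(2);  HIST_STEP(3);
        HIST_STEP(4);  HIST_STEP(5);  HIST_STEP(6);  HIST_STEP(7);
        HIST_STEP(8);  HIST_STEP(9);  HIST_STEP(10); HIST_STEP(11);
        HIST_STEP(12); HIST_STEP(13); HIST_STEP(14); HIST_STEP(15);
    }
}
```

Running both functions over the same buffer and comparing the two histograms bin by bin gives a quick sanity check. An optimizing C compiler may well unroll hist_simple on its own, which is one more reason to compare timings only against an optimized build.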