std::min vs. ternary: gcc auto-vectorization with #pragma GCC optimize ("O3")

Updated: 2021-09-26 03:23:38

Summary: don't use #pragma GCC optimize. Use -O3 on the command line instead, and you'll get the behavior you expect.

GCC's documentation on #pragma GCC optimize says:

Each function that is defined after this point is treated as if it had been declared with one optimize(string) attribute for each string argument.

And the optimize attribute is documented as:

The optimize attribute is used to specify that a function is to be compiled with different optimization options than specified on the command line. [...] The optimize attribute should be used for debugging purposes only. It is not suitable in production code. [Emphasis added, thanks Peter Cordes for spotting the last part.]

So, don't use it.

In particular, it looks like specifying #pragma GCC optimize ("O3") at the top of your file is not actually equivalent to using -O3 on the command line. It turns out that the former doesn't result in std::min being inlined, and so the compiler actually does assume that it might modify global memory, such as your a,b arrays. This naturally inhibits vectorization.

A careful reading of the documentation for __attribute__((optimize)) makes it look like each of the functions main() and std::min() will be compiled as if with -O3. But that's not the same as compiling the two of them together with -O3, as only in the latter case would interprocedural optimizations like inlining be available.
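To make that distinction concrete, here is a rough sketch of what the pragma amounts to when written out as per-function attributes. The names helper() and caller() are hypothetical and only for illustration; the point is that each function carries its own optimize("O3") setting, which is not the same thing as optimizing the whole translation unit with -O3.

```cpp
// Sketch only: helper() and caller() are made-up names, not from the question.
// This is the attribute form that the GCC manual says the pragma is treated as.

__attribute__((optimize("O3")))
int helper(int x) {
    return x + 1;
}

__attribute__((optimize("O3")))
int caller(int x) {
    // Each function is optimized on its own terms. Per the discussion above,
    // that does not imply helper() will be inlined into caller().
    return helper(x) * 2;
}
```

As the quoted documentation says, the attribute is meant for debugging, so treat this as an illustration of the mechanism rather than something to ship.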

Here is a very simple example on godbolt. With #pragma GCC optimize ("O3") the functions foo() and please_inline_me() are each optimized, but please_inline_me() does not get inlined. But with -O3 on the command line, it does.
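Since the link itself is not reproduced here, the following is a minimal reconstruction of the kind of test case described. Only the function names foo() and please_inline_me() come from the text; the bodies are assumptions chosen to be trivially inlinable.

```cpp
// Assumed reconstruction of the example; the function bodies are hypothetical.
#pragma GCC optimize ("O3")

int please_inline_me(int x) {
    return x * 2 + 1;
}

int foo(int x) {
    // Built with plain `g++ -S` (pragma only), the behavior described above is
    // that this stays a real call. Built with `g++ -O3 -S`, it gets inlined.
    return please_inline_me(x) + 1;
}
```

To compare for yourself, compile the file once with no -O option and once with -O3 on the command line, then look for a call to please_inline_me in the assembly for foo().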

A guess would be that the optimize attribute, and by extension #pragma GCC optimize, causes the compiler to treat the function as if its definition were in a separate source file which was being compiled with the specified option. And indeed, if std::min() and main() were defined in separate source files, you could compile each one with -O3 but you wouldn't get inlining.

Arguably the GCC manual should document this more explicitly, though I guess if it's only meant for debugging, it might be fair to assume it's intended for experts who would be familiar with the distinction.

If you really do compile your example with -O3 on the command line, you get identical (vectorized) assembly for both versions, or at least I did. (After fixing the backwards comparison: your ternary code is computing max instead of min.)
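For reference, the two versions being compared would look roughly like this. The question's code is not shown here, so the array size N, the function names, and the exact loop shape are assumptions; only the global arrays a and b and the std::min vs. ternary contrast come from the discussion. The ternary is written with the comparison the right way round, so it computes the minimum.

```cpp
#include <algorithm>
#include <cstddef>

// Hypothetical size; the arrays are global, as in the discussion above.
constexpr std::size_t N = 1024;
int a[N], b[N];

// Version 1: element-wise minimum via std::min.
void min_with_std() {
    for (std::size_t i = 0; i < N; ++i)
        a[i] = std::min(a[i], b[i]);
}

// Version 2: the same thing with a ternary. Note the `<`: writing `>` here
// would compute the maximum, which was the bug mentioned above.
void min_with_ternary() {
    for (std::size_t i = 0; i < N; ++i)
        a[i] = (a[i] < b[i]) ? a[i] : b[i];
}
```

Compiling this with something like `g++ -O3 -S` and no pragma lets you compare the two loops directly; you may need a target option such as -mavx if you want wider vectors than the baseline.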