且构网


Return value from writing to an unused parameter when falling off the end of a non-void function

Updated: 2023-11-12 23:53:10


In a code-golf answer, I saw a trick where the return value is the second parameter, which is not passed in.

int f(i, j) 
{
    j = i;   
}

int main() 
{
    return f(3);
}

From GCC's assembly output, it looks like when the code copies j = i, it puts the value in EAX, which happens to be the return-value register.

f:
        pushq   %rbp
        movq    %rsp, %rbp
        movl    %edi, -4(%rbp)
        movl    %esi, -8(%rbp)
        movl    -4(%rbp), %eax
        movl    %eax, -8(%rbp)
        nop
        popq    %rbp
        ret
main:
        pushq   %rbp
        movq    %rsp, %rbp
        movl    $3, %edi
        movl    $0, %eax
        call    f
        popq    %rbp
        ret 

So, did this just happen by luck? Is it documented by GCC? It only works with -O0, but it works with many values of i that I tried, with -m32, and with several different versions of GCC.

gcc -O0 likes to evaluate expressions in the return-value register, if a register is needed at all. (GCC -O0 generally just likes to have values in the retval register, but this goes beyond picking that as the first temporary.)

I've tested a bit, and it really looks like GCC -O0 does this on purpose across multiple ISAs, sometimes even using an extra mov instruction or equivalent. IIRC I made an expression more complicated so the result of evaluation ended up in another register, but it still copied it back to the retval register.

Things like x++ that can (on x86) compile to a memory-destination inc or add won't leave the value in a register, but assignments typically will. So it's not quite like GCC is treating function bodies like GNU C statement-expressions.
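For comparison, this is what a real GNU C statement expression does: the value of the last expression statement inside ({ ... }) becomes the value of the whole construct, so a trailing assignment yields the assigned value. A minimal sketch, assuming GCC or clang (statement expressions are a GNU extension, not ISO C); the function name last_assignment_value is hypothetical. Unlike the -O0 register behaviour, this is documented by GCC.

```c
/* Hypothetical example; requires GNU C (GCC or clang accept it). */
int last_assignment_value(int i)
{
    int j;
    /* ({ ... }) evaluates to its last expression statement, here the
       assignment j = i, whose value is the new value of j. */
    return ({ j = i; });
}
```

Here the result is well-defined at every optimization level, because the function contains an explicit return of the statement expression's value.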


This is not documented, guaranteed, or standardized by anything. It's an implementation detail, not something intended for you to take advantage of like this.

"Returning" a value this way means you're programming in "GCC -O0", not C. The wording of the code-golf rules says that programs have to work on at least one implementation. But my reading of that is that they should work for the right reasons, not because of some side-effect implementation detail. They break on clang not because clang doesn't support some language feature, just because they're not even written in C.

Breaking with optimization enabled is also not cool. Some level of UB is generally acceptable in code golf, e.g. integer wraparound or pointer-cast type punning, which are things one might reasonably wish were well-defined. But this is pure abuse of an implementation detail of one compiler, not a language feature.
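For contrast, here is the question's function written in ISO C, with a prototype and an explicit return, so it behaves the same with or without optimization and on any conforming compiler. A minimal sketch; the name f is kept from the question. With a prototyped two-parameter f(int i, int j), the question's one-argument call f(3) would be rejected at compile time (too few arguments) instead of compiling silently.

```c
/* The question's f, written in ISO C: prototyped, one parameter, and an
   explicit return, so the result no longer depends on GCC -O0 happening
   to leave the value in EAX. */
int f(int i)
{
    int j = i;   /* j is now an ordinary local, not an unused parameter */
    return j;
}
```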

I argued this point in comments under the relevant answer on the Codegolf.SE C golfing tips Q&A (which incorrectly claims it works beyond GCC). That answer has 4 downvotes (and deserves more IMO), but 16 upvotes. So some members of the community disagree that this is terrible and silly.


Fun fact: in ISO C++ (but not C), having execution fall off the end of a non-void function is Undefined Behaviour, even if the caller doesn't use the result. This is true even in GNU C++; outside of -O0, GCC and clang will sometimes emit code like ud2 (illegal instruction) for a path of execution that reaches the end of a function without a return. So GCC doesn't in general define the behaviour here (which implementations are allowed to do for things that ISO C and C++ leave undefined, e.g. gcc -fwrapv defines signed overflow as 2's complement wraparound).

But in ISO C, it's legal to fall off the end of a non-void function: it only becomes UB if the caller uses the return value. Without -Wall, GCC may not even warn. (See the related question "Checking return value of a function without return statement".)
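To illustrate the ISO C rule: a function may reach its closing brace without returning a value, and that is only undefined if the caller uses the result of that particular call. A minimal sketch; maybe_value and calls are hypothetical names. With -Wall, GCC should warn about the path without a return (-Wreturn-type), but in C this is a warning, not an error.

```c
/* Hypothetical example: a non-void function with a return on one path only. */
static int calls = 0;   /* counts invocations so each call has a visible effect */

int maybe_value(int flag)
{
    ++calls;
    if (flag)
        return 42;
    /* No return statement on this path. ISO C allows reaching the closing
       brace; behaviour is undefined only if the caller uses the result. */
}

/* Usage:
     maybe_value(0);          // fine in C: the result is discarded
     int v = maybe_value(1);  // fine: this call executed `return 42;`
     int w = maybe_value(0);  // UB in C: uses a value that was never returned
*/
```

In C++ the first call in that usage comment would already be UB, which is the difference the fun fact above describes.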

With optimization disabled, function inlining won't happen so the UB isn't really compile-time visible. (Unless you use __attribute__((always_inline))).


Passing a 2nd arg merely gives you something to assign to. It's not important that it's a function arg. But i=i; optimizes away even with -O0 so you do need a separate variable. Also just i; optimizes away.

Fun fact: a recursive f(i){ f(i); } function body does bounce i through EAX before copying it to the first arg-passing register. So GCC just really loves EAX.

        movl    -4(%rbp), %eax
        movl    %eax, %edi
        movl    $0, %eax             # without a full prototype, pass # of FP args in AL
        call    f

i++; doesn't load into EAX; it just uses a memory-destination add without loading into a register. Worth trying with gcc -O0 for ARM.