Why doesn't this compiler barrier enforce ordering?

Updated: 2023-11-13 23:28:46

How is this possible if the cli() macro expands to inline asm that explicitly clobbers memory? How is the compiler free to move things before or after this statement?

This is due to an implementation detail of avr-gcc. The compiler's support library, libgcc, provides many functions written in assembly for performance, including integer-division functions such as __udivmodhi4. Not all of these functions clobber all of the call-used registers specified by the avr-gcc ABI. In particular, __udivmodhi4 does not clobber the Z register.

avr-gcc makes use of this as follows. On a machine without a 16-bit division instruction, such as AVR, GCC would normally issue a library call instead of generating inline code. avr-gcc, however, pretends that the architecture does have such a division instruction and models it as having the same effect on the processor registers as the library call. Finally, after all code analyses and optimizations, the avr backend prints this instruction as [R]CALL __udivmodhi4. Let's call this a transparent call, i.e. a call that the compiler's analysis does not see as a call.
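To make this concrete, here is a minimal sketch of the kind of source code that leads to such divisions. The function names are made up for illustration. The assumption, based on the routine names mentioned in this answer, is that avr-gcc would turn the unsigned operations into a [R]CALL __udivmodhi4 and the signed one into a [R]CALL __divmodhi4; on a host compiler they are just ordinary divisions.

```c
#include <stdint.h>

/* Hypothetical helpers, named for illustration only.
   The divisors are run-time values, so the compiler cannot replace
   the division by a shift or a multiplication with a constant. */

/* Unsigned 16-bit quotient: assumed to map to __udivmodhi4 on AVR. */
uint16_t quot_u16 (uint16_t a, uint16_t b)
{
    return a / b;
}

/* Unsigned 16-bit remainder: assumed to use the same routine. */
uint16_t rem_u16 (uint16_t a, uint16_t b)
{
    return a % b;
}

/* Signed 16-bit quotient: assumed to map to __divmodhi4 on AVR. */
int16_t quot_s16 (int16_t a, int16_t b)
{
    return a / b;
}
```

Compiling these with the -dp option described further down is one way to check which of them really end up as transparent calls with a given compiler version.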

Example

int div (int a, int b, volatile const __flash char *z)
{
    int ab;

    (void) *z;
    asm volatile ("" : "+r" (a));
    ab = a / b;
    asm volatile ("" : "+r" (ab));
    (void) *z;

    return ab;
}

Compile this with avr-gcc -S -Os -mmcu=atmega8 ... to get the assembly file *.s:

div:
    movw  r30,r20
    lpm   r18,Z
    rcall __divmodhi4
    movw  r24,r22
    lpm   r18,Z
    ret

Explanation

(void) *z reads one byte from flash. In order to use the lpm instruction, the address must be in the Z register, which is accomplished by movw r30,r20. After reading via lpm, the compiler issues rcall __divmodhi4 to perform the signed 16-bit division. If this were an ordinary (non-transparent) call, the compiler would know nothing about the inner workings of the callee and would have to assume that Z is clobbered. But because the avr backend models the call by hand, the compiler knows that the instruction sequence does not change Z and hence may use Z again after the call without further ado. This allows for better code generation due to lower register pressure; in particular, z need not be saved and restored around the division.

The asm statements just serve to order the code. They are volatile and hence must not be reordered against the volatile read *z. And they must not be reordered against the division because they change a and ab; at least that's what we are pretending and telling the compiler by means of the constraints. (These variables are not actually changed, but that does not matter here.)

Also, I modified the code to include memory barriers before every statement in the form of __asm volatile("" ::: "memory"); and it doesn't seem to change anything.

The division does not touch memory (it is a transparent call without a memory clobber), so the compiler is free to reorder it against memory clobbers and memory accesses.
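As a minimal sketch of the situation (the function name is hypothetical), consider a division placed between two pure memory barriers. Nothing ties the division to either asm statement, so the result is always correct, but the position of the division relative to the barriers is not guaranteed.

```c
/* Hypothetical example: a division "fenced" by memory barriers only. */
unsigned div_between_barriers (unsigned a, unsigned b)
{
    unsigned q;

    __asm__ volatile ("" ::: "memory");   /* barrier A: clobbers memory only */

    /* a, b and q are plain values that may live in registers.  The
       division neither reads nor writes memory, so barriers A and B
       say nothing about it.  The compiler may emit it before A,
       between A and B, or after B. */
    q = a / b;

    __asm__ volatile ("" ::: "memory");   /* barrier B: clobbers memory only */

    return q;
}
```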

If you need a specific order, you'll have to introduce artificial dependencies, as in my example above.
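Applied to a cli()-style critical section, such a dependency could look like the sketch below. The macro CLI_AFTER, the variable shared_ticks and the function publish_quotient are invented for illustration and are not part of avr-libc. The idea is to put the interrupt-disable instruction and the "+r" operand into one asm statement, so the quotient has to be computed before interrupts are disabled. On non-AVR targets the instruction strings are empty, so the sketch also builds on a host compiler.

```c
#include <stdint.h>

#ifdef __AVR__
#  define CLI_INSN "cli"
#  define SEI_INSN "sei"
#else
#  define CLI_INSN ""    /* host build: no instruction, constraints only */
#  define SEI_INSN ""
#endif

/* Hypothetical helper: disable interrupts, and pretend that the asm
   reads and modifies `val`.  The value of `val` must therefore be
   available before this statement. */
#define CLI_AFTER(val) \
    __asm__ volatile (CLI_INSN : "+r" (val) : : "memory")

/* Hypothetical variable shared with an interrupt handler. */
static volatile uint16_t shared_ticks;

void publish_quotient (uint16_t num, uint16_t den)
{
    uint16_t q = num / den;   /* possibly a transparent call on AVR */

    CLI_AFTER (q);            /* the division cannot sink below this point */
    shared_ticks = q;         /* volatile store inside the critical section */
    __asm__ volatile (SEI_INSN ::: "memory");   /* re-enable interrupts */
}
```

The store to shared_ticks uses the value of q that the asm claims to produce, so it cannot be hoisted above CLI_AFTER either. Note that this sketch unconditionally re-enables interrupts at the end; real code may need to restore the previous interrupt state instead.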

To tell ordinary calls apart from transparent ones, you can dump the generated assembly into the .s file by means of -save-temps -dp, where -dp prints the insn names:

void func0 (void);

int func1 (int a, int b)
{
    return a / b;
}

void func2 (void)
{
    func0();
}

Every call that is neither call_insn nor call_value_insn is a transparent call, *divmodhi4_call in this case:

func1:
    rcall __divmodhi4    ;  17  [c=0 l=1]  *divmodhi4_call
    movw r24,r22         ;  18  [c=4 l=1]  *movhi/0
    ret                  ;  23  [c=0 l=1]  return

func2:
    rjmp func0           ;  5   [c=0 l=1]  call_insn/3
