Does a store instruction block subsequent instructions on a cache miss?

Updated: 2023-02-14 21:22:49

Generally speaking, a store whose value is not read soon afterwards does not directly delay the code that follows it on any modern out-of-order processor, including Intel's.

For example:

```c
foo();
*x = y;
bar();
```

If foo() doesn't modify x or y, and bar() doesn't load from *x, then the store is independent of both. It may start executing before foo() is complete, or even before foo() starts. bar() may execute before the store commits to the cache, and bar() may even execute while foo() is still running.
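To make the independence conditions concrete, here is a compilable sketch of the sequence. Several names are hypothetical and were chosen only for illustration: the bodies of foo() and bar(), the arrays a and b, run_sequence(), and struct results. They are set up so that foo() never touches x or y and bar() never loads from *x. This assumes x does not point into a or b. The C code only pins down what "independent" means. The overlap itself is done by the hardware and does not change the program's results.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for foo(): reads only its own array and never
   touches x or y, so it has no dependency on the store below. */
static long foo(const long *a, size_t n) {
    long sum = 0;
    for (size_t i = 0; i < n; i++)
        sum += a[i];
    return sum;
}

/* Hypothetical stand-in for bar(): never loads from *x, so none of
   its work has to wait for the store to commit to the cache. */
static long bar(const long *b, size_t n) {
    assert(n > 0);
    long max = b[0];
    for (size_t i = 1; i < n; i++)
        if (b[i] > max)
            max = b[i];
    return max;
}

struct results {
    long foo_sum;
    long bar_max;
};

/* The sequence from the text: foo(); *x = y; bar();
   Assumes x does not point into a or b. However the core overlaps
   these three steps, the results are the same; only timing differs. */
struct results run_sequence(const long *a, const long *b, size_t n,
                            long *x, long y) {
    struct results r;
    r.foo_sum = foo(a, n); /* independent of x and y */
    *x = y;                /* the store; nothing below reads *x */
    r.bar_max = bar(b, n); /* independent of *x */
    return r;
}
```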

While there is little direct impact, that doesn't mean there aren't indirect impacts. Indeed, the store may dominate the execution time.

If the store misses in cache, it may tie up off-core resources while the cache miss is being satisfied. It also usually prevents subsequent stores from draining, because on x86 stores commit to the cache in program order. This can become a bottleneck: if the store buffer fills up, the front-end blocks entirely and new instructions no longer enter the scheduler.
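The draining problem can be illustrated with a toy cycle model. Everything in this model is an assumption made for illustration, not a description of any real core:

- every instruction is a store, and one store tries to enter the buffer per cycle;
- only the oldest store can commit;
- a commit takes a fixed hit_cycles or miss_cycles.

Real cores can overlap the RFO requests for several missing stores, so this model overstates the damage. The function name store_buffer_stalls() is hypothetical.

```c
#include <assert.h>

/* Toy model of a store buffer that commits in program order.
   Hypothetical assumptions: one store is offered per cycle, only the
   oldest store commits, a commit takes hit_cycles or miss_cycles with
   no overlap between misses, and a store can start committing the
   cycle after it is allocated. misses[i] != 0 means store i misses.
   Returns the number of cycles in which the next store could not be
   allocated because the buffer was full. */
long store_buffer_stalls(const int *misses, int n, int capacity,
                         int hit_cycles, int miss_cycles) {
    assert(n >= 0 && capacity >= 1 && hit_cycles >= 1 && miss_cycles >= 1);
    int head = 0;      /* oldest store not yet committed */
    int next = 0;      /* next store waiting to be allocated */
    int remaining = 0; /* cycles left on the head store's commit; 0 = not started */
    long stalls = 0;

    while (head < n) {
        /* Commit stage: only the oldest allocated store makes progress. */
        if (head < next) {
            if (remaining == 0)
                remaining = misses[head] ? miss_cycles : hit_cycles;
            if (--remaining == 0)
                head++;
        }
        /* Allocation stage: a full buffer blocks the next store. */
        if (next < n) {
            if (next - head < capacity)
                next++;
            else
                stalls++;
        }
    }
    return stalls;
}
```

The parameters below are made up: a 4-entry buffer, 1-cycle hits and 10-cycle misses. With them, a single miss at the head of ten stores costs 6 stall cycles even though every later store hits. A 16-entry buffer hides the same miss entirely. If all eight stores miss, the 4-entry buffer stalls for 33 cycles. Once the buffer is full, each additional store waits out a whole miss, which is the kind of cost that buffering cannot hide.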

Finally, everything depends on the details of the surrounding code, as usual. If that sequence is run repeatedly, and foo() and bar() are short, the misses related to the store may dominate the runtime. After all, buffering can't hide the cost of an unlimited number of stores. At some point you'll be bound by the intrinsic throughput of the stores.
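One rough way to see this on real hardware is a microbenchmark that runs the sequence in a tight loop with a short foo() and bar(). It is run once with a buffer that fits in L1 and once with a buffer much larger than the caches. This is a sketch under several assumptions:

- foo(), bar(), store_loop(), time_store_loop() and the 4099-element stride are all hypothetical;
- clock() measures processor time, not wall time;
- hardware prefetchers and TLB misses will also affect the numbers.

```c
#include <assert.h>
#include <stdlib.h>
#include <time.h>

/* Hypothetical short foo() and bar(): a few ALU operations that never
   touch the buffer being stored to. */
static unsigned long foo(unsigned long v) { return v * 2654435761UL + 1; }
static unsigned long bar(unsigned long v) { return v ^ (v >> 7); }

/* Runs the sequence foo(); buf[idx] = v; bar(); `iters` times.
   idx advances by a large odd step, wrapping modulo len, so in a large
   buffer consecutive stores hit different cache lines.
   len must be a power of two. */
unsigned long store_loop(unsigned long *buf, size_t len, size_t iters) {
    assert(len > 0 && (len & (len - 1)) == 0);
    unsigned long v = 1;
    size_t idx = 0;
    for (size_t i = 0; i < iters; i++) {
        v = foo(v);
        buf[idx] = v;                   /* the store; never read back here */
        v = bar(v);
        idx = (idx + 4099) & (len - 1); /* hypothetical stride, in elements */
    }
    return v; /* returned so the compiler cannot drop foo() and bar() */
}

/* Returns the CPU seconds spent in store_loop() on a buffer of len
   elements, or -1.0 if allocation fails. Stores the loop's return
   value in *checksum. */
double time_store_loop(size_t len, size_t iters, unsigned long *checksum) {
    unsigned long *buf = malloc(len * sizeof *buf);
    if (buf == NULL)
        return -1.0;
    for (size_t i = 0; i < len; i++)
        buf[i] = i; /* touch every page first so page faults are not timed */
    clock_t start = clock();
    *checksum = store_loop(buf, len, iters);
    clock_t end = clock();
    free(buf);
    return (double)(end - start) / CLOCKS_PER_SEC;
}
```

For example, compare time_store_loop((size_t)1 << 10, iters, &sum) with time_store_loop((size_t)1 << 25, iters, &sum), using the same iters. The larger buffer is 256 MiB when unsigned long is 8 bytes. The large-buffer run is expected to take longer per iteration because most of its stores miss. How much longer depends on the CPU, and this sketch cannot predict it.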