
CUDA Programming - L1 and L2 Caches

Updated: 2022-06-05 22:42:04

Typically you would leave both the L1 and L2 caches enabled. You should coalesce your memory accesses as much as possible, i.e. threads within a warp should access data within the same 128-byte segment wherever possible (see the CUDA Programming Guide for more on this topic).
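As an illustrative sketch (the kernel and parameter names here are hypothetical, not from the original answer), a coalesced pattern lets the 32 threads of a warp read 32 consecutive floats, i.e. one 128-byte segment, while a strided pattern scatters the same loads across many segments:

```cuda
// Coalesced: thread i reads element i, so a warp's 32 threads cover
// one contiguous 128-byte segment (32 threads x 4 bytes each).
__global__ void copy_coalesced(const float *in, float *out, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n)
        out[i] = in[i];
}

// Strided: consecutive threads are `stride` elements apart, so each
// thread's 4-byte load lands in a different 128-byte segment and the
// warp issues many more memory transactions for the same useful data.
__global__ void copy_strided(const float *in, float *out, int n, int stride)
{
    int i = (blockIdx.x * blockDim.x + threadIdx.x) * stride;
    if (i < n)
        out[i] = in[i];
}
```

Both kernels move the same bytes per thread; only the access pattern, and therefore the number of memory transactions per warp, differs.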

Some programs cannot be optimised in this way; their memory accesses are, for example, completely random. For those cases it may be beneficial to bypass the L1 cache, thereby avoiding loading an entire 128-byte line when you only want, say, 4 bytes (you will still load 32 bytes, since that is the minimum transaction size). There is a clear efficiency gain: 4 useful bytes out of 128 improves to 4 out of 32.
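One way to get this behaviour, sketched below with a hypothetical gather kernel, is to mark global loads cache-global so they are served from L2 only. This can be done for a whole compilation unit with the nvcc flag `-Xptxas -dlcm=cg`, or per load with the `__ldcg()` intrinsic (compute capability 5.0 and newer):

```cuda
// Gather 4-byte values from effectively random indices. Building with
//   nvcc -Xptxas -dlcm=cg gather.cu
// makes all global loads bypass L1; alternatively, __ldcg() requests
// the same cache-global behaviour for an individual load.
__global__ void gather(const float *src, const int *idx, float *dst, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n)
        dst[i] = __ldcg(&src[idx[i]]);  // load via L2 only, skipping L1
}
```

Whether this helps is workload-dependent; it is worth confirming with a profiler that the random-access loads are in fact the ones generating wasted L1 traffic before bypassing the cache globally.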