Updated: 2023-09-11 21:53:46
I don't think load replays from the RS are involved in the RIDL attacks. So instead of explaining what load replays are (@Peter's answer is a good starting point for that), I'll discuss what I think is happening based on my understanding of the information provided in the RIDL paper, Intel's analysis of these vulnerabilities, and relevant patents.
Line fill buffers are hardware structures in the L1D cache used to hold memory requests that miss in the cache and I/O requests until they get serviced. A cacheable request is serviced when the required cache line is filled into the L1D data array. A write-combining write is serviced when any of the conditions for evicting a write-combining buffer occur (as described in the manual). A UC or I/O request is serviced when it is sent to the L2 cache (which occurs as soon as possible).
Refer to Figure 4 of the RIDL paper. The experiment used to produce these results works as follows:
... MFENCE and there is an optional CLFLUSH. It's not clear to me from the paper what the order of CLFLUSH is with respect to the other two instructions, but it probably doesn't matter. MFENCE serializes the cache line flushing operation to see what happens when every load misses in the cache. In addition, MFENCE reduces contention between the two logical cores on the L1D ports, which improves the throughput of the attacker.
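To make the victim side concrete, here is a minimal sketch in C. It assumes the victim writes a known value once and then reads it back in a loop, with an MFENCE after every load and an optional CLFLUSH. The function name `victim_loop`, its parameters, and the choice to flush before the load are my own assumptions (the paper doesn't make the order clear), and the fallback macros exist only so the sketch compiles on non-SSE2 targets with GCC or Clang.

```c
#include <stdint.h>

#if defined(__SSE2__)
#include <emmintrin.h>              /* _mm_mfence, _mm_clflush */
#define FENCE()   _mm_mfence()
#define FLUSH(p)  _mm_clflush((const void *)(p))
#else
/* Fallback for non-x86 builds: a full barrier and no cache flush. */
#define FENCE()   __sync_synchronize()
#define FLUSH(p)  ((void)(p))
#endif

/* Hypothetical victim: store a known value once, then load it repeatedly.
 * Returns the sum of the loaded bytes so the loads stay observable. */
static uint64_t victim_loop(volatile uint8_t *loc, uint8_t value,
                            int iters, int flush)
{
    uint64_t sum = 0;
    *loc = value;                   /* the known value the attacker looks for */
    for (int i = 0; i < iters; i++) {
        if (flush)
            FLUSH(loc);             /* assumed position: before the load */
        sum += *loc;                /* misses in L1D when flushing is enabled */
        FENCE();                    /* at most one victim load in flight */
    }
    return sum;
}
```

With `flush` set to 0 this corresponds to the plain WB/WT runs, and with `flush` set to 1 to the "with flushing" runs. Selecting the memory type (WB, WT, WC, UC) is done through page attributes and is outside what this sketch shows.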
It's not clear to me what the Y-axis in Figure 4 represents. My understanding is that it represents the number of lines from the covert channel that got fetched into the cache hierarchy (Line 10) per second, where the index of the line in the array is equal to the value written by the victim.
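Under that reading of the Y-axis, each round of the attack ends with a reload step over a 256-line probe array, and a round counts as a hit when the line whose index equals the victim's value is found cached. The sketch below shows only that decoding step. It assumes the reload latencies have already been measured (for example with RDTSCP); the names `decode_hit` and `record_round`, the array size, and the threshold are illustrative and not taken from the paper.

```c
#include <stdint.h>

#define PROBE_LINES 256   /* one probe line per possible byte value */

/* Return the index of the fastest probe line whose reload latency is below
 * `threshold`, or -1 if no line looks cached in this round. */
static int decode_hit(const uint64_t latency[PROBE_LINES], uint64_t threshold)
{
    int best = -1;
    for (int i = 0; i < PROBE_LINES; i++) {
        if (latency[i] < threshold &&
            (best < 0 || latency[i] < latency[best]))
            best = i;
    }
    return best;
}

/* Add one round to a per-value histogram of decoded bytes. */
static void record_round(uint64_t hist[PROBE_LINES],
                         const uint64_t latency[PROBE_LINES],
                         uint64_t threshold)
{
    int idx = decode_hit(latency, threshold);
    if (idx >= 0)
        hist[idx]++;
}
```

Dividing the histogram count at the victim's known value by the elapsed time would give a lines-per-second figure of the kind I believe the Y-axis shows.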
If the memory location is of the WB type, when the victim thread writes the known value to the memory location, the line will be filled into the L1D cache. If the memory location is of the WT type, when the victim thread writes the known value to the memory location, the line will not be filled into the L1D cache. However, on the first read from the line, it will be filled. So in both cases and without CLFLUSH, most loads from the victim thread will hit in the cache.
When the cache line for a load request reaches the L1D cache, it gets written first in the LFB allocated for the request. The requested portion of the cache line can be directly supplied to the load buffer from the LFB without having to wait for the line to be filled in the cache. According to the description of the MFBDS vulnerability, under certain situations, stale data from previous requests may be forwarded to the load buffer to satisfy a load uop. In the WB and WT cases (without flushing), the victim's data is written into at most 2 different LFBs. The page walks from the attacker thread can easily overwrite the victim's data in the LFBs, after which the data will never be found in there by the attacker thread. All load requests that hit in the L1D cache don't go through the LFBs; there is a separate path for them, which is multiplexed with the path from the LFBs. Nonetheless, there are some cases where stale data (noise) from the LFBs is being speculatively forwarded to the attacker's logical core, which is probably from the page walks (and maybe interrupt handlers and hardware prefetchers).
It's interesting to note that the frequency of stale data forwarding in the WB and WT cases is much lower than in all of the other cases. This could be explained by the fact that the victim's throughput is much higher in these cases and the experiment may terminate earlier.
In all other cases (WC, UC, and all types with flushing), every load misses in the cache and the data has to be fetched from main memory to the load buffer through the LFBs. The following sequence of events occurs:
With an MFENCE after every load, there can be at most one outstanding load from the victim in the LFB at any given cycle.
If the attacker's load didn't fault or require an assist, the LFBs would receive a valid physical address from the MMU and all the checks required for correctness would be performed. That's why the load has to fault/assist.
The following quote from the paper discusses how to perform a RIDL attack in the same thread:
we perform the RIDL attack without SMT by writing values in our own thread and observing the values that we leak from the same thread. Figure 3 shows that if we do not write the values ("no victim"), we leak only zeros, but with victim and attacker running in the same hardware thread (e.g., in a sandbox), we leak the secret value in almost all cases.
I think there are no privilege level changes in this experiment. The victim and the attacker run in the same OS thread on the same hardware thread. When returning from the victim to the attacker, there may still be some outstanding requests in the LFBs from the victim (especially from stores). Note that in the RIDL paper, KPTI is enabled in all experiments (in contrast to the Fallout paper).
In addition to leaking data from LFBs, MLPDS shows that data can also be leaked from the load port buffers. These include the line-split buffers and the buffers used for loads larger than 8 bytes in size (which I think are needed when the size of the load uop is larger than the size of the load port, e.g., AVX 256b on SnB/IvB that occupy the port for 2 cycles).
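As a small illustration of which loads would involve the line-split buffers, a load is line-split when its bytes straddle a 64-byte cache line boundary. The helper below is only a sketch of that address check; the name `is_line_split` is hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

#define CACHE_LINE 64u   /* cache line size in bytes on these processors */

/* Nonzero if a load of `size` bytes at `addr` crosses a cache line boundary. */
static int is_line_split(uint64_t addr, size_t size)
{
    return size > 0 && (addr % CACHE_LINE) + size > CACHE_LINE;
}
```

For example, an 8-byte load at offset 60 within a line is split, while the same load at offset 56 ends exactly at the boundary and is not.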
The WB case (no flushing) from Figure 5 is also interesting. In this experiment, the victim thread writes 4 different values to 4 different cache lines instead of reading from the same cache line. The figure shows that, in the WB case, only the data written to the last cache line is leaked to the attacker. The explanation may depend on whether the cache lines are different in different iterations of the loop, which is unfortunately not clear in the paper. The paper says:
For WB without flushing, there is a signal only for the last cache line, which suggests that the CPU performs write combining in a single entry of the LFB before storing the data in the cache.
How can writes to different cache lines be combined in the same LFB before storing the data in the cache? That makes zero sense. An LFB can hold a single cache line and a single physical address. It's just not possible to combine writes like that. What may be happening is that WB writes are being written in the LFBs allocated for their RFO requests. When the invalid physical address is transmitted to the LFBs for comparison, the data may always be provided from the LFB that was last allocated. This would explain why only the value written by the fourth store is leaked.
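The hypothesis in the previous paragraph can be written down as a toy model. This is not how the hardware is documented to work; it only encodes my guess that each missing WB store parks its data in the LFB allocated for its RFO, and that a faulting or assisted load is served from whichever LFB was allocated last. The entry count, the round-robin allocation, and all names are assumptions made for illustration.

```c
#include <stdint.h>

#define NUM_LFBS 10   /* assumed entry count, for illustration only */

struct lfb_entry {
    uint64_t line_addr;   /* physical address >> 6, one line per entry */
    uint8_t  data;        /* stand-in for the 64-byte line contents */
    int      valid;
};

struct lfb_model {
    struct lfb_entry entry[NUM_LFBS];
    int next;             /* next entry to allocate (assumed round-robin) */
    int last;             /* most recently allocated entry, -1 if none */
};

static void lfb_init(struct lfb_model *m)
{
    for (int i = 0; i < NUM_LFBS; i++)
        m->entry[i].valid = 0;
    m->next = 0;
    m->last = -1;
}

/* A WB store that misses allocates an LFB for its RFO and writes its data there. */
static void store_miss(struct lfb_model *m, uint64_t addr, uint8_t value)
{
    int i = m->next;
    m->entry[i].line_addr = addr >> 6;
    m->entry[i].data = value;
    m->entry[i].valid = 1;
    m->last = i;
    m->next = (i + 1) % NUM_LFBS;
}

/* Hypothesized faulting load: the address comparison is meaningless, so the
 * data comes from the last-allocated entry. Returns -1 if none exists. */
static int faulting_load(const struct lfb_model *m)
{
    if (m->last < 0)
        return -1;
    return m->entry[m->last].data;
}
```

In this model, four stores to four different lines leave four LFBs occupied, yet the faulting load only ever observes the fourth value, which matches the single signal for the last cache line in Figure 5 without requiring any write combining.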
For information on MDS mitigations, see: What are the new MDS attacks, and how can they be mitigated?. My answer there only discusses mitigations based on the Intel microcode update (not the very interesting "software sequences").
The following figure shows the vulnerable structures that use data speculation.