Why is std::mutex so slow on OSX?

Updated: 2023-11-05 23:46:22

You're just measuring the library's choice of trading off throughput for fairness. The benchmark is heavily artificial and penalizes any attempt to provide any fairness at all.

The implementation can do two things. It can let the same thread get the mutex twice in a row, or it can change which thread gets the mutex. This benchmark heavily penalizes a change in threads because the context switch takes time and because ping-ponging the mutex and val from cache to cache takes time.
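To make the two outcomes concrete, here is a minimal sketch of the kind of benchmark being discussed. The question's actual code is not reproduced here, so this is an assumption: the name `run_contended_increment`, the `ContentionResult` struct, and the `handoffs` counter are all hypothetical. Besides incrementing `val` under the mutex, it counts how often the mutex goes to a different thread than the one that held it last, which is the event this kind of benchmark punishes.

```cpp
#include <mutex>
#include <thread>
#include <vector>

// Hypothetical result type, for illustration only.
struct ContentionResult {
    long long val;       // final value of the shared counter
    long long handoffs;  // acquisitions where the owner differed from the previous owner
};

// Hypothetical benchmark body: every thread does nothing but
// lock, increment val, unlock, so the mutex is always contended.
ContentionResult run_contended_increment(int num_threads, int iters_per_thread) {
    std::mutex m;
    long long val = 0;       // protected by m
    long long handoffs = 0;  // protected by m
    int last_owner = -1;     // protected by m; -1 means "nobody yet"

    std::vector<std::thread> workers;
    for (int id = 0; id < num_threads; ++id) {
        workers.emplace_back([&, id] {
            for (int i = 0; i < iters_per_thread; ++i) {
                std::lock_guard<std::mutex> guard(m);
                ++val;
                if (last_owner != id) {
                    // The mutex moved to a different thread: this is the
                    // expensive case (context switch, cache-line transfer).
                    ++handoffs;
                    last_owner = id;
                }
            }
        });
    }
    for (auto& t : workers) t.join();
    return {val, handoffs};
}
```

Calling `run_contended_increment(4, 1000000)` and comparing `handoffs` to `val` gives a rough idea of how often the implementation switches owners. A low ratio suggests the same thread keeps reacquiring the mutex, which is exactly what makes such a benchmark look fast.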

Most likely, this is just showing the different trade-offs that implementations have to make. It heavily rewards implementations that prefer to give the mutex back to the thread that last held it. The benchmark even rewards implementations that waste CPU to do that, spinning to avoid a context switch even when there's other useful work the CPU could do! It also doesn't penalize the implementation for inter-core traffic, which can slow down other unrelated threads.

Also, people who implement mutexes generally presume that performance in the uncontended case is more important than performance in the contended case. There are numerous tradeoffs you can make between these cases, such as presuming that there might be a thread waiting or specifically checking if there is. The benchmark tests only (or at least, almost only) the case that is typically traded off in favor of the case presumed more common.

Bluntly, this is a senseless benchmark that is incapable of identifying a problem.

The specific explanation is almost certainly that the Linux implementation is a spinlock/futex hybrid while the OSX implementation is conventional, equivalent to locking a kernel object. The spinlock portion of the Linux implementation favors allowing the same thread that just released the mutex to lock it again, which your benchmark heavily rewards.
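A rough sketch of what "spin first, then block" means is below. This is not how the Linux or OSX standard libraries actually implement `std::mutex`; the wrapper class `SpinThenBlockMutex`, its `spin_limit` parameter, and the helper `count_with_hybrid_lock` are hypothetical names invented for illustration. It only shows the general shape: retry `try_lock` a bounded number of times in user space, and fall back to a blocking `lock()` if that fails.

```cpp
#include <mutex>
#include <thread>
#include <vector>

// Illustrative only: a wrapper that spins briefly before blocking.
// Real hybrid locks are built on atomics and futex-style waits instead.
class SpinThenBlockMutex {
public:
    explicit SpinThenBlockMutex(int spin_limit = 100) : spin_limit_(spin_limit) {}

    void lock() {
        // Spin phase: burn CPU hoping the holder releases very soon.
        // A thread that just unlocked and immediately relocks tends to
        // win here, because it is already running on a core.
        for (int i = 0; i < spin_limit_; ++i) {
            if (inner_.try_lock()) return;
        }
        // Blocking phase: give up spinning and wait in the underlying mutex.
        inner_.lock();
    }

    bool try_lock() { return inner_.try_lock(); }
    void unlock() { inner_.unlock(); }

private:
    std::mutex inner_;
    int spin_limit_;
};

// Hypothetical helper: the same lock/increment/unlock loop, using the wrapper.
long long count_with_hybrid_lock(int num_threads, int iters_per_thread) {
    SpinThenBlockMutex m;
    long long val = 0;  // protected by m

    std::vector<std::thread> workers;
    for (int id = 0; id < num_threads; ++id) {
        workers.emplace_back([&] {
            for (int i = 0; i < iters_per_thread; ++i) {
                std::lock_guard<SpinThenBlockMutex> guard(m);
                ++val;
            }
        });
    }
    for (auto& t : workers) t.join();
    return val;
}
```

Whether the spin phase helps depends entirely on the workload. In a loop that does nothing but lock and unlock it usually looks like a win, which is the point being made about the benchmark.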