How do I profile C++ code running on Linux?

I have a C++ application, running on Linux, which I'm in the process of optimizing. How can I pinpoint which areas of my code are running slowly?

If your goal is to use a profiler, use one of the suggested ones.

However, if you're in a hurry and you can manually interrupt your program under the debugger while it's being subjectively slow, there's a simple way to find performance problems.

Just halt it several times, and each time look at the call stack. If there is some code that is wasting some percentage of the time, 20% or 50% or whatever, that is the probability that you will catch it in the act on each sample. So that is roughly the percentage of samples on which you will see it. There is no educated guesswork required. If you do have a guess as to what the problem is, this will prove or disprove it.
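For concreteness, here is a rough sketch of what that halt-and-look cycle can look like under gdb (my_app is just a placeholder name; the "#" comments and "..." lines are annotations for the reader, not gdb commands):

    $ gdb ./my_app
    (gdb) run
    ... wait until it feels slow, then press Ctrl-C to interrupt it ...
    (gdb) bt                       # the call stack at the moment it was halted
    (gdb) thread apply all bt      # for multi-threaded programs: one stack per thread
    (gdb) continue
    ... repeat the interrupt / bt / continue cycle several times ...

Anything that shows up on two or more of those stacks is, by the argument below, worth a closer look.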

You may have multiple performance problems of different sizes. If you clean out any one of them, the remaining ones will take a larger percentage, and be easier to spot, on subsequent passes. This magnification effect, when compounded over multiple problems, can lead to truly massive speedup factors.

Caveat: Programmers tend to be skeptical of this technique unless they've used it themselves. They will say that profilers give you this information, but that is only true if they sample the entire call stack, and then let you examine a random set of samples. (The summaries are where the insight is lost.) Call graphs don't give you the same information, because

  1. they don't summarize at the instruction level, and
  2. they give confusing summaries in the presence of recursion.

They will also say it only works on toy programs, when actually it works on any program, and it seems to work better on bigger programs, because they tend to have more problems to find. They will say it sometimes finds things that aren't problems, but that is only true if you see something once. If you see a problem on more than one sample, it is real.

P.S. This can also be done on multi-thread programs if there is a way to collect call-stack samples of the thread pool at a point in time, as there is in Java.

P.P.S. As a rough generality, the more layers of abstraction you have in your software, the more likely you are to find that that is the cause of performance problems (and the opportunity to get speedup).

Added: It might not be obvious, but the stack sampling technique works equally well in the presence of recursion. The reason is that the time that would be saved by removal of an instruction is approximated by the fraction of samples containing it, regardless of the number of times it may occur within a sample.

Another objection I often hear is: "It will stop someplace random, and it will miss the real problem". This comes from having a prior concept of what the real problem is. A key property of performance problems is that they defy expectations. Sampling tells you something is a problem, and your first reaction is disbelief. That is natural, but you can be sure if it finds a problem it is real, and vice-versa.

ADDED: Let me make a Bayesian explanation of how it works. Suppose there is some instruction I (call or otherwise) which is on the call stack some fraction f of the time (and thus costs that much). For simplicity, suppose we don't know what f is, but assume it is either 0.1, 0.2, 0.3, ... 0.9, 1.0, and the prior probability of each of these possibilities is 0.1, so all of these costs are equally likely a-priori.

Then suppose we take just 2 stack samples, and we see instruction I on both samples, designated observation o=2/2. This gives us new estimates of the frequency f of I, according to this:

Prior                                    
P(f=x) x  P(o=2/2|f=x) P(o=2/2&&f=x)  P(o=2/2&&f >= x)  P(f >= x | o=2/2)

0.1    1     1             0.1          0.1            0.25974026
0.1    0.9   0.81          0.081        0.181          0.47012987
0.1    0.8   0.64          0.064        0.245          0.636363636
0.1    0.7   0.49          0.049        0.294          0.763636364
0.1    0.6   0.36          0.036        0.33           0.857142857
0.1    0.5   0.25          0.025        0.355          0.922077922
0.1    0.4   0.16          0.016        0.371          0.963636364
0.1    0.3   0.09          0.009        0.38           0.987012987
0.1    0.2   0.04          0.004        0.384          0.997402597
0.1    0.1   0.01          0.001        0.385          1

                  P(o=2/2) 0.385                

The last column says that, for example, the probability that f >= 0.5 is 92%, up from the prior assumption of 60%.

Suppose the prior assumptions are different. Suppose we assume P(f=0.1) is .991 (nearly certain), and all the other possibilities are almost impossible (0.001). In other words, our prior certainty is that I is cheap. Then we get:

Prior                                    
P(f=x) x  P(o=2/2|f=x) P(o=2/2&&f=x)  P(o=2/2&&f >= x)  P(f >= x | o=2/2)

0.001  1    1              0.001        0.001          0.072727273
0.001  0.9  0.81           0.00081      0.00181        0.131636364
0.001  0.8  0.64           0.00064      0.00245        0.178181818
0.001  0.7  0.49           0.00049      0.00294        0.213818182
0.001  0.6  0.36           0.00036      0.0033         0.24
0.001  0.5  0.25           0.00025      0.00355        0.258181818
0.001  0.4  0.16           0.00016      0.00371        0.269818182
0.001  0.3  0.09           0.00009      0.0038         0.276363636
0.001  0.2  0.04           0.00004      0.00384        0.279272727
0.991  0.1  0.01           0.00991      0.01375        1

                  P(o=2/2) 0.01375                

Now it says P(f >= 0.5) is 26%, up from the prior assumption of 0.6%. So Bayes allows us to update our estimate of the probable cost of I. If the amount of data is small, it doesn't tell us accurately what the cost is, only that it is big enough to be worth fixing.
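For anyone who wants to verify the tables mechanically, here is a minimal C++ sketch of the same update; nothing in it is specific to profiling, it is just Bayes' rule over the ten hypothesized values of f:

    #include <cstdio>
    #include <vector>

    int main() {
        // Hypothesized values x = 1.0, 0.9, ..., 0.1, as in the tables above.
        std::vector<double> x;
        for (int i = 10; i >= 1; --i) x.push_back(i / 10.0);

        // Uniform prior (first table). For the second table use 0.001 everywhere
        // except P(f=0.1) = 0.991.
        std::vector<double> prior(10, 0.1);

        // Observation o=2/2: instruction I seen on both of 2 samples, so the
        // likelihood P(o=2/2 | f=x) is simply x*x (the binomial coefficient is 1).
        std::vector<double> joint(10);
        double total = 0.0;                       // will hold P(o=2/2)
        for (int i = 0; i < 10; ++i) {
            joint[i] = prior[i] * x[i] * x[i];    // P(o=2/2 && f=x)
            total += joint[i];
        }

        double cumulative = 0.0;
        for (int i = 0; i < 10; ++i) {
            cumulative += joint[i];               // P(o=2/2 && f >= x), since x is descending
            std::printf("P(f >= %.1f | o=2/2) = %.6f\n", x[i], cumulative / total);
        }
        return 0;
    }

With the uniform prior this reproduces the last column of the first table; swapping in the skeptical prior reproduces the second.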

Yet another way to look at it is called the Rule Of Succession. If you flip a coin 2 times, and it comes up heads both times, what does that tell you about the probable weighting of the coin? The respected way to answer is to say that it's a Beta distribution, with average value (number of hits + 1) / (number of tries + 2) = (2+1)/(2+2) = 75%.

(The key is that we see I more than once. If we only see it once, that doesn't tell us much except that f > 0.)

So, even a very small number of samples can tell us a lot about the cost of instructions that it sees. (And it will see them with a frequency, on average, proportional to their cost. If n samples are taken, and f is the cost, then I will appear on nf+/-sqrt(nf(1-f)) samples. Example, n=10, f=0.3, that is 3+/-1.4 samples.)
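That arithmetic is easy to verify; a minimal C++ check of the n=10, f=0.3 example:

    #include <cstdio>
    #include <cmath>

    int main() {
        const double n = 10, f = 0.3;             // samples taken, fraction of time I is on the stack
        double mean = n * f;                      // expected number of samples showing I
        double sd = std::sqrt(n * f * (1 - f));   // binomial standard deviation
        std::printf("%.1f +/- %.1f samples\n", mean, sd);   // prints 3.0 +/- 1.4
    }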


ADDED, to give an intuitive feel for the difference between measuring and random stack sampling:
There are profilers now that sample the stack, even on wall-clock time, but what comes out is measurements (or hot path, or hot spot, from which a "bottleneck" can easily hide). What they don't show you (and they easily could) is the actual samples themselves. And if your goal is to find the bottleneck, the number of them you need to see is, on average, 2 divided by the fraction of time it takes. So if it takes 30% of time, 2/.3 = 6.7 samples, on average, will show it, and the chance that 20 samples will show it is 99.2%.
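Reading "will show it" as "will show it on at least two samples" (in line with the more-than-one-sample rule above) reproduces both of those numbers; a minimal C++ check, assuming a 30% bottleneck and 20 samples:

    #include <cstdio>
    #include <cmath>

    int main() {
        const double f = 0.3;   // bottleneck takes 30% of the time
        const int n = 20;       // number of stack samples taken

        // Expected number of samples needed to see the bottleneck twice.
        std::printf("average samples needed: %.1f\n", 2.0 / f);             // 6.7

        // Probability of seeing it on at least 2 of n samples:
        // 1 - P(0 hits) - P(1 hit), binomial with parameters n and f.
        double p0 = std::pow(1.0 - f, n);
        double p1 = n * f * std::pow(1.0 - f, n - 1);
        std::printf("P(seen >= 2 of %d samples) = %.3f\n", n, 1.0 - p0 - p1);  // ~0.992
        return 0;
    }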

Here is an off-the-cuff illustration of the difference between examining measurements and examining stack samples. The bottleneck could be one big blob like this, or numerous small ones, it makes no difference.

Measurement is horizontal; it tells you what fraction of time specific routines take. Sampling is vertical. If there is any way to avoid doing what the whole program is doing at that moment, and you see it on a second sample, you've found the bottleneck. That's what makes the difference - seeing the whole reason for the time being spent, not just how much.