且构网


Tips and tricks for improving Fortran code performance

Updated: 2023-11-17 10:38:28

As part of my Ph.D. research, I am developing numerical models of atmosphere and ocean circulation. These involve numerically solving systems of PDEs on ~10^6 grid points over ~10^4 time steps. Thus, a typical model simulation takes hours to a few days to complete when run with MPI on dozens of CPUs. Naturally, improving model efficiency as much as possible is important, while making sure the results remain byte-for-byte identical.

While I feel quite comfortable with my Fortran programming, and am aware of quite a few tricks to make code more efficient, I feel there is still room to improve, and there are tricks that I am not aware of.

Currently, I make sure I use as few divisions as possible, and try not to use literal constants (I was taught to do this from very early on, e.g. use half=0.5 instead of 0.5 in actual computations), use as few transcendental functions as possible etc.
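To make the habits above concrete, here is a minimal sketch of a named constant and of replacing a per-point division with a multiplication by a precomputed reciprocal. The names dx, inv_dx and the test field u are hypothetical, chosen only for illustration. One caveat worth checking on your own compiler: dividing by dx and multiplying by 1/dx are not guaranteed to round the same way, so this change can break byte-for-byte reproducibility against an older version of the code.

```fortran
program constants_demo
  use, intrinsic :: iso_fortran_env, only: dp => real64
  implicit none
  ! Named constants with an explicit kind. For 0.5 the value is exact in any
  ! kind; for values such as 0.1 the kind suffix is what preserves precision.
  real(dp), parameter :: half   = 0.5_dp
  real(dp), parameter :: dx     = 0.1_dp       ! hypothetical grid spacing
  real(dp), parameter :: inv_dx = 1.0_dp / dx  ! reciprocal computed once
  integer, parameter :: n = 5
  real(dp) :: u(n), dudx_div(n-1), dudx_mul(n-1)
  integer :: i

  ! Hypothetical test field: u(i) = i**2
  u = [(real(i, dp)**2, i = 1, n)]

  do i = 1, n - 1
     dudx_div(i) = (u(i+1) - u(i)) / dx       ! one division per point
     dudx_mul(i) = (u(i+1) - u(i)) * inv_dx   ! one multiplication per point
  end do

  ! Any nonzero value here means the two forms are not bit-identical.
  print *, 'max |difference| =', maxval(abs(dudx_div - dudx_mul))
  print *, 'half * u(2)      =', half * u(2)
end program constants_demo
```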

What other performance sensitive factors are there? At the moment, I am wondering about a few:

1) Does the order of mathematical operations matter? For example if I have:

a=1E-7 ; b=2E4 ; c=3E13
d=a*b*c

would d evaluate with different efficiency depending on the order of multiplication? Nowadays this must be compiler-specific, but is there a straight answer? I notice d getting a (slightly) different value depending on the order (precision limit), but does this affect efficiency?
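A small sketch of the effect described in 1), using the values from the question in single precision. Floating-point multiplication is not associative, so the groupings below may round differently. Each grouping costs the same two multiplications, so any run-time difference would have to come from something other than the operation count. Note that a compiler may evaluate these constants at compile time or in higher precision, so the printed values can vary between compilers and flags.

```fortran
program mult_order
  use, intrinsic :: iso_fortran_env, only: sp => real32
  implicit none
  real(sp) :: a, b, c, d1, d2, d3

  a = 1.0e-7_sp ; b = 2.0e4_sp ; c = 3.0e13_sp

  ! Parentheses force the grouping; each line is two multiplications.
  d1 = (a * b) * c
  d2 = a * (b * c)
  d3 = (a * c) * b

  print *, d1, d2, d3
  print *, 'all equal? ', (d1 == d2) .and. (d2 == d3)
end program mult_order
```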

2) Is passing lots (e.g. dozens) of arrays as arguments to a subroutine any faster or slower than accessing those arrays from a module within the subroutine?
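For reference, the two styles in 2) look like the sketch below. The module grid_data, the fields u and v, and both subroutine names are hypothetical. The sketch only shows that the two forms compute the same thing; which one is faster, if either, is something to measure on the compilers in question.

```fortran
module grid_data
  use, intrinsic :: iso_fortran_env, only: dp => real64
  implicit none
  real(dp), allocatable :: u(:), v(:)   ! hypothetical model fields
contains
  ! Style A: the routine reaches the arrays through the module.
  subroutine step_module(dt)
    real(dp), intent(in) :: dt
    u = u + dt * v
  end subroutine step_module

  ! Style B: the arrays are passed explicitly as assumed-shape dummies.
  subroutine step_args(u_arg, v_arg, dt)
    real(dp), intent(inout) :: u_arg(:)
    real(dp), intent(in)    :: v_arg(:)
    real(dp), intent(in)    :: dt
    u_arg = u_arg + dt * v_arg
  end subroutine step_args
end module grid_data

program module_vs_args
  use grid_data
  implicit none
  integer, parameter :: n = 4
  real(dp), allocatable :: u_loc(:), v_loc(:)

  allocate(u(n), v(n), u_loc(n), v_loc(n))
  u = 1.0_dp ; v = 2.0_dp
  u_loc = u  ; v_loc = v

  call step_module(0.5_dp)               ! updates the module arrays
  call step_args(u_loc, v_loc, 0.5_dp)   ! updates the local copies

  print *, 'same result? ', all(u == u_loc)
end program module_vs_args
```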

3) Fortran 95 constructs (FORALL and WHERE) versus DO and IF? I know that these mattered back in the 1990s, when code vectorization was a big thing, but is there any difference now that modern compilers can vectorize explicit DO loops? (I am using the PGI, Intel, and IBM compilers in my work.)
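As an illustration of 3), here is the same masked update written once with WHERE and once with DO and IF. The array x and the choice of sqrt are arbitrary. The two forms should give the same values; whether they generate the same machine code depends on the compiler and flags. As a side note, FORALL was declared obsolescent in Fortran 2018, so it is not shown here.

```fortran
program where_vs_do
  use, intrinsic :: iso_fortran_env, only: dp => real64
  implicit none
  integer, parameter :: n = 8
  real(dp) :: x(n), y_where(n), y_loop(n)
  integer :: i

  ! Values from -3.5 to 3.5, so half the mask is false.
  x = [(real(i, dp) - 4.5_dp, i = 1, n)]

  ! Array-syntax version: sqrt is applied only where the mask is true.
  y_where = 0.0_dp
  where (x > 0.0_dp) y_where = sqrt(x)

  ! Explicit loop version of the same update.
  y_loop = 0.0_dp
  do i = 1, n
     if (x(i) > 0.0_dp) y_loop(i) = sqrt(x(i))
  end do

  print *, 'identical? ', all(y_where == y_loop)
end program where_vs_do
```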

4) Raising a number to an integer power versus multiplication? E.g.:

b=a**4

or

b=a*a*a*a

I have been taught to always use the latter where possible. Does this affect efficiency and/or precision? (probably compiler dependent as well)
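A quick way to check the precision side of 4) on a given compiler is to compare the forms directly, as in this sketch. The value 1.1 is arbitrary. Compilers commonly expand a small integer power into repeated multiplication, often by squaring, which can round differently from the left-to-right product a*a*a*a. That is an assumption to verify rather than a guarantee, and a compiler may also fold these constants at compile time.

```fortran
program power_vs_mult
  use, intrinsic :: iso_fortran_env, only: dp => real64
  implicit none
  real(dp) :: a, p_pow, p_mul, p_sq

  a = 1.1_dp
  p_pow = a**4
  p_mul = a * a * a * a        ! evaluated left to right: three multiplications
  p_sq  = (a * a) * (a * a)    ! two multiplications if a*a is computed once

  print *, p_pow, p_mul, p_sq
  print *, 'a**4 == a*a*a*a ?     ', p_pow == p_mul
  print *, 'a**4 == (a*a)*(a*a) ? ', p_pow == p_sq
end program power_vs_mult
```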

Please discuss and/or add any tricks and tips that you know about improving Fortran code efficiency. What else is out there? If you know anything specific to what each of the compilers above do related to this question, please include that as well.

Added: Note that I do not have any bottlenecks or performance issues per se. I am asking whether there are any general rules for optimizing code at the level of individual operations.

Thanks!

You've got a-priori ideas about what to do, and some of them might actually help, but the biggest payoff is in a-posteriori analysis.
(Added: In other words, getting a*b*c into a different order might save a couple cycles (which I doubt), while at the same time you don't know you're not getting blind-sided by something spending 1000 cycles for no good reason.)

No matter how carefully you code it, there will be opportunities for speedup that you didn't foresee. Here's how I find them. (Some people consider this method controversial.)

It's best to start with optimization flags OFF when you do this, so the code isn't all scrambled. Later you can turn them on and let the compiler do its thing.

Get it running under a debugger with enough of a workload that it runs for a reasonable length of time. While it's running, manually interrupt it, and take a good hard look at what it's doing and why. Do this several times, like 10, so you don't draw erroneous conclusions about where it's spending its time.

Here are examples of things you might find:

  • It could be spending a large fraction of time calling math library functions unnecessarily due to the way some expressions were coded, or with the same argument values as in prior calls.
  • It could be spending a large fraction of time doing some file I/O, or opening/closing a file, deep inside some routine that seemed harmless to call.
  • It could be in a general-purpose library function, calling a subordinate subroutine, for the purpose of checking argument flags to the upper function. In such a case, much of that time might be eliminated by writing a special-purpose function and calling that instead.
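The first item in the list above, repeated math library calls with the same argument, might look like the following sketch once found. The constant k and the arrays are hypothetical. In a loop this simple an optimizing compiler will likely hoist the call on its own; the pattern matters more when the repeated call is hidden a few subroutine levels down, which is where the interrupt-and-inspect approach tends to expose it.

```fortran
program hoist_invariant
  use, intrinsic :: iso_fortran_env, only: dp => real64
  implicit none
  integer, parameter :: n = 1000
  real(dp) :: t(n), slow(n), fast(n)
  real(dp) :: k, decay
  integer :: i

  k = 0.25_dp                                 ! hypothetical constant
  t = [(real(i, dp) * 1.0e-3_dp, i = 1, n)]

  ! Before: exp(-k) is called with the same argument on every iteration.
  do i = 1, n
     slow(i) = t(i) * exp(-k)
  end do

  ! After: evaluate the invariant once and reuse it.
  decay = exp(-k)
  do i = 1, n
     fast(i) = t(i) * decay
  end do

  print *, 'identical? ', all(slow == fast)
end program hoist_invariant
```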

If you do this entire operation two or three times, you will have removed the stupid stuff that finds its way into any software when it's first written. After that, you can turn on the optimization, parallelism, or whatever, and be confident no time is being spent on silly stuff.