Stack overflow in Fortran 90

Updated: 2023-09-06 22:54:22

As per steabert's request, I'll just summarize the conversation in the comments here where it's a bit more visible, even though M.S.B.'s answer already gets right to the nub of the problem.

In technical programming, where procedures often have large local arrays for intermediate computation, this happens a lot. Local variables are generally stored on the stack, which is typically (and quite reasonably) a small fraction of overall system memory, usually on the order of 10 MB. When the local variables exceed the stack size, you see exactly the symptoms described here: a stack overflow occurring after the call to the relevant subroutine but before its first executable statement.
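To make the arithmetic concrete, here is a minimal sketch that estimates how many bytes a pair of local work arrays would need. The dimensions are made-up illustrative values, not the ones from the question, and the 8 MiB limit is only an assumed figure (a common Linux default) used for comparison.

```fortran
program local_array_footprint
    use, intrinsic :: iso_fortran_env, only: real64, int64
    implicit none

    ! Hypothetical dimensions, chosen only for illustration.
    integer, parameter :: nrows = 1200, ncols = 1200, nvec = 50000
    ! Assumed stack limit of 8 MiB; the real value is system dependent.
    integer(int64), parameter :: stack_limit = 8_int64 * 1024_int64 * 1024_int64

    integer(int64) :: bytes_per_real, bytes_matrix, bytes_vector, total

    ! storage_size returns bits per element; divide by 8 to get bytes.
    bytes_per_real = storage_size(1.0_real64, kind=int64) / 8_int64
    bytes_matrix   = bytes_per_real * int(nrows, int64) * int(ncols, int64)
    bytes_vector   = bytes_per_real * int(nvec, int64)
    total          = bytes_matrix + bytes_vector

    print '(a,i0,a)', 'matrix work array: ', bytes_matrix, ' bytes'
    print '(a,i0,a)', 'vector work array: ', bytes_vector, ' bytes'
    print '(a,i0,a)', 'total locals:      ', total, ' bytes'

    if (total > stack_limit) then
        print '(a)', 'These locals would not fit in the assumed stack limit.'
    else
        print '(a)', 'These locals would fit in the assumed stack limit.'
    end if
end program local_array_footprint
```

With these numbers the matrix alone needs about 11.5 MB, so a routine declaring it as an ordinary local array could overflow the stack before doing any work.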

So when this problem happens, the best thing to do is to find the relevant large local variables, and decide what to do. In this case, at least the variables belm and dstrain were getting quite sizable.

Once the variables are located and you've confirmed that they are the problem, there are a few options. As M.S.B. points out, if you can make your arrays smaller, that's one option. Alternatively, you can make the stack larger; under Linux, that's done with ulimit -s [newsize] (in bash the size is given in kilobytes). That really just postpones the problem, though, and you have to do something different on Windows machines.

The other class of ways to avoid this problem is to put the large data not on the stack but in the rest of memory (loosely, the "heap"). You can do that by giving the arrays the save attribute (in C, static); this places the variable in static storage rather than on the stack (strictly speaking that is not the heap, but the effect on stack usage is the same) and makes its values persist between calls. The downside is that this potentially changes the behavior of the subroutine, means the subroutine can't be used recursively, and is likewise not thread-safe (if multiple threads ever enter the routine simultaneously, they'll each see the same copy of the local variable and may overwrite each other's results). The upside is that it's easy and very portable; it should work everywhere. However, this only works for fixed-size local variables; if the temporary arrays have sizes that depend on the inputs, you can't do this (there would no longer be a single variable to save, since it could be a different size every time the procedure is called).
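As a minimal sketch of the save approach and its main side effect, the hypothetical routine below (the names accumulate, work and ncalls are invented for illustration) keeps a fixed-size work array off the stack, and its contents visibly persist from one call to the next.

```fortran
module save_demo
    implicit none
contains
    subroutine accumulate(x, total)
        real, intent(in)  :: x
        real, intent(out) :: total
        ! Fixed-size work array with SAVE: it is kept in static storage
        ! instead of on the stack, and keeps its contents between calls.
        integer, parameter :: n = 100000
        real, save    :: work(n)
        integer, save :: ncalls = 0

        ncalls = ncalls + 1
        work(ncalls) = x
        ! Only elements written so far are read.
        total = sum(work(1:ncalls))
    end subroutine accumulate
end module save_demo

program show_save
    use save_demo
    implicit none
    real :: t

    call accumulate(1.0, t)
    print *, 'after first call: ', t
    call accumulate(2.0, t)
    print *, 'after second call:', t

    ! The second result includes the value stored by the first call.
    if (abs(t - 3.0) > 1.0e-6) error stop 'expected saved state to persist'
end program show_save
```

The same persistence that makes this work is what rules out recursion and concurrent calls: every invocation shares the one copy of work and ncalls.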

There are compiler-specific options which put all arrays (or all arrays larger than some given size) on the heap rather than on the stack; every Fortran compiler I know has an option for this. For ifort, used in the OP's post, it's -heap-arrays on Linux, or /heap-arrays on Windows. For gfortran, this may actually be the default. This is good for making sure you know what's going on, but it means you need a different incantation for every compiler to make sure your code works.

Finally, you can make the offending arrays allocatable. Allocated memory goes on the heap, while only the small descriptor that refers to it lives on the stack, so you get the benefits of both approaches. Also, this is completely standard Fortran and so totally portable. The downside is that it requires code changes. Also, the allocation process can take nontrivial amounts of time; so if you're going to be calling the routine zillions of times, you may notice this slows things down slightly. (This possible performance regression is easy to fix, though; if you'll be calling it zillions of times with the same size arrays, you can have an optional argument to pass in a pre-allocated work array and use that instead, so that you only allocate/deallocate once.)

Allocating/deallocating each time would look like:

SUBROUTINE UpdateContinuumState(iTask,iArray,posc,dof,dof_k,nodedof,elm,bmtrx,&
                    detjac,w,mtrlprops,demtrx,dt,stress,strain,effstrain,&
                    effstress,aa,fi,errmsg)

    IMPLICIT NONE

    !...arguments.... 


    !Locals
    !...
    REAL(8),DIMENSION(:,:), allocatable :: belm
    REAL(8),DIMENSION(:), allocatable :: dstrain

    ! Sizes come from iArray at run time; the storage is taken from the heap.
    allocate(belm(iArray(12)*iArray(17),iArray(15)))
    allocate(dstrain(iArray(12)*iArray(17)*iArray(5)))

    !... work

    deallocate(belm)
    deallocate(dstrain)

END SUBROUTINE UpdateContinuumState
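A slightly more defensive variant of the same idea is sketched below, using a hypothetical routine (do_work and its arguments are invented, not part of the original code). It checks the stat= result of allocate so that a failed allocation can be reported instead of aborting, and it relies on the rule, in Fortran 95 and later, that unsaved allocatable locals are released automatically on return.

```fortran
module alloc_demo
    implicit none
contains
    subroutine do_work(nrow, ncol, checksum, ok)
        integer, intent(in)  :: nrow, ncol
        real(8), intent(out) :: checksum
        logical, intent(out) :: ok
        ! Only the array descriptor is a stack local; the data is on the heap.
        real(8), allocatable :: work(:,:)
        integer :: ierr

        checksum = 0.0d0
        allocate(work(nrow, ncol), stat=ierr)
        ok = (ierr == 0)
        if (.not. ok) return      ! report failure to the caller

        work = 1.0d0
        checksum = sum(work)
        ! No explicit deallocate here: an unsaved allocatable local is
        ! deallocated automatically when the subroutine returns (F95+).
    end subroutine do_work
end module alloc_demo

program show_alloc
    use alloc_demo
    implicit none
    real(8) :: s
    logical :: ok

    ! 2000 x 2000 reals of 8 bytes is 32 MB, more than a typical stack.
    call do_work(2000, 2000, s, ok)
    if (.not. ok) error stop 'allocation failed'
    print *, 'checksum =', s
    if (abs(s - 4.0d6) > 1.0d-6) error stop 'unexpected checksum'
end program show_alloc
```

The explicit deallocate calls in the original are still perfectly valid; they just make the lifetime visible, and they are what a strict Fortran 90 compiler needs.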

Note that if the subroutine does a lot of work (e.g., takes seconds to execute), the overhead from a couple of allocate/deallocate calls should be negligible. If not, and you want to avoid the overhead, using optional arguments for preallocated workspace would look something like:

SUBROUTINE UpdateContinuumState(iTask,iArray,posc,dof,dof_k,nodedof,elm,bmtrx,&
                    detjac,w,mtrlprops,demtrx,dt,stress,strain,effstrain,&
                    effstress,aa,fi,errmsg,workbelm,workdstrain)

    IMPLICIT NONE

    !...arguments.... 
    real(8),dimension(:,:), optional, target :: workbelm
    real(8),dimension(:), optional, target :: workdstrain
    !Locals
    !...

    REAL(8),DIMENSION(:,:), pointer :: belm
    REAL(8),DIMENSION(:), pointer :: dstrain

    if (present(workbelm)) then
       belm => workbelm
    else
       allocate(belm(iArray(12)*iArray(17),iArray(15)))
    endif
    if (present(workdstrain)) then
       dstrain => workdstrain
    else
       allocate(dstrain(iArray(12)*iArray(17)*iArray(5)))
    endif

    !... work

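    ! Free only what this call allocated itself; caller-owned workspace is left alone.
    if (.not.(present(workbelm))) deallocate(belm)
    if (.not.(present(workdstrain))) deallocate(dstrain)

END SUBROUTINE UpdateContinuumState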
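From the caller's side, the pattern might be used roughly as follows. This is a sketch with a hypothetical stand-in routine (scaled_sum, with a single optional workspace argument) rather than the real UpdateContinuumState, and it assumes the caller passes a workspace of at least n elements. Note that optional arguments require an explicit interface, which placing the routine in a module provides.

```fortran
module workspace_demo
    implicit none
contains
    ! Hypothetical stand-in for a routine with one optional work array.
    subroutine scaled_sum(n, scale, total, work)
        integer, intent(in)  :: n
        real(8), intent(in)  :: scale
        real(8), intent(out) :: total
        real(8), dimension(:), optional, target :: work
        real(8), dimension(:), pointer :: buf
        integer :: i

        if (present(work)) then
            buf => work          ! reuse the caller's storage
        else
            allocate(buf(n))     ! fall back to a per-call allocation
        end if

        do i = 1, n
            buf(i) = scale * real(i, 8)
        end do
        total = sum(buf(1:n))

        if (.not. present(work)) deallocate(buf)
    end subroutine scaled_sum
end module workspace_demo

program show_workspace
    use workspace_demo
    implicit none
    integer, parameter :: n = 1000, ncalls = 10000
    real(8), allocatable, target :: work(:)
    real(8) :: r, r_ref
    integer :: k

    ! Reference result: the routine allocates and frees its own buffer.
    call scaled_sum(n, 1.0d0, r_ref)

    ! Allocate the workspace once and reuse it for every call.
    allocate(work(n))
    do k = 1, ncalls
        call scaled_sum(n, 1.0d0, r, work)
    end do
    deallocate(work)

    if (abs(r - r_ref) > 1.0d-9) error stop 'results differ'
    print *, 'sum =', r
end program show_workspace
```

Both call styles give the same result; the second simply moves the single allocate/deallocate pair out of the loop.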