Serious memory leak when iteratively parsing XML files

Updated: 2022-06-27 00:09:03

According to the XML package's webpage, the author, Duncan Temple Lang, has described certain memory management issues quite extensively. See this page: "Memory Management in the XML Package".

Honestly, I'm not proficient in the details of what's going on here with your code and the package, but I think you'll either find the answer in that page, specifically in the section called "Problems", or in direct communication with Duncan Temple Lang.

Update 1. An idea that might work is to use the multicore and foreach packages, i.e. listResults = foreach(ix = 1:N) %dopar% {your processing; return(listElement)}. I think that for Windows you'll need doSMP, or maybe doRedis; under Linux, I use doMC. In any case, by parallelizing the loading, you'll get faster throughput. The reason I think memory usage may also improve is that forking R could lead to different memory cleanup, since each spawned process is killed when it completes. This isn't guaranteed to work, but it could address both the memory and the speed issues.

Note, though: doSMP has its own idiosyncrasies (i.e. you may still have some memory issues with it). There have been other Q&As on SO that mentioned some issues, but I'd still give it a shot.
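
The foreach idea from Update 1 could look roughly like the sketch below. It assumes Linux or macOS with the XML, foreach and doMC packages installed. The temporary files, the <item> element, the helper name parseOne and the choice of two workers are all made up for illustration and are not from the original question. The explicit free(doc) call is an extra precaution I've added, not something the update above prescribes.

```r
library(XML)      # xmlParse(), xpathSApply(), xmlValue(), free()
library(foreach)  # foreach(), %dopar%
library(doMC)     # fork-based backend (not available on Windows)

registerDoMC(cores = 2)  # worker count is an arbitrary choice

# Hypothetical input: three tiny XML files written to a temp directory.
xmlDir <- tempfile("xml_demo_")
dir.create(xmlDir)
for (i in 1:3) {
  writeLines(sprintf("<doc><item>%d</item><item>%d</item></doc>", i, i * 10),
             file.path(xmlDir, sprintf("file%d.xml", i)))
}
xmlFiles <- sort(list.files(xmlDir, pattern = "\\.xml$", full.names = TRUE))

# Hypothetical per-file processing: extract the <item> values as numbers.
parseOne <- function(path) {
  doc <- xmlParse(path)
  on.exit(free(doc))  # release the C-level document when the function exits
  as.numeric(xpathSApply(doc, "//item", xmlValue))
}

# Each iteration runs in a forked child process; the child exits when its
# work is done, so any memory it leaked is returned to the operating system.
listResults <- foreach(ix = seq_along(xmlFiles)) %dopar% {
  parseOne(xmlFiles[ix])
}
```

Only the plain R vectors returned by parseOne travel back to the parent process, so the parsed documents never accumulate in the main session. Whether this actually cures the leak in your case is something you'd have to measure.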