且构网


Serious memory leak when iteratively parsing XML files

Updated: 2022-06-27 00:08:51

According to the XML package's webpage, its author, Duncan Temple Lang, has described certain memory management issues quite extensively. See the page "Memory Management in the XML Package".

Honestly, I'm not familiar with the details of how your code interacts with the package, but I think you'll find the answer either on that page, specifically in the section called "Problems", or by contacting Duncan Temple Lang directly.

Update 1. An idea that might work is to use the multicore and foreach packages, i.e. listResults = foreach(ix = 1:N) %dopar% {your processing; return(listElement)}. I think that on Windows you'll need doSMP, or maybe doRedis; under Linux, I use doMC. In any case, by parallelizing the loading, you'll get faster throughput. The reason I think memory usage may also improve is that forking R could lead to different memory cleanup, since each spawned process is killed when it completes. This isn't guaranteed to work, but it could address both the memory and the speed issues.
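As a rough illustration of the foreach idea, here is a minimal sketch. It makes several assumptions that are not from the question: the doParallel backend stands in for doMC/doSMP, the input files are tiny placeholder documents written to a temporary directory, and parse_one is a hypothetical helper that returns plain R values and frees the parsed document. Note that with a cluster backend the workers live until stopCluster is called, rather than being killed after each task, so any memory benefit only shows up once the cluster is shut down.

```r
library(foreach)
library(doParallel)  # assumption: used here in place of doMC / doSMP
library(XML)

# Hypothetical input: a few small XML files in a temporary directory.
xml_dir <- tempfile("xml_demo_")
dir.create(xml_dir)
for (i in 1:4) {
  writeLines(sprintf("<root><item id='%d'>value %d</item></root>", i, i),
             file.path(xml_dir, sprintf("file%d.xml", i)))
}
files <- list.files(xml_dir, pattern = "\\.xml$", full.names = TRUE)

# Hypothetical per-file processing: parse, extract plain character values,
# then release the internal document before returning.
parse_one <- function(path) {
  doc <- XML::xmlParse(path)
  on.exit(XML::free(doc))
  XML::xpathSApply(doc, "//item", XML::xmlValue)
}

# Start two worker processes and register them as the %dopar% backend.
cl <- parallel::makeCluster(2)
doParallel::registerDoParallel(cl)

# Each file is parsed inside a worker; only the extracted values come back.
listResults <- foreach(ix = seq_along(files), .packages = "XML") %dopar% {
  parse_one(files[ix])
}

# Shutting the workers down ends their processes and whatever memory they held.
parallel::stopCluster(cl)
unlink(xml_dir, recursive = TRUE)
```

The point of returning only character vectors, rather than node or document objects, is that nothing tied to the XML package's internal memory has to travel back to the main R session.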

Note, though: doSMP has its own idiosyncrasies (i.e. you may still have some memory issues with it). Other Q&As on SO have mentioned some issues, but I'd still give it a shot.