
Ever-increasing physical memory for a Spark application in YARN

Updated: 2023-10-13 16:03:04

I was finally able to get rid of the issue. The problem was that the compressors created in Spark SQL's Parquet write path were not being recycled, so my executors were allocating a brand-new compressor (from native memory) for every Parquet file they wrote, eventually exhausting the YARN physical memory limit.
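
To make "recycled" concrete, below is a minimal sketch of the pooled borrow/return pattern using Hadoop's generic CodecPool, contrasted with allocating a fresh compressor per file. It is only an illustration under my own assumptions (DefaultCodec as a stand-in codec, an in-memory output stream); it is not the actual Parquet write path or the code in the PR.

import java.io.ByteArrayOutputStream

import org.apache.hadoop.conf.Configuration
import org.apache.hadoop.io.compress.{CodecPool, DefaultCodec}

object CompressorRecyclingSketch {
  def main(args: Array[String]): Unit = {
    // Stand-in codec; the real write path uses Snappy, GZIP, etc.
    val codec = new DefaultCodec()
    codec.setConf(new Configuration())

    // Non-recycled pattern (roughly what the leak looked like):
    //   val compressor = codec.createCompressor()  // fresh native buffers for every file

    // Recycled pattern: borrow from the pool and always return it when done.
    val compressor = CodecPool.getCompressor(codec)
    try {
      val out = codec.createOutputStream(new ByteArrayOutputStream(), compressor)
      out.write("example page data".getBytes("UTF-8"))
      out.close()
    } finally {
      CodecPool.returnCompressor(compressor)  // native buffers go back to the pool for reuse
    }
  }
}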

I opened the following bug in the Parquet JIRA and raised a PR for it:

https://issues.apache.org/jira/browse/PARQUET-353

This fixed the memory issue at my end.

P.S. You will only see this problem in a Parquet write-intensive application.
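
For a concrete picture of what "write-intensive" means here, the sketch below is a hypothetical job (app name, output path, row counts, and batch counts are all made up) that emits many Parquet files per executor in a loop; each of those files goes through a compressor, so without recycling the executor's native memory footprint keeps growing with every batch.

import org.apache.spark.sql.{SaveMode, SparkSession}

object ParquetWriteIntensiveSketch {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("parquet-write-intensive-sketch") // hypothetical app name
      .getOrCreate()

    // Placeholder data: ~10M small rows.
    val df = spark.range(0L, 10000000L)
      .selectExpr("id", "concat('row-', id) AS payload")

    // Many batches x many partitions => many Parquet output files per executor.
    // Every output file runs through a compressor, so without recycling the
    // native allocations pile up over the lifetime of the executor process.
    (1 to 50).foreach { batch =>
      df.repartition(200)
        .write
        .mode(SaveMode.Overwrite)
        .parquet(s"/tmp/parquet-write-sketch/batch=$batch") // placeholder path
    }

    spark.stop()
  }
}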