Updated: 2023-10-13 16:03:04
Finally I was able to get rid of the issue. The problem was that the compressors created in Spark SQL's Parquet write path weren't being recycled; as a result, my executors were creating a brand-new compressor (backed by native memory) for every Parquet file written, eventually exhausting the physical memory limits.
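The fix boils down to pooling and reusing compressor instances instead of allocating a fresh one per file. Below is a minimal, hypothetical sketch of that recycling pattern; the names (`Compressor`, `CompressorPool`) are illustrative stand-ins, not the actual Parquet/Hadoop API used in the PR.

```java
import java.util.ArrayDeque;
import java.util.Deque;

// Illustrative sketch only: models the "recycle instead of reallocate"
// idea behind the fix. Compressor here is a stand-in for a
// native-memory-backed codec object.
public class CompressorPool {
    static class Compressor {}

    private final Deque<Compressor> free = new ArrayDeque<>();
    private int created = 0;

    // Reuse a released compressor when one is available;
    // allocate only when the pool is empty.
    public Compressor acquire() {
        Compressor c = free.poll();
        if (c == null) {
            created++;
            c = new Compressor();
        }
        return c;
    }

    // Return the compressor to the pool after each file write,
    // instead of leaking one native allocation per file.
    public void release(Compressor c) {
        free.push(c);
    }

    public int createdCount() {
        return created;
    }

    public static void main(String[] args) {
        CompressorPool pool = new CompressorPool();
        // Simulate 1000 sequential Parquet file writes with recycling.
        for (int i = 0; i < 1000; i++) {
            Compressor c = pool.acquire();
            pool.release(c);
        }
        // With recycling, only one compressor is ever allocated;
        // without it, this would be 1000 native allocations.
        if (pool.createdCount() != 1) {
            throw new AssertionError("expected 1, got " + pool.createdCount());
        }
        System.out.println("created=" + pool.createdCount());
    }
}
```

Without the `release` step, every iteration would allocate a new native-memory compressor, which is exactly the leak pattern described above.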
I opened the following bug in the Parquet Jira and raised a PR for it:
https://issues.apache.org/jira/browse/PARQUET-353
This fixed the memory issue at my end.
P.S. You will only see this problem in a Parquet-write-intensive application.