更新时间:2023-10-15 11:18:28
我的第一个建议是尝试在 foreachPartition 中创建一个新实例并测量它是否足够快满足您的需要(在 foreachPartition 中实例化重对象是官方文档建议).
My first advice would be to try to create a new instance in foreachPartition and measure if that is fast enough for your needs (instantiating heavy objects in foreachPartition is what the official documentation suggests).
另一种选择是使用对象池,如下例所示:
Another option is to use an object pool as illustrated in this example:
然而,我发现在使用检查点时很难实现.
I however found it hard to implement when using checkpointing.
另一个对我来说运行良好的版本是一个工厂,如以下博客文章中所述,您只需检查它是否提供了足够的并行性以满足您的需求(查看评论部分):
Another version that is working well for me is a factory as described in the following blog post, you just have to check if it provides enough parallelism for your needs (check the comments section):