且构网

分享程序员开发的那些事...
且构网 - 分享程序员编程开发的那些事

如何将火花流 DF 写入 Kafka 主题

更新时间:2023-10-15 11:18:28

我的第一个建议是尝试在 foreachPartition 中创建一个新实例并测量它是否足够快满足您的需要(在 foreachPartition 中实例化重对象是官方文档建议).

My first advice would be to try to create a new instance in foreachPartition and measure if that is fast enough for your needs (instantiating heavy objects in foreachPartition is what the official documentation suggests).

另一种选择是使用对象池,如下例所示:

Another option is to use an object pool as illustrated in this example:

https:///github.com/miguno/kafka-storm-starter/blob/develop/src/main/scala/com/miguno/kafkastorm/kafka/PooledKafkaProducerAppFactory.scala

然而,我发现在使用检查点时很难实现.

I however found it hard to implement when using checkpointing.

另一个对我来说运行良好的版本是一个工厂,如以下博客文章中所述,您只需检查它是否提供了足够的并行性以满足您的需求(查看评论部分):

Another version that is working well for me is a factory as described in the following blog post, you just have to check if it provides enough parallelism for your needs (check the comments section):

http://allegro.tech/2015/08/spark-kafka-integration.html一个>