且构网

分享程序员开发的那些事...
且构网 - 分享程序员编程开发的那些事

Kafka只消费一次保证

更新时间:2023-02-25 15:04:56

如果你的意思是恰好一次问题是这样的.您可能知道 Kafka 消费者使用轮询机制,即消费者向服务器询问消息.此外,您需要记住消费者提交消息偏移量,即它告诉集群下一个预期的偏移量是什么.所以,想象一下会发生什么.

If you mean exactly once the problem is like this. Kafka consumer as you may know use a polling mechanism, that is consumers ask the server for messages. Also, you need to recall that the consumer commit message offsets, that is, it tells the cluster what is the next expected offset. So, imagine what could happen.

消费者轮询消息并获取偏移量为 1 的消息.

Consumer poll for messages and get message with offset = 1.

A) 如果消费者在处理消息之前立即提交该偏移量,那么它可能会崩溃并且永远不会再次收到该消息,因为它已经提交,在下一次轮询时,Kafka 将返回偏移量 = 2 的消息.这就是他们所说的最多一次语义.

A) If consumer commit that offset immediately before processing the message, then it can crash and will never receive that message again because it was already committed, on next poll Kafka will return message with offset = 2. This is what they call at most once semantic.

B) 如果消费者先处理消息然后提交偏移量,可能发生的情况是在处理消息之后但在提交之前,消费者崩溃,因此在这种情况下,下一次轮询将再次获得偏移量为 1 的相同消息并且该消息将被处理两次.他们至少有一次这样称呼.

B) If consumer process the message first and then commit the offset, what could happen is that after processing the message but before committing, the consumer crashes, so in that case next poll will get again the same message with offset = 1 and that message will be processed twice. This is what they call at least once.

为了只实现一次,您需要处理消息并在原子操作中提交该偏移量,您总是两者都做或不做.这不是那么容易.执行此操作的一种方法(如果可能)是存储处理结果以及生成该结果的消息的偏移量.然后,当消费者启动时,它会在 Kafka 之外查找最后处理的偏移量并寻找该偏移量.

In order to achieve exactly once, you need to process the message and commit that offset in an atomic operation, where you always do both or none of them. This is not so easy. One way to do this (if possible) is to store the result of the processing along with the offset of the message that generated that result. Then, when consumer starts it looks for the last processed offset outside Kafka and seek to that offset.