Overview
When an application using the parallel consumer experiences frequent rebalancing or restarts, a small number of messages may be lost.
This article explains the root cause of this kind of message loss.
Work Flow Chart
Explanation:
- In the current workflow, the offset committed for each partition is controlled by offsetHighestSeen. This variable is advanced as soon as a record is polled from the Kafka topic, before the record has been processed.
- Offsets are committed periodically, so an offset can be committed before the corresponding message has actually been processed. In normal operation this is harmless. However, as the flow chart above shows, when the parallel consumer begins to close, a message may already have been polled but not yet have reached the mailbox queue or the worker thread pool. During the closing phase the parallel consumer does drain the mailbox queue and wait for the tasks in the thread pool to finish, but a message that has reached neither of them cannot be processed. If its offset has already been committed, it is not redelivered after the restart and is lost.
- The fix is to use offsetHighestSucceeded instead of offsetHighestSeen, so that the committed offset advances only after a message has been processed successfully.
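
The difference between the two variables can be illustrated with a minimal single-partition sketch. The class PartitionOffsetSketch and its methods are hypothetical and are not the parallel consumer's actual code; only the names offsetHighestSeen and offsetHighestSucceeded come from the explanation above. The sketch assumes records succeed in offset order and follows the Kafka convention that the committed offset is the next offset to consume.

```java
/** Hypothetical model of one partition's offset tracking; not the library's real class. */
final class PartitionOffsetSketch {
    // -1 means "nothing yet"; both fields only ever move forward.
    private long offsetHighestSeen = -1L;
    private long offsetHighestSucceeded = -1L;

    /** Called as soon as a record is polled, before any processing. */
    void onPolled(long offset) {
        offsetHighestSeen = Math.max(offsetHighestSeen, offset);
    }

    /** Called only after a record has been processed successfully. */
    void onSucceeded(long offset) {
        offsetHighestSucceeded = Math.max(offsetHighestSucceeded, offset);
    }

    /** Old behaviour: the commit follows what was polled. */
    long commitBySeen() {
        return offsetHighestSeen + 1;
    }

    /** Fixed behaviour: the commit follows what was processed. */
    long commitBySucceeded() {
        return offsetHighestSucceeded + 1;
    }

    public static void main(String[] args) {
        PartitionOffsetSketch partition = new PartitionOffsetSketch();

        // Offsets 0..4 are polled, but only 0..2 are processed before closing begins.
        for (long offset = 0; offset <= 4; offset++) {
            partition.onPolled(offset);
        }
        for (long offset = 0; offset <= 2; offset++) {
            partition.onSucceeded(offset);
        }

        // A commit based on offsetHighestSeen skips the unprocessed offsets 3 and 4.
        System.out.println("commit by seen:      " + partition.commitBySeen());      // 5
        // A commit based on offsetHighestSucceeded lets 3 and 4 be redelivered.
        System.out.println("commit by succeeded: " + partition.commitBySucceeded()); // 3
    }
}
```

In this scenario a restart after committing 5 resumes at offset 5, so offsets 3 and 4 are never processed. Committing 3 instead means they are polled again after the restart. The real consumer processes records in parallel, so completion may be out of order; the sketch deliberately ignores that case.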