Overview
Parallel Consumer is an open source software to enable fast processing message with less instances
This article is to introduce the internal mechanism to share.
Work Flow Chart
Terminology:
AbstractParallelEoSStreamProcessor
- This is the main entry of parallel consumer
- two main components : controlLoop & BrokerPollSystem
- will select the available work container to be applied on the user function
ControlLoop
- check if the new records, if yes, then will create a new work container
- if it is already processed, updated the offset and commit if needed
BrokerPollSystem
- to poll consumer records from broker
- check if throughput threshold has been reached, if reached, it will pause the consumer
- will add to mailbox queue
WorkManager
- Sharded, prioritised, offset managed, order controlled, delayed work queue
- PartitionStateManager && ShardManager
ShardManager
- Shards are local queues of work to be processed.
- maintain map of shards
Map<ShardKey, ProcessingShard<K, V>> processingShards
PartitionStateManager
- maintain map of PartitionState
Map<TopicPartition, PartitionState<K, V>> partitionStates
- Record the generations of partition assignment, for fencing off invalid work
Map<TopicPartition, Long> partitionsAssignmentEpochs
PartitionState
- incompleteOffsets:
Offsets beyond the highest committable offset (see getOffsetHighestSequentialSucceeded()) which haven't totally succeeded. Based on decoded metadata and polled records (not offset ranges)
- dirty:
if there are offsets to be committed
- partitionsAssignmentEpoch:
from PartitionStateManager
WorkContainer
- consumer record
- epoch :
if stale worker