Posted to dev@camel.apache.org by Andrew Block <an...@gmail.com> on 2015/09/03 05:48:19 UTC

Manual commit strategies of Kafka Component

Hello all,

I have been working recently with the Kafka component and have run into issues with how it handles committing offsets back to Kafka manually instead of automatically. When the component was created, there were two commit strategies: automatic and batch. The default functionality of automatically committing offsets works well; however, if finer-grained control of the commit actions back to Kafka is desired, the component performs less well than one would hope.
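To make the two existing strategies concrete, here is a minimal sketch of how each is selected on the endpoint. This is illustrative only: the URI layout and option names (autoCommitEnable, batchSize) are from the camel-kafka endpoint as I understand it and may differ between Camel versions, and the host, topic, and group values are placeholders.

    import org.apache.camel.builder.RouteBuilder;

    public class KafkaCommitRoutes extends RouteBuilder {
        @Override
        public void configure() {
            // Automatic strategy (the default): offsets are committed
            // periodically in the background by the consumer.
            // Other required options (e.g. the ZooKeeper connection for the
            // old consumer) are omitted here for brevity.
            from("kafka:localhost:9092?topic=orders&groupId=demo")
                .to("log:auto");

            // Batch strategy: disable auto commit and commit offsets
            // after every batchSize exchanges have been processed.
            from("kafka:localhost:9092?topic=orders&groupId=demo"
                 + "&autoCommitEnable=false&batchSize=100")
                .to("log:batch");
        }
    }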

As mentioned in CAMEL-8975, there is a likelihood that messages will in fact be lost when using the batching strategy. The issue details a number of scenarios where this can occur, and I have confirmed the majority of the cases presented. A new strategy is currently being developed where the commit of offsets can be deferred and handled later in the processing pipeline. Each of these strategies has a fundamental flaw due to the Kafka architecture: as soon as a message is retrieved from Kafka into the consumer, and subsequently into the processing pipeline, Kafka assumes the message was delivered, and any commit of offsets will include every message received so far. This poses a real problem when failures occur.

For example, let's take the new deferred strategy being developed in a recent pull request. Envision a route that consumes messages from Kafka and then commits the offsets in a processor later in the route. At time A, a message is consumed from Kafka and starts down the processing pipeline. Just before this first message enters the processor containing the commit logic, 5 more messages are consumed from Kafka (this can happen when multiple consumer streams are configured). At time B, the offsets are committed when the first message reaches the processor containing the commit action. To Kafka, the offsets for all 6 messages are now committed. If any exception occurs afterwards, the other 5 messages are lost: if the route is brought down and restarted, Kafka begins reading new messages after the 6th message.
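A sketch of the scenario above, to show where the race sits in the route. Note the deferred-commit API is still being developed in the pull request, so the header name and the commit handle below are purely hypothetical placeholders, not the actual API:

    import org.apache.camel.builder.RouteBuilder;

    public class DeferredCommitRoute extends RouteBuilder {
        @Override
        public void configure() {
            from("kafka:localhost:9092?topic=orders&groupId=demo"
                 + "&autoCommitEnable=false&consumerStreams=6")
                // While message 1 is in this step, messages 2..6 may
                // already have been fetched by the other consumer streams.
                .to("bean:enrich")
                .process(exchange -> {
                    // HYPOTHETICAL API: header name and Runnable handle are
                    // illustrative only; the real PR may look different.
                    Runnable commit = exchange.getIn()
                        .getHeader("kafka.COMMIT", Runnable.class);
                    // Committing here acknowledges the highest *fetched*
                    // offset, i.e. all 6 consumed messages, not just this one.
                    commit.run();
                });
        }
    }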

This is just one scenario, and careful exception handling logic could certainly mitigate message loss, but it still highlights additional functionality that needs to be added to the Kafka consumer to properly support manual commits.
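As one illustration of that kind of mitigation, a dead letter channel can capture failed exchanges so a failure does not silently discard a message whose offset ends up committed. Again, only a sketch with placeholder endpoint URIs; it works around, but does not fix, the over-commit behavior described above:

    import org.apache.camel.builder.RouteBuilder;

    public class MitigationRoute extends RouteBuilder {
        @Override
        public void configure() {
            // Failed exchanges are retried and then published to a DLQ topic,
            // so they survive even if their offsets have been committed.
            errorHandler(deadLetterChannel("kafka:localhost:9092?topic=orders-dlq")
                .maximumRedeliveries(3));

            from("kafka:localhost:9092?topic=orders&groupId=demo"
                 + "&autoCommitEnable=false")
                .to("bean:orderService");
        }
    }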

If anyone is interested in helping develop solutions to improve these commit strategies in the Camel Kafka component, please reach out. I am very confident the Camel community can work together to develop a solution that makes the Kafka Camel component better.

Thanks,
Andy

-- 
Andrew Block