Posted to issues@camel.apache.org by "Mike Barlotta (Jira)" <ji...@apache.org> on 2023/10/27 19:02:00 UTC

[jira] [Comment Edited] (CAMEL-20044) camel-kafka - On rejoining consumer group Camel can set offset incorrectly causing messages to be replayed

    [ https://issues.apache.org/jira/browse/CAMEL-20044?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17780420#comment-17780420 ] 

Mike Barlotta edited comment on CAMEL-20044 at 10/27/23 7:01 PM:
-----------------------------------------------------------------

Downgraded to Camel 3.14.5, a version prior to the fix for CAMEL-18350.
The processing of the payloads looks like this:
 * Consumed NORETRY-ERROR 2 times
 * Consumed 1 1 times
 * Consumed 2 1 times
 * ...
 * Consumed 11 1 times

What is interesting here is that the revoke from the consumer group is not logged, nor is there a seek. 

This behavior differs from the way these messages are processed in 3.21:
 * Consumed NORETRY-ERROR 4 times
 * Consumed 1 1 times
 * Consumed 2 1 times
 * ...
 * Consumed 11 1 times

A scan of the various releases and related issues suggests that the behavior was changed as a result of this issue:
 * CAMEL-17925

The [3.14 documentation|https://camel.apache.org/components/3.14.x/kafka-component.html] has this for `breakOnFirstError`:
_This options controls what happens when a consumer is processing an exchange and it fails. If the option is false then the consumer continues to the next message and processes it. If the option is true then the consumer breaks out, and will seek back to offset of the message that caused a failure, and then re-attempt to process this message. However this can lead to endless processing of the same message if its bound to fail every time, eg a poison message. Therefore its recommended to deal with that for example by using Camel’s error handler._

I could not find older documentation to see how the behavior was described prior to that release.

One other observation: running the provided test with a RETRY error instead of NORETRY on 3.14.5 results in that payload NOT being retried.

I am wondering whether _breakOnFirstError_ (when true) should break out and then seek back to the last committed offset (instead of the offset on the {_}lastResult{_}). In the test app provided, that would mean a NORETRY payload would not be processed again (because we committed the offset), whereas a RETRY payload would be processed repeatedly (because we had not committed the offset).
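
To make the proposal concrete, here is a minimal toy model in plain Java. It is only a sketch and uses no Camel or Kafka classes, so it does not reproduce what camel-kafka actually does. It assumes a single partition, one record per poll (like maxPollRecords = 1), a commit after every successful record, and an error handler that commits past a NORETRY failure but not past a RETRY failure. The class name, the `consume` helper, the `maxPolls` cap, and the `RETRY-ERROR` payload name are all hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

class SeekToCommittedSketch {

    /**
     * Replays one partition a record at a time and returns the payloads
     * in the order they were consumed. Toy model only (assumption):
     * offsets are list indexes and maxPolls caps the endless RETRY loop.
     */
    static List<String> consume(List<String> partition, int maxPolls) {
        List<String> consumed = new ArrayList<>();
        int committed = 0; // last committed offset (next offset to read after a seek)
        int position = 0;  // offset the next poll will read

        for (int poll = 0; poll < maxPolls && position < partition.size(); poll++) {
            String payload = partition.get(position);
            consumed.add(payload);

            boolean failed = payload.endsWith("-ERROR");
            if (!failed) {
                // Success: commit and move on to the next offset.
                position++;
                committed = position;
                continue;
            }
            if (payload.startsWith("NORETRY")) {
                // Assumed error handler: commit past the failed record.
                committed = position + 1;
            }
            // Proposed breakOnFirstError behavior: seek to the last committed offset.
            position = committed;
        }
        return consumed;
    }

    public static void main(String[] args) {
        // NORETRY: the offset was committed, so the record is not consumed again.
        System.out.println(consume(List.of("NORETRY-ERROR", "1", "2"), 10));
        // RETRY: nothing was committed, so the record repeats until the poll cap.
        System.out.println(consume(List.of("RETRY-ERROR", "1", "2"), 4));
    }
}
```

In this model the NORETRY payload is consumed once and the consumer moves on, while the RETRY payload is consumed on every poll until the cap is reached, which matches the outcome described above for the test app.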

Any thoughts?

 


> camel-kafka - On rejoining consumer group Camel can set offset incorrectly causing messages to be replayed
> ----------------------------------------------------------------------------------------------------------
>
>                 Key: CAMEL-20044
>                 URL: https://issues.apache.org/jira/browse/CAMEL-20044
>             Project: Camel
>          Issue Type: Bug
>          Components: camel-kafka
>    Affects Versions: 3.21.0
>         Environment: * Rocky Linux 8.7
>  * Open JDK 11.0.8
>  * Camel 3.21.0
>  * Spring Boot 2.7.14
>  * Strimzi Kafka 0.28.0/3.0.0
>            Reporter: Mike Barlotta
>            Priority: Major
>
> {*}Reproducing (intermittent){*}:
>  * Configure the camel-kafka consumer with the following:
>  ** autoCommitEnable = false
>  ** allowManualCommit = true
>  ** autoOffsetReset = earliest
>  ** maxPollRecords = 1
>  ** breakOnFirstError = true
>  * Produce a series of Kafka records to both partitions.
>  * Throw an exception that is unhandled.
>  * Commit the offset in the onException block.
> *Expected behavior:*
>  * Application should consume the record 1 more time, then move on to the next offset in the partition
> *Actual behavior:*
>  * The application often works. Occasionally it uses the offset from another partition and assigns that to the partition where the record failed. This can then result in the consumer replaying messages instead of moving forward.
> I put together a sample that can recreate the error. However, given its intermittent nature, it may not fail on each run. I have included the logs from 3 different runs of this test on my laptop. Two of them show the error occurring. One of them shows a successful run. I have also provided more details in the README. 
>  * [https://github.com/CodeSmell/CamelKafkaOffset]
> This seems related to other issues with how Camel processes the _breakOnFirstError_ attribute. 
>  * CAMEL-14935
>  * CAMEL-18350
>  * CAMEL-19894
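
For reference, the consumer options listed in the reproduction steps could be expressed as a camel-kafka endpoint URI roughly as follows. This is a sketch under assumptions: the topic name `my-topic`, the broker address `localhost:9092`, and the `ReproEndpointUri` helper are hypothetical and are not taken from the sample project. Only the five option names and values come from the issue description.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.stream.Collectors;

class ReproEndpointUri {

    /** Builds a kafka: endpoint URI carrying the options from the reproduction steps. */
    static String build(String topic, String brokers) {
        Map<String, String> options = new LinkedHashMap<>();
        options.put("brokers", brokers);            // hypothetical broker address
        options.put("autoCommitEnable", "false");   // offsets are committed manually
        options.put("allowManualCommit", "true");
        options.put("autoOffsetReset", "earliest");
        options.put("maxPollRecords", "1");         // one record per poll
        options.put("breakOnFirstError", "true");   // the option under discussion

        String query = options.entrySet().stream()
                .map(e -> e.getKey() + "=" + e.getValue())
                .collect(Collectors.joining("&"));
        return "kafka:" + topic + "?" + query;
    }

    public static void main(String[] args) {
        System.out.println(build("my-topic", "localhost:9092"));
    }
}
```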


