You are viewing a plain text version of this content. The canonical link for it is here.
Posted to jira@kafka.apache.org by "Matthias J. Sax (Jira)" <ji...@apache.org> on 2020/02/16 01:23:00 UTC

[jira] [Comment Edited] (KAFKA-9552) Stream should handle OutOfSequence exception thrown from Producer

    [ https://issues.apache.org/jira/browse/KAFKA-9552?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17037683#comment-17037683 ] 

Matthias J. Sax edited comment on KAFKA-9552 at 2/16/20 1:22 AM:
-----------------------------------------------------------------

I am not sure if we need to re-balance – if we would have missed a rebalance and lost the task, we would get a `ProducerFencedException`. Hence, on this error we should still be part of the consumer group.

From my understanding an `OutOfOrderSequenceExceptio`n implies data loss, ie, we got an ack back, but on the next send data is not in the log (this could happen if unclean leader election is enabled broker side) – otherwise should indicate a severe bug.

While we could abort the current transaction, and reinitialize the task (ie, refetch the input topic offsets, cleanup the state etc), I am wondering if we should do this as it would mask a bug? Instead, it might be better to not catch and fail fast thus we can report this error?

Btw: In `RecordCollectorImpl` in a recent PR we started to catch `OutOfOrderSequenceException` and rethrow `TaskMigratedException` for this case – however, I am not sure if we should keep this change or roll it back for the same reason.

\cc [~guozhang] [~hachikuji] [~vvcephei]


was (Author: mjsax):
I am not sure if we need to re-balance – if we would have missed a rebalance and lost the task, we would get a `ProducerFencedException`. Hence, on this error we should still be part of the consumer group.

From my understanding an `OutOfOrderSequenceExceptio`n implies data loss, ie, we got an ack back, but on the next send data is not in the log (this could happen if unclean leader election is enabled broker side) – otherwise should indicate a severe bug.

While we could abort the current transaction, and reinitialize the task (ie, refetch the input topic offsets, cleanup the state etc), I am wondering if we should do this as it would mask a bug? Instead, it might be better to not catch and fail fast thus we can report this error?

Btw: In `RecordCollectorImpl` in a recent PR we started to catch `OutOfOrderSequenceException` and rethrow `TaskMigratedException` for this case – however, I am not sure if we should keep this change or roll it back for the same reason.

\cc [~guozhang] [~hachikuji]

> Stream should handle OutOfSequence exception thrown from Producer
> -----------------------------------------------------------------
>
>                 Key: KAFKA-9552
>                 URL: https://issues.apache.org/jira/browse/KAFKA-9552
>             Project: Kafka
>          Issue Type: Improvement
>          Components: streams
>    Affects Versions: 2.5.0
>            Reporter: Boyang Chen
>            Priority: Major
>
> As of today the stream thread could die from OutOfSequence error:
> {code:java}
>  [2020-02-12T07:14:35-08:00] (streams-soak-2-5-eos_soak_i-03f89b1e566ac95cc_streamslog) org.apache.kafka.common.errors.OutOfOrderSequenceException: The broker received an out of order sequence number.
>  [2020-02-12T07:14:35-08:00] (streams-soak-2-5-eos_soak_i-03f89b1e566ac95cc_streamslog) [2020-02-12 15:14:35,185] ERROR [stream-soak-test-546f8754-5991-4d62-8565-dbe98d51638e-StreamThread-1] stream-thread [stream-soak-test-546f8754-5991-4d62-8565-dbe98d51638e-StreamThread-1] Failed to commit stream task 3_2 due to the following error: (org.apache.kafka.streams.processor.internals.AssignedStreamsTasks)
>  [2020-02-12T07:14:35-08:00] (streams-soak-2-5-eos_soak_i-03f89b1e566ac95cc_streamslog) org.apache.kafka.streams.errors.StreamsException: task [3_2] Abort sending since an error caught with a previous record (timestamp 1581484094825) to topic stream-soak-test-KSTREAM-AGGREGATE-STATE-STORE-0000000049-changelog due to org.apache.kafka.common.errors.OutOfOrderSequenceException: The broker received an out of order sequence number.
>  at org.apache.kafka.streams.processor.internals.RecordCollectorImpl.recordSendError(RecordCollectorImpl.java:154)
>  at org.apache.kafka.streams.processor.internals.RecordCollectorImpl.access$500(RecordCollectorImpl.java:52)
>  at org.apache.kafka.streams.processor.internals.RecordCollectorImpl$1.onCompletion(RecordCollectorImpl.java:214)
>  at org.apache.kafka.clients.producer.KafkaProducer$InterceptorCallback.onCompletion(KafkaProducer.java:1353)
> {code}
>  Although this is fatal exception for Producer, stream should treat it as an opportunity to reinitialize by doing a rebalance, instead of killing computation resource.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)