
Posted to dev@kafka.apache.org by "Sagar Rao (Jira)" <ji...@apache.org> on 2022/07/03 15:18:00 UTC

[jira] [Created] (KAFKA-14040) Improve test coverage for max buffer bytes metrics

Sagar Rao created KAFKA-14040:
---------------------------------

             Summary: Improve test coverage for max buffer bytes metrics
                 Key: KAFKA-14040
                 URL: https://issues.apache.org/jira/browse/KAFKA-14040
             Project: Kafka
          Issue Type: Bug
          Components: streams
            Reporter: Sagar Rao
            Assignee: Sagar Rao


In some EOS applications with relatively long restoration times, we've noticed a series of ProducerFencedExceptions occurring during or immediately after restoration. The broker logs confirmed that these were caused by transactions timing out.

In Streams, it turns out we automatically begin a new txn when calling {{send}} (if there isn’t already one in flight). A {{send}} often occurs outside a commit during active processing (e.g. writing to the changelog), leaving the txn open until the next commit. If a StreamThread has been actively processing when a rebalance assigns it a new stateful task without revoking any existing tasks, the thread won’t commit this open txn before it goes back into the restoration phase to build up state for the new task. The in-flight transaction is therefore left open throughout restoration, during which the StreamThread only consumes from the changelog without committing. This leaves the transaction vulnerable to timing out when the restoration time exceeds the {{transaction.timeout.ms}} configured for the producer client.
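As a rough illustration of the timing condition described above, here is a minimal sketch. It is a toy model and not Kafka Streams code: the class name, method name, and all durations are hypothetical, and it assumes the transaction simply stays open from the first {{send}} until restoration ends, with no commit in between.

```java
public class OpenTxnDuringRestoration {

    /**
     * Hypothetical model: returns true if a transaction that was already open
     * for openForMs before restoration began, and stays open for the whole
     * restoration, outlives the producer's transaction timeout.
     */
    static boolean txnTimesOut(long openForMs, long restorationMs, long txnTimeoutMs) {
        // No commit happens during restoration, so the open time keeps growing.
        long totalOpenMs = openForMs + restorationMs;
        return totalOpenMs > txnTimeoutMs;
    }

    public static void main(String[] args) {
        long txnTimeoutMs = 10_000L; // illustrative value for transaction.timeout.ms

        // Long restoration: the txn opened by an earlier send is left open too long.
        System.out.println(txnTimesOut(2_000L, 30_000L, txnTimeoutMs));

        // Short restoration: the next commit can still arrive within the timeout.
        System.out.println(txnTimesOut(2_000L, 5_000L, txnTimeoutMs));
    }
}
```

The sketch only captures the arithmetic of the failure mode: once restoration alone is longer than the timeout, any transaction left open going into restoration would be expected to expire, regardless of how recently it was started.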



--
This message was sent by Atlassian Jira
(v8.20.10#820010)