Posted to dev@kafka.apache.org by "Travis Bischel (Jira)" <ji...@apache.org> on 2022/10/18 18:08:00 UTC

[jira] [Created] (KAFKA-14315) Kraft: 1 broker setup, broker took 34 seconds to transition from PrepareCommit to CompleteCommit

Travis Bischel created KAFKA-14315:
--------------------------------------

             Summary: Kraft: 1 broker setup, broker took 34 seconds to transition from PrepareCommit to CompleteCommit
                 Key: KAFKA-14315
                 URL: https://issues.apache.org/jira/browse/KAFKA-14315
             Project: Kafka
          Issue Type: Bug
          Components: kraft
            Reporter: Travis Bischel


I'm still looking into a PR failure in [my client|https://github.com/twmb/franz-go/pull/223] and noticed something a bit strange. I know that _technically_ I should be using RequireStableFetchOffsets in my transaction tests to prevent rebalances while a transaction is not finalized. I'll be adding that.

However, these tests have never failed against zookeeper mode. The client goes to great lengths to avoid needing KIP-447 behavior, and the assumption with localhost testing is that things run fast enough (and that there are enough guards) that problems would not be encountered.

That assumption does not appear to hold with a kraft broker. Looking at __transaction_state, the following looks especially problematic:

{{__transaction_state partition 33 offset 7 at [2022-10-18 11:15:37.821]}}
{{TxnMetadataKey(0) 9f87dc04dc3f4d5b15ef3072c531cf46327278307df8e149fa966462cd40c10b}}
{{TxnMetadataValue(0)}}
{{      ProducerID           41}}
{{      ProducerEpoch        0}}
{{      TimeoutMillis        120000}}
{{      State                PrepareCommit}}
{{      Topics               __consumer_offsets=>[13] e7c7d971626fbaf4bfb33975e57089167939e6acabb4c4fc534eb148462e45cc=>[4 5 12 16]  }}
{{      LastUpdateTimestamp  1666113337821}}
{{      StartTimestamp       1666113335311}}
{{__transaction_state partition 33 offset 8 at [2022-10-18 11:16:11.419]}}
{{TxnMetadataKey(0) 9f87dc04dc3f4d5b15ef3072c531cf46327278307df8e149fa966462cd40c10b}}
{{TxnMetadataValue(0)}}
{{      ProducerID           41}}
{{      ProducerEpoch        0}}
{{      TimeoutMillis        120000}}
{{      State                CompleteCommit}}
{{      Topics     }}
{{      LastUpdateTimestamp  1666113337821}}
{{      StartTimestamp       1666113335311}}

I've captured that using my kcl tool.

Note that the transaction enters PrepareCommit at 11:15:37.821, and then enters CompleteCommit at 11:16:11.419. AFAICT, this means that in my single node kraft setup, the broker took 34 seconds to transition commit states internally.

I noticed this in tests because a rebalance happened during those 34 seconds, which caused duplicate consumption: the transactional offset commits were not yet finalized, so the old commits were picked up.

This ticket is related to KAFKA-14312, in that this failure is cropping up as I've worked around KAFKA-14312 within the client itself.


