Posted to dev@kafka.apache.org by "Apurva Mehta (JIRA)" <ji...@apache.org> on 2017/06/09 22:48:18 UTC
[jira] [Comment Edited] (KAFKA-5415) TransactionCoordinator doesn't complete transition to PrepareCommit state

    [ https://issues.apache.org/jira/browse/KAFKA-5415?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16045162#comment-16045162 ] 

Apurva Mehta edited comment on KAFKA-5415 at 6/9/17 10:48 PM:
--------------------------------------------------------------

The last successful metadata update was the following. The update timestamp was 1496957141444.

{noformat}
[2017-06-08 21:25:41,449] DEBUG TransactionalId my-first-transactional-id complete transition from Ongoing to TxnTransitMetadata(producerId=2000, producerEpoch=0, txnTimeoutMs=60000, txnState=Ongoing, topicPartitions=Set(output-topic-2, __consumer_offsets-47, output-topic-0, output-topic-1), txnStartTimestamp=1496957141430, txnLastUpdateTimestamp=1496957141444) (kafka.coordinator.transaction.TransactionMetadata)
{noformat}

Then the system clock rolled back by a couple of hundred milliseconds, and the 'prepare transition' to 'PrepareCommit' carried the following transition metadata, with an update timestamp of 1496957141285:

{noformat}
[2017-06-08 21:25:41,285] DEBUG TransactionalId my-first-transactional-id prepare transition from Ongoing to TxnTransitMetadata(producerId=2000, producerEpoch=0, txnTimeoutMs=60000, txnState=PrepareCommit, topicPartitions=Set(output-topic-2, __consumer_offsets-47, output-topic-0, output-topic-1), txnStartTimestamp=1496957141430, txnLastUpdateTimestamp=1496957141285) (kafka.coordinator.transaction.TransactionMetadata)
{noformat}

So when it came time to complete the transition, the timestamp check would fail because the new update timestamp (1496957141285) was older than the previous one (1496957141444). We would throw an IllegalStateException, which would be caught and swallowed in the delayed fetch operation, hence leaving the transaction hanging with a pendingState of PrepareCommit.
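
To make the failure mode concrete, here is a minimal, self-contained sketch of the sequence described above. It is not the coordinator's actual code: the class TxnTransitionSketch, its methods, and the exact form of the timestamp check are assumptions made for illustration. Only the two timestamps are taken from the logs.

```java
import java.util.Optional;

// Hypothetical, simplified model of a two-step state transition.
// Names and structure are assumptions for illustration, not Kafka's actual code.
final class TxnTransitionSketch {
    enum State { ONGOING, PREPARE_COMMIT }

    private State state = State.ONGOING;
    private Optional<State> pendingState = Optional.empty();
    private long lastUpdateTimestamp;

    TxnTransitionSketch(long lastUpdateTimestamp) {
        this.lastUpdateTimestamp = lastUpdateTimestamp;
    }

    State state() { return state; }

    Optional<State> pendingState() { return pendingState; }

    // Step 1: remember the target state and stamp the transition with the wall clock.
    long prepareTransitionTo(State target, long wallClockMs) {
        pendingState = Optional.of(target);
        return wallClockMs;
    }

    // Step 2: apply the transition.
    // Assumed check: the new timestamp must not be older than the last one.
    void completeTransitionTo(State target, long updateTimestamp) {
        if (updateTimestamp < lastUpdateTimestamp) {
            throw new IllegalStateException("update timestamp " + updateTimestamp
                    + " is older than last update timestamp " + lastUpdateTimestamp);
        }
        state = target;
        lastUpdateTimestamp = updateTimestamp;
        pendingState = Optional.empty();
    }

    public static void main(String[] args) {
        // Last successful update, from the first log line.
        TxnTransitionSketch txn = new TxnTransitionSketch(1496957141444L);

        // The clock has rolled back: this value is 159 ms earlier than the last update.
        long rolledBackClockMs = 1496957141285L;
        long ts = txn.prepareTransitionTo(State.PREPARE_COMMIT, rolledBackClockMs);

        try {
            txn.completeTransitionTo(State.PREPARE_COMMIT, ts);
        } catch (IllegalStateException e) {
            // Swallowed, as in the scenario described: nothing clears pendingState.
        }

        System.out.println("state=" + txn.state() + ", pendingState=" + txn.pendingState());
    }
}
```

In this sketch the state stays at Ongoing while pendingState remains set to PrepareCommit, which mirrors the hang seen in the logs. Reading the clock once and comparing it against a previously stored wall-clock value is what makes the check sensitive to clock adjustments.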






> TransactionCoordinator doesn't complete transition to PrepareCommit state
> -------------------------------------------------------------------------
>
>                 Key: KAFKA-5415
>                 URL: https://issues.apache.org/jira/browse/KAFKA-5415
>             Project: Kafka
>          Issue Type: Bug
>            Reporter: Apurva Mehta
>            Assignee: Apurva Mehta
>            Priority: Blocker
>              Labels: exactly-once
>             Fix For: 0.11.0.0
>
>         Attachments: 6.tgz
>
>
> This has been revealed by the system test failures on Jenkins.
> The transaction coordinator seems to get into a path during the handling of the EndTxnRequest where it returns an error (possibly a NOT_COORDINATOR or COORDINATOR_NOT_AVAILABLE error, to be revealed by https://github.com/apache/kafka/pull/3278) to the client. However, due to network instability, the producer is disconnected before it receives this error.
> As a result, the transaction remains in a `PrepareXX` state, and future `EndTxn` requests sent by the client after reconnecting result in a `CONCURRENT_TRANSACTIONS` error code. Hence the client gets stuck and the transaction never finishes, as expiration isn't done from a PrepareXX state.


