You are viewing a plain text version of this content. The canonical link for it is here.
Posted to dev@kafka.apache.org by "Apurva Mehta (JIRA)" <ji...@apache.org> on 2017/06/09 22:40:18 UTC

[jira] [Commented] (KAFKA-5415) TransactionCoordinator doesn't complete transition to PrepareCommit state

    [ https://issues.apache.org/jira/browse/KAFKA-5415?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16045151#comment-16045151 ] 

Apurva Mehta commented on KAFKA-5415:
-------------------------------------

The problem here seem to be this check: 

https://github.com/apache/kafka/blob/trunk/core/src/main/scala/kafka/coordinator/transaction/TransactionMetadata.scala#L288

I noticed that the system clock time rolled back slightly during the processing of this transaction, and [~hachikuji] noticed this check.

What this amounts to is that the operation to complete the state transition will fail in the DelayedFetch operation, and the exception would be swallowed. The Pending state would not be cleared, and future EndTxnRequests would fail with a CONCURRENT_TRANSACTIONS exception.

> TransactionCoordinator doesn't complete transition to PrepareCommit state
> -------------------------------------------------------------------------
>
>                 Key: KAFKA-5415
>                 URL: https://issues.apache.org/jira/browse/KAFKA-5415
>             Project: Kafka
>          Issue Type: Bug
>            Reporter: Apurva Mehta
>            Assignee: Apurva Mehta
>            Priority: Blocker
>              Labels: exactly-once
>             Fix For: 0.11.0.0
>
>         Attachments: 6.tgz
>
>
> This has been revealed by the system test failures on jenkins. 
> The transaction coordinator seems to get into a path during the handling of the EndTxnRequest where it returns an error (possibly a NOT_COORDINATOR or COORDINATOR_NOT_AVAILABLE error, to be revealed by https://github.com/apache/kafka/pull/3278) to the client. However, due to network instability, the producer is disconnected before it receives this error.
> As a result, the transaction remains in a `PrepareXX` state, and future `EndTxn` requests sent by the client after reconnecting result in a `CONCURRENT_TRANSACTION` error code. Hence the client gets stuck and the transaction never finishes, as expiration isn't done from a PrepareXX state.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)