You are viewing a plain text version of this content. The canonical link for it is here.

Posted to jira@kafka.apache.org by "kirktrue (via GitHub)" <gi...@apache.org> on 2023/04/20 23:15:53 UTC

[GitHub] [kafka] kirktrue commented on a diff in pull request #13591: KAFKA-14831: Illegal state errors should be fatal in transactional producer

kirktrue commented on code in PR #13591:
URL: https://github.com/apache/kafka/pull/13591#discussion_r1173167778


##########
clients/src/main/java/org/apache/kafka/clients/producer/internals/TransactionManager.java:
##########
@@ -968,13 +968,31 @@ private void transitionTo(State target) {
     }
 
     private void transitionTo(State target, RuntimeException error) {
+        transitionTo(target, error, false);

Review Comment:
   > I think changing the default behavior to not throw can cause issues in some calls:
   > 
   > 1. TransactionManager.InitProducerIdHandler#handleResponse on line 1303 - lastError is explicitly set to null (which shouldn't be done at all, as transitionTo already does that if the state transition is valid), which will clear the latest error. I think to make this work, that lastError = null should be removed from line 1303
   
   I've removed that line. I am not sure why it's there, but `git blame` says it's been there since 2017 😬 
   
   > 2. This is a call chain where we transition on direct user action, shouldn't this be throwing? KafkaProducer.send -> KafkaProducer.doSend -> maybeTransitionToErrorState -> transitionToAbortableError -> transitionTo
   
   Since `maybeTransitionToErrorState` is being called from inside a `catch` block, if we did throw an exception, it would mask the root issue (`ApiException`), right?
   
   > 3. In TransactionManager.TxnOffsetCommitHandler#handleResponse, there are multiple
   > 
   > ```
   > abortableError(...);
   > break;
   > ```
   > 
   > blocks. If abortableError does not throw on invalid state transition anymore, the txn commit will be retried, even when in a failed state, which doesn't seem correct.
   
   After the `abortableError` call, if the state transition was invalid, then `currentState` would be `FATAL_ERROR`. That has the same effect as the last two branches of the `if` statement that call the `fatalError` method, right?
   
   Would simply skipping the reenqueue on fatal errors be sufficient?
   
   ```
   if (result.isCompleted()) {
       pendingTxnOffsetCommits.clear();
   } else if (pendingTxnOffsetCommits.isEmpty()) {
       result.done();
   } else {
       if (!hasFatalError()) {
           // Retry the commits which failed with a retriable error
           reenqueue();
       }
   }
   ```
   
   I don't yet understand why we'd want to re-enqueue after those calls to `fatalError` anyway 🤔 



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: jira-unsubscribe@kafka.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org