You are viewing a plain text version of this content. The canonical link for it is here.
Posted to jira@kafka.apache.org by "A. Sophie Blee-Goldman (Jira)" <ji...@apache.org> on 2021/03/23 03:42:00 UTC

[jira] [Created] (KAFKA-12523) Need to improve handling of TimeoutException when committing offsets

A. Sophie Blee-Goldman created KAFKA-12523:
----------------------------------------------

             Summary: Need to improve handling of TimeoutException when committing offsets
                 Key: KAFKA-12523
                 URL: https://issues.apache.org/jira/browse/KAFKA-12523
             Project: Kafka
          Issue Type: Bug
          Components: streams
    Affects Versions: 2.8.0
            Reporter: A. Sophie Blee-Goldman
             Fix For: 2.8.0


Right now, in TaskManager#commitOffsetsOrTransaction if we  catch a TimeoutException then under ALOS we just rethrow it while in EOS we rethrow it as TaskCorruptedException. The problem is that commitOffsetsOrTransaction can be invoked from several places:
# Commit within StreamThread main processing loop (either user requested or commit interval has elapsed: this is presumably the case we had in mind when deciding how to handle the TimeoutException in commitOffsetsOrTransaction , no problem here
# Clean shutdown of application: a bit weird to throw a TaskCorruptedException in this case, but it’ll just end up being caught and forcing a closeDirty, so again no problem here
# From TaskManager#handleRevocation: in this case, it’s possible we hit a TimeoutException on a task that’s actually being revoked. This exception will be saved and rethrown from poll, so under EOS we would catch a TaskCorruptedException and then try to revive this task that we actually no longer own. Pretty sure this will cause an NPE in the TaskManager. Under ALOS, the rethrown TimeoutException will be bubbled up through poll again, but unlike TaskCorruptedException we actually don’t catch TimeoutException anywhere in the StreamThread loop. This will trigger the uncaught exception handler
# From TaskManager#handleTaskCorrupted:  this method is itself invoked from within the catch TaskCorruptedException block of the StreamThread’s runLoop. If we throw TaskCorruptedException again then I believe we won’t even catch this in the safety net catch Throwable block of the runLoop -- it’ll just be thrown directly up through run(). 





--
This message was sent by Atlassian Jira
(v8.3.4#803005)