Posted to commits@samza.apache.org by "Cameron Lee (Jira)" <ji...@apache.org> on 2021/09/16 18:13:00 UTC

[jira] [Created] (SAMZA-2692) ClusterBasedJobCoordinator does not shut down cleanly on SIGTERM

Cameron Lee created SAMZA-2692:
----------------------------------

             Summary: ClusterBasedJobCoordinator does not shut down cleanly on SIGTERM
                 Key: SAMZA-2692
                 URL: https://issues.apache.org/jira/browse/SAMZA-2692
             Project: Samza
          Issue Type: Bug
            Reporter: Cameron Lee


ClusterBasedJobCoordinator does not register a shutdown hook that tells it to stop, so SIGTERM does not trigger a clean shutdown of ClusterBasedJobCoordinator.

YARN sends SIGTERM first, then follows up with SIGKILL after a timeout ("yarn.nodemanager.sleep-delay-before-sigkill.ms") if the process has not exited. The job coordinator process therefore does exit, but the shutdown is unclean. The shutdown is also slower than necessary, since YARN has to wait out the timeout before sending SIGKILL, whereas the process could simply exit normally after the SIGTERM.
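
For illustration, one possible shape of a fix is sketched below. This is not Samza code: SketchCoordinator, requestStop, awaitStopped, newShutdownHook and install are hypothetical names, and the class is only a stand-in for ClusterBasedJobCoordinator. The only real API used is the standard Runtime.addShutdownHook. The hook asks the main loop to stop and then waits, up to a timeout, for cleanup to finish, because the JVM halts once all shutdown hooks return.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicBoolean;

public class ShutdownHookSketch {

  /** Hypothetical stand-in for a job coordinator; NOT Samza's actual class. */
  public static class SketchCoordinator {
    private final AtomicBoolean stopRequested = new AtomicBoolean(false);
    private final CountDownLatch stopped = new CountDownLatch(1);

    /** Main loop: runs until a stop is requested, then performs cleanup. */
    public void run() throws InterruptedException {
      try {
        while (!stopRequested.get()) {
          Thread.sleep(10); // placeholder for real coordination work
        }
      } finally {
        // Real cleanup (releasing resources, reporting status) would go here.
        stopped.countDown();
      }
    }

    /** Asks the main loop to exit; safe to call from another thread. */
    public void requestStop() {
      stopRequested.set(true);
    }

    /** Waits for cleanup to finish; returns false if the timeout elapsed first. */
    public boolean awaitStopped(long timeoutMs) throws InterruptedException {
      return stopped.await(timeoutMs, TimeUnit.MILLISECONDS);
    }

    public boolean isStopped() {
      return stopped.getCount() == 0;
    }
  }

  /**
   * Builds the hook thread. The timeout is an assumption of this sketch; it
   * should be shorter than the delay after which the process would be killed.
   */
  public static Thread newShutdownHook(SketchCoordinator coordinator, long timeoutMs) {
    return new Thread(() -> {
      coordinator.requestStop();
      try {
        // Block the hook until cleanup completes, since the JVM halts
        // as soon as all shutdown hooks have returned.
        coordinator.awaitStopped(timeoutMs);
      } catch (InterruptedException e) {
        Thread.currentThread().interrupt();
      }
    }, "coordinator-shutdown-hook");
  }

  /** Registers the hook with the JVM so it runs on SIGTERM or normal exit. */
  public static void install(SketchCoordinator coordinator, long timeoutMs) {
    Runtime.getRuntime().addShutdownHook(newShutdownHook(coordinator, timeoutMs));
  }
}
```

With something like this in place, a process entry point would call install(coordinator, timeoutMs) before coordinator.run(), so that SIGTERM leads to a normal exit instead of waiting for SIGKILL.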



--
This message was sent by Atlassian Jira
(v8.3.4#803005)