Posted to jira@kafka.apache.org by "Antony Stubbs (Jira)" <ji...@apache.org> on 2020/05/18 10:48:00 UTC

[jira] [Comment Edited] (KAFKA-4748) Need a way to shutdown all workers in a Streams application at the same time

    [ https://issues.apache.org/jira/browse/KAFKA-4748?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17110133#comment-17110133 ] 

Antony Stubbs edited comment on KAFKA-4748 at 5/18/20, 10:47 AM:
-----------------------------------------------------------------

FYI [~mjsax], my intention for KAFKA-6943 was to cover a single Kafka Streams (KS) instance, not an entire cluster. This issue (KAFKA-4748), from what I understand, covers the entire cluster (all KS instances).

Extending KAFKA-6943 with an option to shut down the entire cluster when a thread crashes in a single instance, or when all threads crash, could be interesting, but it seems an order of magnitude more complex than shutting down only the instance the crashed thread was running on.


was (Author: astubbs):
FYI [~mjsax], my intention for Kafka-6943 was for a single KS instance, not an entire cluster. This (Kafka-4748) would be for the entire cluster (all KS instances), from what I understand.

Extending Kafka-6943 to have an option to shutdown the entire cluster upon a thread crash in a single instance or all threads crashing could be interesting, but would seem an order of magnitude more complex than triggering the shutdown of the instance the thread was living on.

> Need a way to shutdown all workers in a Streams application at the same time
> ----------------------------------------------------------------------------
>
>                 Key: KAFKA-4748
>                 URL: https://issues.apache.org/jira/browse/KAFKA-4748
>             Project: Kafka
>          Issue Type: Bug
>          Components: streams
>    Affects Versions: 0.10.1.1
>            Reporter: Elias Levy
>            Priority: Major
>
> If you have a fleet of Streams workers for an application and attempt to shut them down simultaneously (e.g. via SIGTERM, with Runtime.getRuntime().addShutdownHook() calling streams.close()), a large number of the workers fail to shut down.
> The problem appears to be a race condition between the shutdown signal and the consumer rebalance that is triggered by some of the workers exiting before others.  Workers that receive the signal later apparently fail to exit because they are caught in the rebalance.
> Terminating workers in a rolling fashion is not advisable in some situations.  A rolling shutdown results in many unnecessary rebalances and may fail, because the application may have a large amount of local state that a smaller number of nodes may not be able to store.
> It would appear that a protocol change is needed to allow the coordinator to signal a consumer group to shut down without triggering a rebalance.
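
For readers unfamiliar with the shutdown pattern the description refers to, here is a minimal sketch of a SIGTERM-driven shutdown hook that closes a Streams instance with a bounded timeout. It does not fix the race described above; it only illustrates the per-worker hook each instance would run. StreamsLike is a hypothetical stand-in, introduced here so the example compiles without the Kafka libraries; in a real application the object would be a KafkaStreams instance, whose close(Duration) returns false if its threads did not stop within the timeout. The class name, method names, and the 30-second timeout are likewise assumptions.

```java
import java.time.Duration;

public class GracefulShutdownSketch {

    /** Hypothetical stand-in for the small part of KafkaStreams used here. */
    interface StreamsLike {
        void start();
        /** Returns true if everything stopped within the timeout. */
        boolean close(Duration timeout);
    }

    /** Closes the instance, waiting at most `timeout`, and reports the outcome. */
    static boolean closeWithTimeout(StreamsLike streams, Duration timeout) {
        boolean clean = streams.close(timeout);
        if (!clean) {
            // With a bounded wait the hook still returns, so the JVM can exit
            // even if the instance did not stop in time.
            System.err.println("Streams did not stop within " + timeout);
        }
        return clean;
    }

    /** Builds the thread to register with Runtime.addShutdownHook(). */
    static Thread shutdownHook(StreamsLike streams, Duration timeout) {
        return new Thread(() -> closeWithTimeout(streams, timeout),
                "streams-shutdown-hook");
    }

    public static void main(String[] args) {
        // Fake instance that only logs; replace with a real KafkaStreams object.
        StreamsLike streams = new StreamsLike() {
            @Override public void start() {
                System.out.println("streams started");
            }
            @Override public boolean close(Duration timeout) {
                System.out.println("closing, waiting up to " + timeout);
                return true;
            }
        };

        // The JVM runs registered hooks on SIGTERM and on normal exit.
        Runtime.getRuntime().addShutdownHook(
                shutdownHook(streams, Duration.ofSeconds(30)));
        streams.start();
        // A real worker would block here until asked to stop.
    }
}
```

When every worker in the fleet runs a hook like this at the same moment, each close() call is still an independent departure from the consumer group, which is why the description asks for a coordinator-level signal instead.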


