Posted to jira@kafka.apache.org by "Boyang Chen (Jira)" <ji...@apache.org> on 2020/03/03 18:37:00 UTC

[jira] [Commented] (KAFKA-9638) Do not trigger REBALANCING when specific exceptions occur in Kafka Streams

    [ https://issues.apache.org/jira/browse/KAFKA-9638?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17050469#comment-17050469 ] 

Boyang Chen commented on KAFKA-9638:
------------------------------------

Thanks for the ticket. Could you provide a more detailed example where a rebalance causes a more massive shutdown than necessary? AFAIK, if one thread fails with an NPE, it shouldn't affect the other threads, or else the other threads should already be on the verge of failing themselves.

> Do not trigger REBALANCING when specific exceptions occur in Kafka Streams 
> ---------------------------------------------------------------------------
>
>                 Key: KAFKA-9638
>                 URL: https://issues.apache.org/jira/browse/KAFKA-9638
>             Project: Kafka
>          Issue Type: New Feature
>          Components: streams
>            Reporter: Levani Kokhreidze
>            Priority: Major
>
> As of now, when a StreamThread encounters an exception in a Kafka Streams application, it results in REBALANCING of all the tasks that were the responsibility of that thread. The problem is that if the exception is, let's say, a logical exception such as an NPE, REBALANCING is pretty much useless, because all the other threads will also die with the same NPE. This kind of moot rebalancing incurs extra costs in terms of network traffic, IOPS, etc. in large stateful applications.
> In addition, this behaviour causes a global outage of the Kafka Streams application instead of a localized outage of certain tasks. It would be great if Kafka Streams users could specify, via some interface, exceptions that must not trigger rebalancing of the tasks. The StreamThread may still die, but in that case we would have an isolated incident.
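
To make the request in the description concrete, here is a rough sketch of what a user-supplied "do not rebalance on these exceptions" hook might look like. It is purely illustrative: `RebalancePolicy`, `skipRebalance` and `forTypes` are hypothetical names invented for this example and are not part of the Kafka Streams API, and the sketch does not show how a StreamThread would actually consult such a policy.

```java
import java.util.Set;

public class NonRebalancingExceptions {

    // Hypothetical interface (not a Kafka Streams API): decides whether a
    // failure in a stream thread should leave its tasks un-rebalanced.
    interface RebalancePolicy {
        boolean skipRebalance(Throwable error);
    }

    // Hypothetical helper: builds a policy from a set of exception types.
    // Subclasses of the listed types match as well.
    static RebalancePolicy forTypes(Set<Class<? extends Throwable>> types) {
        return error -> types.stream().anyMatch(type -> type.isInstance(error));
    }

    public static void main(String[] args) {
        RebalancePolicy policy =
            forTypes(Set.of(NullPointerException.class, IllegalStateException.class));

        // A logical error that every other thread would hit as well.
        System.out.println(policy.skipRebalance(new NullPointerException("bad record")));        // prints "true"
        // An error type the user did not list, so rebalancing would proceed as today.
        System.out.println(policy.skipRebalance(new java.io.IOException("broker unreachable"))); // prints "false"
    }
}
```

Matching on exception type keeps the sketch small; a real design would also have to decide what happens to the tasks owned by the dead thread, which is the open question raised in the comment above.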



--
This message was sent by Atlassian Jira
(v8.3.4#803005)