Posted to jira@kafka.apache.org by "Vince Mu (Jira)" <ji...@apache.org> on 2020/06/01 14:01:00 UTC

[jira] [Commented] (KAFKA-6520) When a Kafka Stream can't communicate with the server, it's Status stays RUNNING

    [ https://issues.apache.org/jira/browse/KAFKA-6520?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17121035#comment-17121035 ] 

Vince Mu commented on KAFKA-6520:
---------------------------------

My approach would be to add a new metric, fetch-disconnect-rate, representing the number of disconnects over a short time window. Disconnects could be recorded in the onFailure handler of the fetch requests being sent. The Kafka Streams container or the StreamThread could then read this metric and set the state to DISCONNECTED if the rate exceeds a certain threshold.
However, I question the need for an entirely new state if it only serves to inform the user about connectivity. Would simply exposing the new metric be enough to achieve this?
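
To make the idea concrete, here is a minimal, standalone sketch of a windowed disconnect rate with a threshold check. It does not use Kafka's metrics classes; the class name DisconnectRateTracker, its method names, the window length and the threshold are all hypothetical and chosen only for illustration.

```java
import java.util.ArrayDeque;

/**
 * Hypothetical helper, not part of Kafka: counts disconnects inside a
 * sliding time window and reports whether the rate is above a threshold.
 */
final class DisconnectRateTracker {
    private final long windowMs;           // assumed window length in milliseconds
    private final double thresholdPerSec;  // assumed threshold in disconnects per second
    private final ArrayDeque<Long> events = new ArrayDeque<>();

    DisconnectRateTracker(long windowMs, double thresholdPerSec) {
        if (windowMs <= 0) {
            throw new IllegalArgumentException("windowMs must be positive");
        }
        this.windowMs = windowMs;
        this.thresholdPerSec = thresholdPerSec;
    }

    /** Would be called from the failure handler of a fetch request. */
    synchronized void recordDisconnect(long nowMs) {
        events.addLast(nowMs);
        expire(nowMs);
    }

    /** Disconnects per second, averaged over the window ending at nowMs. */
    synchronized double ratePerSec(long nowMs) {
        expire(nowMs);
        return events.size() * 1000.0 / windowMs;
    }

    /** True if the current rate is strictly above the threshold. */
    synchronized boolean exceedsThreshold(long nowMs) {
        return ratePerSec(nowMs) > thresholdPerSec;
    }

    /** Drops timestamps that have fallen out of the window. */
    private void expire(long nowMs) {
        while (!events.isEmpty() && events.peekFirst() <= nowMs - windowMs) {
            events.removeFirst();
        }
    }
}
```

Whether the result of exceedsThreshold drives a state transition or is only exposed as a metric is exactly the open question above; the sketch supports either choice.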

Appreciate any thoughts on this matter. 

> When a Kafka Stream can't communicate with the server, it's Status stays RUNNING
> --------------------------------------------------------------------------------
>
>                 Key: KAFKA-6520
>                 URL: https://issues.apache.org/jira/browse/KAFKA-6520
>             Project: Kafka
>          Issue Type: Improvement
>          Components: streams
>            Reporter: Michael Kohout
>            Priority: Major
>              Labels: newbie, user-experience
>
> KIP WIP: [https://cwiki.apache.org/confluence/display/KAFKA/KIP-457%3A+Add+DISCONNECTED+status+to+Kafka+Streams]
> When you execute the following scenario, the application always stays in the RUNNING state:
>   
>  1) Start Kafka.
>  2) Start the app; it connects to Kafka and starts processing.
>  3) Kill Kafka (stop the Docker container).
>  4) The application gives no indication that it is no longer connected (the Stream State is still RUNNING, and the uncaught exception handler isn't invoked).
>   
>   
>  It would be useful if the Stream State had a DISCONNECTED status.
>   
>  See [this|https://groups.google.com/forum/#!topic/confluent-platform/nQh2ohgdrIQ] for a discussion from the Google user forum. This is a link to a related issue.
> -------------------------
> Update: there are some discussions on the PR itself which lead me to think that a more general solution should live at the ClusterConnectionStates level rather than at the Streams or even Consumer level. One proposal would be:
>  * Add a new metric named `failedConnection` in SelectorMetrics, recorded in the `connect()` and `pollSelectionKeys()` functions upon catching an IOException / RuntimeException that indicates the connection was disconnected.
>  * Users of Consumer / Streams can then monitor this metric. Normally it will stay close to zero, since disconnects are transient; if it spikes, the brokers are consistently unavailable, which indicates the disconnected state.
> [~Yohan123] WDYT?


