Posted to dev@kafka.apache.org by "Patrick J. McNerthney (JIRA)" <ji...@apache.org> on 2015/09/28 02:12:04 UTC

[jira] [Commented] (KAFKA-1804) Kafka network thread lacks top exception handler

    [ https://issues.apache.org/jira/browse/KAFKA-1804?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14909940#comment-14909940 ] 

Patrick J. McNerthney commented on KAFKA-1804:
----------------------------------------------

I ran into a similar issue where that same "java.util.NoSuchElementException: None.get" exception was being thrown in the ConnectionQuotas.dec method. I was able to reproduce it, and I believe I have found the root cause of all such cases.

The call to "close(key)" on this line https://github.com/apache/kafka/blob/0.8.2/core/src/main/scala/kafka/network/SocketServer.scala#L406 is the culprit. This call should not be made there because, as the debug log on the line just above says, the socket is already closed. In other words, a "close(key)" using that key has already occurred. The redundant call causes an extra ConnectionQuotas.dec against that InetAddress. Later, when a key from that address that is actually still open gets closed, ConnectionQuotas no longer holds a count for that InetAddress, so the lookup yields None and the None.get exception is thrown.
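
To make the sequence concrete, here is a minimal sketch of how one redundant decrement can lead to the None.get failure. QuotaModel and DoubleCloseDemo are hypothetical names invented for this illustration, and the counter is a simplified stand-in that assumes dec looks up the count with Option.get and removes the entry when it reaches zero. It is not the Kafka source.

```scala
import scala.collection.mutable

// Hypothetical, simplified per-address connection counter.
// Assumption: dec reads the count via Option.get and drops the entry at zero.
class QuotaModel {
  private val counts = mutable.Map[String, Int]()

  def inc(addr: String): Unit = counts.synchronized {
    counts.put(addr, counts.getOrElse(addr, 0) + 1)
  }

  def dec(addr: String): Unit = counts.synchronized {
    // Throws java.util.NoSuchElementException when there is no entry for addr.
    val count = counts.get(addr).get
    if (count == 1) counts.remove(addr) else counts.put(addr, count - 1)
  }
}

object DoubleCloseDemo {
  // Returns true when the last dec fails with NoSuchElementException.
  def run(): Boolean = {
    val quotas = new QuotaModel
    val addr = "10.0.0.1" // illustrative address only
    quotas.inc(addr) // connection A opens: count is 1
    quotas.inc(addr) // connection B opens: count is 2
    quotas.dec(addr) // connection A closes: count is 1
    quotas.dec(addr) // redundant close of A: entry is removed
    try {
      quotas.dec(addr) // connection B really closes: no entry is left
      false
    } catch {
      case _: NoSuchElementException => true
    }
  }

  def main(args: Array[String]): Unit =
    println(s"final dec failed: ${run()}")
}
```

In this model the failure surfaces on the close of connection B, not on the redundant close itself, which would explain why the exception appears to come from an unrelated, healthy connection.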

I have log files if needed.

> Kafka network thread lacks top exception handler
> ------------------------------------------------
>
>                 Key: KAFKA-1804
>                 URL: https://issues.apache.org/jira/browse/KAFKA-1804
>             Project: Kafka
>          Issue Type: Bug
>          Components: core
>    Affects Versions: 0.8.2.0
>            Reporter: Oleg Golovin
>            Priority: Critical
>
> We have faced the problem that some Kafka network threads may fail, so that jstack attached to the Kafka process showed fewer threads than we had defined in our Kafka configuration. This leads to API requests processed by this thread getting stuck without a response.
> There were no error messages in the log regarding thread failure.
> We have examined Kafka code to find out there is no top try-catch block in the network thread code, which could at least log possible errors.
> Could you add top-level try-catch block for the network thread, which should recover network thread in case of exception?


