Posted to jira@kafka.apache.org by "ASF GitHub Bot (JIRA)" <ji...@apache.org> on 2019/03/07 08:08:00 UTC

[jira] [Commented] (KAFKA-7974) KafkaAdminClient loses worker thread/enters zombie state when initial DNS lookup fails

    [ https://issues.apache.org/jira/browse/KAFKA-7974?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16786494#comment-16786494 ] 

ASF GitHub Bot commented on KAFKA-7974:
---------------------------------------

cmccabe commented on pull request #6305: Fix for KAFKA-7974: Avoid zombie AdminClient when node host isn't resolvable
URL: https://github.com/apache/kafka/pull/6305
 
 
   
 
----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


> KafkaAdminClient loses worker thread/enters zombie state when initial DNS lookup fails
> --------------------------------------------------------------------------------------
>
>                 Key: KAFKA-7974
>                 URL: https://issues.apache.org/jira/browse/KAFKA-7974
>             Project: Kafka
>          Issue Type: Bug
>            Reporter: Nicholas Parker
>            Priority: Major
>
> Version: kafka-clients-2.1.0
> I have some code that creates a KafkaAdminClient instance and then invokes listTopics(). I was seeing the following stacktrace in the logs, after which the KafkaAdminClient instance became unresponsive:
> {code:java}
> ERROR [kafka-admin-client-thread | adminclient-1] 2019-02-18 01:00:45,597 KafkaThread.java:51 - Uncaught exception in thread 'kafka-admin-client-thread | adminclient-1':
> java.lang.IllegalStateException: No entry found for connection 0
>     at org.apache.kafka.clients.ClusterConnectionStates.nodeState(ClusterConnectionStates.java:330)
>     at org.apache.kafka.clients.ClusterConnectionStates.disconnected(ClusterConnectionStates.java:134)
>     at org.apache.kafka.clients.NetworkClient.initiateConnect(NetworkClient.java:921)
>     at org.apache.kafka.clients.NetworkClient.ready(NetworkClient.java:287)
>     at org.apache.kafka.clients.admin.KafkaAdminClient$AdminClientRunnable.sendEligibleCalls(KafkaAdminClient.java:898)
>     at org.apache.kafka.clients.admin.KafkaAdminClient$AdminClientRunnable.run(KafkaAdminClient.java:1113)
>     at java.lang.Thread.run(Thread.java:748){code}
> From looking at the code, I was able to track down a possible cause:
>  * NetworkClient.ready() invokes this.initiateConnect() as seen in the above stacktrace
>  * NetworkClient.initiateConnect() invokes ClusterConnectionStates.connecting(), which internally invokes ClientUtils.resolve() to resolve the host when creating an entry for the connection.
>  * If this host lookup fails, an UnknownHostException can be thrown back to NetworkClient.initiateConnect() and the connection entry is not created in ClusterConnectionStates. This exception doesn't get logged, so this is a guess on my part.
>  * NetworkClient.initiateConnect() catches the exception and attempts to call ClusterConnectionStates.disconnected(), which throws an IllegalStateException because no entry had yet been created due to the lookup failure.
>  * This IllegalStateException ends up killing the worker thread and KafkaAdminClient gets stuck, never returning from listTopics().
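The sequence described in the list above can be reproduced in isolation with a minimal sketch. This is not the Kafka source: the class and method names below (AdminClientZombieSketch, ConnectionStates, initiateConnect) are hypothetical stand-ins, and the DNS failure is simulated with a boolean flag rather than a real lookup. It only illustrates how an exception thrown from inside a catch block can escape and end a worker thread.

```java
import java.net.UnknownHostException;
import java.util.HashMap;
import java.util.Map;

// Hypothetical, simplified model of the reported failure; not Kafka code.
class AdminClientZombieSketch {

    // Stand-in for ClusterConnectionStates: one state entry per connection id.
    static class ConnectionStates {
        private final Map<String, String> states = new HashMap<>();

        // The host is "resolved" before the entry is created, so a failed
        // lookup leaves no entry behind. 'resolvable' simulates DNS.
        void connecting(String id, String host, boolean resolvable)
                throws UnknownHostException {
            if (!resolvable) {
                throw new UnknownHostException(host);
            }
            states.put(id, "CONNECTING");
        }

        // Requires an existing entry, mirroring the message in the stacktrace.
        void disconnected(String id) {
            if (!states.containsKey(id)) {
                throw new IllegalStateException("No entry found for connection " + id);
            }
            states.put(id, "DISCONNECTED");
        }
    }

    // Stand-in for NetworkClient.initiateConnect(): the catch block assumes
    // that connecting() already registered the connection.
    static String initiateConnect(ConnectionStates states, String id,
                                  String host, boolean resolvable) {
        try {
            states.connecting(id, host, resolvable);
            return "connecting";
        } catch (UnknownHostException e) {
            states.disconnected(id); // throws if the lookup failed first
            return "disconnected";
        }
    }

    public static void main(String[] args) throws InterruptedException {
        ConnectionStates states = new ConnectionStates();
        // The unchecked exception escapes run() and terminates the worker.
        Thread worker = new Thread(
                () -> initiateConnect(states, "0", "broker.invalid", false),
                "sketch-worker");
        worker.setUncaughtExceptionHandler(
                (t, e) -> System.out.println(t.getName() + " died: " + e));
        worker.start();
        worker.join();
        System.out.println("worker alive: " + worker.isAlive());
    }
}
```

In this sketch, anything still waiting on the worker thread after it dies would wait forever, which matches the reported symptom of listTopics() never returning. One possible guard, stated here only as an assumption and not as the change made in the pull request, is to have the catch block tolerate a missing entry instead of assuming connecting() completed.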



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)