Posted to jira@kafka.apache.org by "David Glasser (JIRA)" <ji...@apache.org> on 2018/05/03 01:28:00 UTC

[jira] [Commented] (KAFKA-5473) handle ZK session expiration properly when a new session can't be established

    [ https://issues.apache.org/jira/browse/KAFKA-5473?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16461823#comment-16461823 ] 

David Glasser commented on KAFKA-5473:
--------------------------------------

KAFKA-4041 claims that this fix, which is in 1.1, should resolve the issue where Kafka never notices when the DNS resolution of the ZK server name changes (e.g., when it's in a k8s service).  We just upgraded our staging cluster to 1.1 and tested whether that resolved our issue: we replaced each ZK pod (causing each DNS name to resolve to a new IP address) and checked whether brokers would successfully reconnect. (We also set sun.net.inetaddr.ttl=10, though we don't think that's necessary.) Unfortunately they did not; we still ended up with lots of error logs like

 

[2018-05-03 01:25:56,196] INFO Opening socket connection to server zk-1.zk.staging.svc.cluster.local/10.48.33.20:2181. Will not attempt to authenticate using SASL (unknown error) (org.apache.zookeeper.ClientCnxn)
[2018-05-03 01:25:58,044] WARN Session 0x2632241e9350025 for server null, unexpected error, closing socket connection and attempting reconnect (org.apache.zookeeper.ClientCnxn)
java.net.NoRouteToHostException: No route to host
 at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method)
 at sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:717)
 at org.apache.zookeeper.ClientCnxnSocketNIO.doTransport(ClientCnxnSocketNIO.java:361)
 at org.apache.zookeeper.ClientCnxn$SendThread.run(ClientCnxn.java:1141)

where the IP address in the logs was the old one.

KAFKA-4041 did link to ZOOKEEPER-2184, which is still open, but its final comment claimed that the ZK fix shouldn't be needed because KAFKA-5473 re-creates the ZK client on each attempt. Is that true?
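
To make the behavior we were hoping for concrete, here is a minimal Java sketch. It is not Kafka or ZooKeeper code, and the class and method names (DnsReresolveSketch, capDnsCache, remember, resolveNow) are hypothetical. It assumes the stale IP comes from a client holding on to an already-resolved address: the sketch keeps only the host name and port, and performs a new lookup for each connection attempt. It also shows the standard security property that bounds the JVM's DNS cache, which is the documented counterpart of the sun.net.inetaddr.ttl setting mentioned above.

```java
import java.net.InetSocketAddress;
import java.security.Security;

public class DnsReresolveSketch {

    // Cap the JVM's cache of successful lookups at the given number of seconds.
    // Assumption: this is called before the JVM performs its first name lookup;
    // the cache policy is typically read only once.
    public static void capDnsCache(int seconds) {
        Security.setProperty("networkaddress.cache.ttl", Integer.toString(seconds));
    }

    // Remember only the host name and port. No lookup happens here, so no
    // IP address is pinned inside the returned object.
    public static InetSocketAddress remember(String host, int port) {
        return InetSocketAddress.createUnresolved(host, port);
    }

    // Perform a new lookup from the remembered host name. Calling this before
    // every connection attempt lets a changed DNS record be picked up once the
    // JVM-level cache entry has expired.
    public static InetSocketAddress resolveNow(InetSocketAddress remembered) {
        return new InetSocketAddress(remembered.getHostString(), remembered.getPort());
    }

    public static void main(String[] args) {
        capDnsCache(10);
        String host = args.length > 0 ? args[0] : "localhost";
        InetSocketAddress remembered = remember(host, 2181);
        // If the lookup fails, the result is simply left unresolved.
        InetSocketAddress current = resolveNow(remembered);
        System.out.println(remembered.getHostString() + " -> " + current.getAddress());
    }
}
```

An InetSocketAddress built with the (host, port) constructor resolves the name once and keeps that IP for its lifetime, so a client that stores such an object will keep dialing the old address no matter what the DNS cache TTL is.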

> handle ZK session expiration properly when a new session can't be established
> -----------------------------------------------------------------------------
>
>                 Key: KAFKA-5473
>                 URL: https://issues.apache.org/jira/browse/KAFKA-5473
>             Project: Kafka
>          Issue Type: Sub-task
>    Affects Versions: 0.9.0.0
>            Reporter: Jun Rao
>            Assignee: Prasanna Gautam
>            Priority: Major
>             Fix For: 1.1.0
>
>
> In https://issues.apache.org/jira/browse/KAFKA-2405, we changed the logic for handling ZK session expiration a bit. If a new ZK session can't be established after session expiration, we just log an error and continue. However, this can leave the broker in a bad state since it's up, but not registered from the controller's perspective. Replicas on this broker may never be in sync.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)