Posted to dev@curator.apache.org by "J Robert Ray (Jira)" <ji...@apache.org> on 2020/08/26 01:44:00 UTC
[jira] [Commented] (CURATOR-578) EnsembleTracker replace hostname connectString with wrong ip from zk config

    [ https://issues.apache.org/jira/browse/CURATOR-578?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17184838#comment-17184838 ] 

J Robert Ray commented on CURATOR-578:
--------------------------------------

I am experiencing a combination of this problem and Curator eventually attempting to connect to 0.0.0.0, as in CURATOR-392. The fix for that issue does not handle the configuration suggested by the [official Zookeeper Docker image|https://hub.docker.com/_/zookeeper] for deploying to Docker Swarm, specifically using "0.0.0.0" as the bind address.

I have a deployment of three Zookeeper nodes in Docker Swarm, and have attempted to give them stable IPs by pinning each container to a dedicated node and using the node hostname when advertising the service:

{{ZOO_SERVERS: server.1=0.0.0.0:2888:3888;2181 server.2=host2:2888:3888;host2:2181 server.3=host3:2888:3888;host3:2181}}
{{ZOO_SERVERS: server.1=host1:2888:3888;host1:2181 server.2=0.0.0.0:2888:3888;2181 server.3=host3:2888:3888;host3:2181}}
{{ZOO_SERVERS: server.1=host1:2888:3888;host1:2181 server.2=host2:2888:3888;host2:2181 server.3=0.0.0.0:2888:3888;2181}}
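
To illustrate how a 0.0.0.0 entry in such a configuration could end up in a client's connection string, here is a minimal Python sketch. It is not Curator's code: {{parse_server_spec}} and {{derive_connect_string}} are hypothetical helpers that assume the ZooKeeper 3.5+ server syntax {{server.N=host:quorumPort:electionPort;[clientHost:]clientPort}} and a simplified fallback in which a wildcard client address is replaced by the quorum host. IPv6 literals are not handled.

```python
WILDCARD = "0.0.0.0"


def parse_server_spec(spec):
    """Parse one 'server.N=host:quorum:election;[client_host:]client_port' entry.

    Hypothetical helper for illustration only; not part of Curator or ZooKeeper.
    """
    key, value = spec.split("=", 1)
    server_id = int(key.split(".", 1)[1])
    quorum_part, _, client_part = value.partition(";")
    quorum_host = quorum_part.split(":")[0]
    if ":" in client_part:
        client_host, client_port = client_part.rsplit(":", 1)
    else:
        # Assumption: an omitted client host means "bind to all interfaces".
        client_host, client_port = WILDCARD, client_part
    return {
        "id": server_id,
        "quorum_host": quorum_host,
        "client_host": client_host,
        "client_port": int(client_port),
    }


def derive_connect_string(zoo_servers):
    """Build a client connection string from a ZOO_SERVERS-style value.

    Assumed, simplified rule: if the client address is the wildcard,
    fall back to the quorum host of the same server entry.
    """
    parts = []
    for spec in zoo_servers.split():
        server = parse_server_spec(spec)
        host = server["client_host"]
        if host == WILDCARD:
            host = server["quorum_host"]
        parts.append(f"{host}:{server['client_port']}")
    return ",".join(parts)


# Configuration as seen by the first node in the deployment above.
node1_view = (
    "server.1=0.0.0.0:2888:3888;2181 "
    "server.2=host2:2888:3888;host2:2181 "
    "server.3=host3:2888:3888;host3:2181"
)
derived = derive_connect_string(node1_view)
print(derived)  # 0.0.0.0:2181,host2:2181,host3:2181
```

Under these assumptions the fallback does not help when the quorum address is also 0.0.0.0, so the wildcard survives into the derived connection string.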

 

Curator is configured initially with the connection string: {{host1:2181,host2:2181,host3:2183}}. Things are fine until a Zookeeper node is restarted for some reason.

 

This log is from the application using Curator, at the moment Zookeeper is killed on host3. The addresses 10.5.x.x are the valid IP addresses for the Docker hosts. The addresses 10.0.x.x are the Docker Swarm node internal addresses, which change upon Zookeeper restarting.

[^curator.log]

The client ends up in a loop trying to connect to 10.0.x.x addresses, which may no longer be valid, and 0.0.0.0.

 

Apart from this, my Zookeeper cluster does not reliably recover from a node restart unless I manually stop all but one node (for example, the restarted node rejects new connections because the client has a higher zxid). This is making me reconsider running the cluster with Swarm/k8s.

Curator 5.1.0, Zookeeper 3.6.1.

> EnsembleTracker replace hostname connectString with wrong ip from zk config
> ---------------------------------------------------------------------------
>
>                 Key: CURATOR-578
>                 URL: https://issues.apache.org/jira/browse/CURATOR-578
>             Project: Apache Curator
>          Issue Type: Bug
>          Components: Client
>    Affects Versions: 4.0.1
>            Reporter: ying.li
>            Priority: Major
>         Attachments: curator.log
>
>
> I have a zookeeper cluster which runs on a k8s cluster, and I use host names to connect to zookeeper (like: zookeeper-0.zookeeper-headless.default.svc.cluster.local:2181,zookeeper-1.zookeeper-headless.default.svc.cluster.local:2181,zookeeper-2.zookeeper-headless.default.svc.cluster.local:2181).
>
> When zookeeper restarts, the zk pod's IP changes. I then find that my client uses the IP to recreate a client instead of using the hostname, but that IP is not the latest IP for the hostname. As a result, the client never connects to zk unless the client is restarted.
>
> After some debugging, I found that the EnsembleTracker changes the connectString from hostname to IP when it receives the config change event. But in many cases, the IP obtained from the hostname will not change after zk restarts in k8s, so the client never connects to zk unless the client is restarted.
>  
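
The failure mode described in the issue can be sketched in a few lines of Python. This is an illustrative model only, not Curator's implementation: {{FakeDns}} and {{resolve_targets}} are hypothetical names, the hostnames are shortened, and the IP addresses are made up. It contrasts a client that keeps addresses resolved once with a client that resolves the hostnames again after a restart.

```python
class FakeDns:
    """Hypothetical in-memory resolver standing in for cluster DNS."""

    def __init__(self, records):
        self.records = dict(records)

    def resolve(self, host):
        return self.records[host]


def resolve_targets(connect_string, dns):
    """Turn 'host:port,host:port' into a list of 'ip:port' strings."""
    targets = []
    for entry in connect_string.split(","):
        host, port = entry.rsplit(":", 1)
        targets.append(f"{dns.resolve(host)}:{port}")
    return targets


connect_string = "zookeeper-0:2181,zookeeper-1:2181"
dns = FakeDns({"zookeeper-0": "10.0.0.5", "zookeeper-1": "10.0.0.6"})

# A client that stores resolved IPs keeps this list across reconnects.
pinned = resolve_targets(connect_string, dns)

# The zookeeper-0 pod restarts and receives a new (made-up) IP.
dns.records["zookeeper-0"] = "10.0.0.9"

# A client that keeps the hostnames resolves them again and sees the new IP.
fresh = resolve_targets(connect_string, dns)

print(pinned[0])  # 10.0.0.5:2181 (stale)
print(fresh[0])   # 10.0.0.9:2181 (current)
```

In this model the pinned list never recovers on its own, which matches the reported symptom that only a client restart restores the connection.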



--
This message was sent by Atlassian Jira
(v8.3.4#803005)