Posted to dev@zookeeper.apache.org by "Heller, George A III CTR (USA)" <ge...@mail.mil.INVALID> on 2022/07/01 13:10:27 UTC

Why do Zookeeper nodes only rejoin the ensemble if they are addressed as 0.0.0.0 in zoo.cfg

We have two Zookeeper clusters of 3 servers each.

On one of the clusters, we have the DNS names for all three servers and it works fine. On the second cluster, a node will not rejoin the ensemble after being restarted unless I put 0.0.0.0 as the address of the current node.

I stop and later restart the Zookeeper service on the node that is the leader to simulate server failures.
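(To check whether the restarted node has rejoined, the mntr command that is already in the 4lw whitelist below can be queried on each server; a minimal sketch, assuming a Unix-style netcat and grep are available, or equivalents on Windows:

echo mntr | nc NODE1DNSNAME.local 2181 | grep zk_server_state

A node that is back in the ensemble reports "follower" or "leader" here, and zk_synced_followers on the leader should return to 2.)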

 

Why do I need to put 0.0.0.0 for the address of the current node in zoo.cfg for one group of servers, but not the other? Is there anything I can do differently to solve this problem?

The only issue we have is the error message below in the zk_status section of the Solr admin website; everything else works correctly as long as the current server's address is 0.0.0.0.



 

Below is what my zoo.cfg files look like.

 

-- Node1 zoo.cfg --

tickTime=2000
dataDir=I:/zookeeper/data
clientPort=2181
4lw.commands.whitelist=mntr,conf,ruok
initLimit=5
syncLimit=2
server.1=0.0.0.0:2888:3888
server.2=NODE2DNSNAME.local:2888:3888
server.3=NODE3DNSNAME.local:2888:3888

-- Node2 zoo.cfg --

tickTime=2000
dataDir=I:/zookeeper/data
clientPort=2181
4lw.commands.whitelist=mntr,conf,ruok
initLimit=5
syncLimit=2
server.1=NODE1DNSNAME.local:2888:3888
server.2=0.0.0.0:2888:3888
server.3=NODE3DNSNAME.local:2888:3888

-- Node3 zoo.cfg --

tickTime=2000
dataDir=I:/zookeeper/data
clientPort=2181
4lw.commands.whitelist=mntr,conf,ruok
initLimit=5
syncLimit=2
server.1=NODE1DNSNAME.local:2888:3888
server.2=NODE2DNSNAME.local:2888:3888
server.3=0.0.0.0:2888:3888


Re: Why do Zookeeper nodes only rejoin the ensemble if they are addressed as 0.0.0.0 in zoo.cfg

Posted by Aishwarya Soni <ai...@gmail.com>.
Hi,

Setting the local entry to 0.0.0.0 makes the server bind its quorum and leader-election ports (2888/3888) on all local interfaces, so it never has to resolve its own DNS name at startup; the other servers still reach it through the DNS name listed in their own zoo.cfg files.

While coming up, each server tries to bind the address configured for its own id in zoo.cfg. If that name cannot be resolved yet, or does not resolve to an address that is local to the machine, the bind fails and the node cannot take part in leader election. With 0.0.0.0 the bind always succeeds, which is why every node can come up and form a quorum.
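If you would rather keep the real DNS name in the local server line, there is also the quorumListenOnAllIPs setting, which makes the server listen for quorum and election connections on all local interfaces instead of only the configured address. A minimal sketch of what Node1's zoo.cfg could look like under that approach (the other nodes changed the same way); whether it resolves your case depends on why the node cannot bind its own name:

-- Node1 zoo.cfg (sketch) --

tickTime=2000
dataDir=I:/zookeeper/data
clientPort=2181
4lw.commands.whitelist=mntr,conf,ruok
initLimit=5
syncLimit=2
# listen on all interfaces for peer connections, so the server line can stay a DNS name
quorumListenOnAllIPs=true
server.1=NODE1DNSNAME.local:2888:3888
server.2=NODE2DNSNAME.local:2888:3888
server.3=NODE3DNSNAME.local:2888:3888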

The Solr error you can ignore for now. It's a UI bug related to supporting the dynamic configuration introduced in ZooKeeper, and it should be resolved soon. It does not impact functionality.

One question I had: why are you defining the server configuration in zoo.cfg instead of in a zoo.cfg.dynamic file, if you are using the dynamic configuration introduced in ZooKeeper 3.5+?
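For reference, a minimal sketch of that layout, with the dynamic file path being only an example:

-- zoo.cfg (static part, sketch) --

tickTime=2000
dataDir=I:/zookeeper/data
4lw.commands.whitelist=mntr,conf,ruok
initLimit=5
syncLimit=2
# membership lives in the dynamic file; clientPort moves into the server entries there
dynamicConfigFile=I:/zookeeper/conf/zoo.cfg.dynamic

-- zoo.cfg.dynamic (sketch) --

server.1=NODE1DNSNAME.local:2888:3888:participant;2181
server.2=NODE2DNSNAME.local:2888:3888:participant;2181
server.3=NODE3DNSNAME.local:2888:3888:participant;2181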

Regards,
Aishwarya Soni

On Fri, Jul 1, 2022, 6:11 AM Heller, George A III CTR (USA)
<ge...@mail.mil.invalid> wrote:

> We have two Zookeeper clusters of 3 servers each.
>
> On one of the clusters, we have the DNS names for all three servers and it
> works fine. On the second cluster, A node will not rejoin the ensemble
> after being restarted unless I put 0.0.0.0 as the address of the current
> node.
>
> I stop then later restart the Zookeeper service on the node which is the
> leader to simulate server failures.
>
>
>
> Why do I need to put 0.0.0.0 for the address of the current node in
> zoo.cfg for one group of servers, but not the other? Is there anything I
> can do differently to solve this problem?
>
> The only issue we have is the below error message in the zk_status section
> of the Solr admin website, everything else works correctly as long as the
> current server’s address is 0.0.0.0
>
>
>
> Below shows what my zoo.cfg files look like.
>
>
>
> *-- Node1 zoo.cfg --*
>
> tickTime=2000
>
> dataDir=I:/zookeeper/data
>
> clientPort=2181
>
> 4lw.commands.whitelist=mntr,conf,ruok
>
> initLimit=5
>
> syncLimit=2
>
> server.1=0.0.0.0:2888:3888
>
> server.2=NODE2DNSNAME.local:2888:3888
>
> server.3=NODE3DNSNAME.local:2888:3888
>
>
>
> *-- Node2 zoo.cfg --*
>
> tickTime=2000
>
> dataDir=I:/zookeeper/data
>
> clientPort=2181
>
> 4lw.commands.whitelist=mntr,conf,ruok
>
> initLimit=5
>
> syncLimit=2
>
> server.1=NODE1DNSNAME.local:2888:3888
>
> server.2=0.0.0.0:2888:3888
>
> server.3=NODE3DNSNAME.local:2888:3888
>
>
>
> *-- Node3 zoo.cfg --*
>
> tickTime=2000
>
> dataDir=I:/zookeeper/data
>
> clientPort=2181
>
> 4lw.commands.whitelist=mntr,conf,ruok
>
> initLimit=5
>
> syncLimit=2
>
> server.1=NODE1DNSNAME.local:2888:3888
>
> server.2=NODE2DNSNAME.local:2888:3888
>
> server.3=0.0.0.0:2888:3888
>
