Posted to issues@zookeeper.apache.org by "Mate Szalay-Beko (Jira)" <ji...@apache.org> on 2020/02/19 14:37:00 UTC

[jira] [Comment Edited] (ZOOKEEPER-3725) Zookeeper fails to establish quorum with 2 servers using 3.5.6

    [ https://issues.apache.org/jira/browse/ZOOKEEPER-3725?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17040098#comment-17040098 ] 

Mate Szalay-Beko edited comment on ZOOKEEPER-3725 at 2/19/20 2:36 PM:
----------------------------------------------------------------------

In most cases 0.0.0.0 is not actually needed, although it appears in blog posts and other descriptions (whether you need it depends on your Docker networking setup). Could you also try the following config on every node?

{{ZOO_SERVERS: server.1=orchestrator1.cameltest.int:2888:3888 server.2=orchestrator2.cameltest.int:2888:3888}}
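
As a rough, untested sketch, the docker-compose.yml from the issue description with this change applied might look like the following. Only the two {{ZOO_SERVERS}} lines differ from the original report; everything else is copied from it.

{noformat}
version: '2'
services:
  orchestrator1.cameltest.int:
    image: zookeeper:3.5.6
    environment:
      ZOO_MY_ID: 1
      # same server list on both nodes, no 0.0.0.0 entry
      ZOO_SERVERS: server.1=orchestrator1.cameltest.int:2888:3888 server.2=orchestrator2.cameltest.int:2888:3888
  orchestrator2.cameltest.int:
    image: zookeeper:3.5.6
    environment:
      ZOO_MY_ID: 2
      ZOO_SERVERS: server.1=orchestrator1.cameltest.int:2888:3888 server.2=orchestrator2.cameltest.int:2888:3888
{noformat}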

If for some reason you still need to bind locally to all addresses (which is what 0.0.0.0 does), you can try adding {{quorumListenOnAllIPs=true}} to the config. This should have the same effect without requiring a different server address list on each node.
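
For illustration only, the relevant part of a zoo.cfg with that option could look roughly like this. It assumes you supply or generate the config file yourself; how the file ends up inside the container depends on your image and setup and is not covered here.

{noformat}
# identical on every node
server.1=orchestrator1.cameltest.int:2888:3888
server.2=orchestrator2.cameltest.int:2888:3888
# bind the quorum and election ports on all local addresses
quorumListenOnAllIPs=true
{noformat}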


was (Author: symat):
Actually in most of the case the 0.0.0.0 is not necessarily needed, although it is used in blog posts or other places. (it is depending on your actual docker networking setup). Can you try maybe also try with the following config everywhere?
{{ZOO_SERVERS: server.1=orchestrator1.cameltest.int:2888:3888 server.2=orchestrator2.cameltest.int:2888:3888}}

If you for some reason still need to bind to all addresses, then you can try to add {{quorumListenOnAllIPs=true}} to the config.

> Zookeeper fails to establish quorum with 2 servers using 3.5.6
> --------------------------------------------------------------
>
>                 Key: ZOOKEEPER-3725
>                 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-3725
>             Project: ZooKeeper
>          Issue Type: Bug
>    Affects Versions: 3.5.6
>            Reporter: Antoine DESSAIGNE
>            Priority: Major
>         Attachments: failure-3.5.6.txt, success-3.4.14.txt, success-3.5.6.txt
>
>
> Hello everyone,
> We noticed that Zookeeper 3.5.6 regularly fails to establish quorum on a new deployment (approx. 50% of the time).
> We were able to reduce the reproduction steps to the bare minimum. Consider the following docker-compose.yml file:
> {noformat}
> version: '2'
> services:
>   orchestrator1.cameltest.int:
>     image: zookeeper:3.5.6
>     environment:
>       ZOO_MY_ID: 1
>       ZOO_SERVERS: server.1=0.0.0.0:2888:3888 server.2=orchestrator2.cameltest.int:2888:3888
>   orchestrator2.cameltest.int:
>     image: zookeeper:3.5.6
>     environment:
>       ZOO_MY_ID: 2
>       ZOO_SERVERS: server.1=orchestrator1.cameltest.int:2888:3888 server.2=0.0.0.0:2888:3888
> {noformat}
> When launching a brand new cluster with it (with {{docker-compose up}}, no previous data), it fails half of the time with 3.5.6 and never with 3.4.14.
> You'll find attached 3 logs:
> * a failure one using 3.5.6
> * a success one using 3.5.6
> * a success one using 3.4.14
> I don't think it's related to some docker/docker-compose issue (as it's working using 3.4.14 on the same server)
> I'll try to check each intermediate release to pin a more specific version.
> Unfortunately, I don't know my way around the Zookeeper code yet. What can I do to help? Thanks!
> PS: Yes, it's strange to have 2 servers as they're both required to work, but it's the smallest repro-case


