Posted to issues@zookeeper.apache.org by "Vikramark (Jira)" <ji...@apache.org> on 2020/11/08 02:07:00 UTC

[jira] [Commented] (ZOOKEEPER-3997) Why Zookeeper 3.5.8 leader shutdown makes the follower not allow new sessions?

    [ https://issues.apache.org/jira/browse/ZOOKEEPER-3997?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17227895#comment-17227895 ] 

Vikramark commented on ZOOKEEPER-3997:
--------------------------------------

This issue is similar to ZOOKEEPER-2669.

It was resolved, as suggested in the comments on that issue, by listing each server by its FQDN in the server list instead of 0.0.0.0.
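
The fix can be illustrated with a minimal sketch. It assumes the failing setup listed a node's own entry as `0.0.0.0` in its zoo.cfg server list; the comment above only says that 0.0.0.0 was in the server list. The hypothetical Python helper `wildcard_servers` flags `server.N` entries whose host is `0.0.0.0`. The hostnames `zoo1.example.com` etc. and the ports 2888/3888 are placeholders, not values from this report.

```python
import re

# Matches zoo.cfg lines of the form "server.<id>=<host>:<port>:<port>...".
# Assumption: hosts are hostnames or IPv4 addresses; bracketed IPv6 hosts are not handled.
SERVER_LINE = re.compile(r"^\s*server\.(\d+)\s*=\s*([^:\s]+):")


def wildcard_servers(cfg_text: str) -> list[int]:
    """Return the ids of server.N entries whose host is the wildcard address 0.0.0.0."""
    ids = []
    for line in cfg_text.splitlines():
        match = SERVER_LINE.match(line)
        if match and match.group(2) == "0.0.0.0":
            ids.append(int(match.group(1)))
    return ids


# Hypothetical zoo.cfg on zoo1 before the fix: the local entry uses 0.0.0.0.
before = """
server.1=0.0.0.0:2888:3888
server.2=zoo2.example.com:2888:3888
server.3=zoo3.example.com:2888:3888
"""

# Hypothetical zoo.cfg after the fix: every entry, including the local one, is an FQDN.
after = """
server.1=zoo1.example.com:2888:3888
server.2=zoo2.example.com:2888:3888
server.3=zoo3.example.com:2888:3888
"""

print(wildcard_servers(before))  # [1]
print(wildcard_servers(after))   # []
```

When every entry is an FQDN, the same server list can be used unchanged on all three nodes.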

> Why Zookeeper 3.5.8 leader shutdown makes the follower not allow new sessions?
> ------------------------------------------------------------------------------
>
>                 Key: ZOOKEEPER-3997
>                 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-3997
>             Project: ZooKeeper
>          Issue Type: Bug
>            Reporter: Vikramark
>            Priority: Major
>             Fix For: 3.5.8
>
>
> There is a behavior difference between ZooKeeper 3.4.14 and 3.5.8. In 3.5.8, given a cluster of 3 nodes, when the leader node is shut down, the follower does not seem to be in sync with the new leader. Clients are not able to establish a session, and requests get queued up as outstanding requests. We don't see this issue with version 3.4.14.
> Below are the details of how to reproduce the issue:
>  
> *With version 3.5.8:*
>  
> Initial fresh setup of a 3-node cluster:
>   
> Zookeeper version on all three nodes: 3.5.8-f439ca583e70862c3068a1f2a7d4d068eec33315, built on 05/04/2020 15:07 GMT
>
> | |Zoo1|Zoo2|Zoo3|
> |Latency min/avg/max|0/0/0|0/0/0|0/0/0|
> |Received|3|3|2|
> |Sent|2|2|1|
> |Connections|1|1|1|
> |Outstanding|*0*|0|*0*|
> |Zxid|*0x0*|*0x100000000*|*0x100000000*|
> |Mode|follower|*leader*|*follower*|
> |Node count|5|5|5|
> |Proposal sizes last/min/max|n/a|-1/-1/-1|n/a|
>  
> After starting one session using zkCli.sh on node Zoo1:
> Zookeeper version on all three nodes: 3.5.8-f439ca583e70862c3068a1f2a7d4d068eec33315, built on 05/04/2020 15:07 GMT
>
> | |Zoo1|Zoo2|Zoo3|
> |Latency min/avg/max|1/9/23|0/0/0|0/0/0|
> |Received|7|4|3|
> |Sent|6|3|2|
> |Connections|2|1|1|
> |Outstanding|0|0|0|
> |Zxid|*0x100000001*|*0x100000001*|*0x100000001*|
> |Mode|*follower*|*leader*|*follower*|
> |Node count|5|5|5|
> |Proposal sizes last/min/max|n/a|36/36/36|n/a|
>  
> +Note: The Zxid is now consistent across all nodes.+
>  
> After the leader node Zoo2 was shut down, Zoo3 became the leader. For some reason, the Zxid differs between Zoo1 and Zoo3.
>  
> Started a new zkCli.sh session on the same node (Zoo1). *The session was not established; the CLI client just kept retrying and created many outstanding requests on Zoo1.*
> *Expected behavior:* Zoo1 should have the same Zxid as Zoo3, the client session should be established, and no outstanding requests should be added.
>  
> Zoo2: down. Zookeeper version on Zoo1 and Zoo3: 3.5.8-f439ca583e70862c3068a1f2a7d4d068eec33315, built on 05/04/2020 15:07 GMT
>
> | |Zoo1|Zoo3|
> |Latency min/avg/max|0/0/2|0/0/0|
> |Received|50|1|
> |Sent|43|0|
> |Connections|2|1|
> |Outstanding|*{color:#de350b}6{color}*|0|
> |Zxid|*{color:#de350b}0x100000001{color}*|*0x200000000*|
> |Mode|follower|*leader*|
> |Node count|5|5|
> |Proposal sizes last/min/max|n/a|-1/-1/-1|
>  
> *With version 3.4.14*
> Initial fresh setup:
>  
> Zookeeper version on all three nodes: 3.4.14-4c25d480e66aadd371de8bd2fd8da255ac140bcf, built on 03/06/2019 16:18 GMT
>
> | |Zoo1|Zoo2|Zoo3|
> |Latency min/avg/max|0/0/0|0/0/0|0/0/0|
> |Received|1|1|1|
> |Sent|0|0|0|
> |Connections|1|1|1|
> |Outstanding|0|0|0|
> |Zxid|0x0|0x100000000|0x100000000|
> |Mode|follower|leader|follower|
> |Node count|4|4|4|
> |Proposal sizes last/min/max|n/a|-1/-1/-1|n/a|
>  
> After connecting with zkCli.sh on Zoo1:
>  
>  
> Zookeeper version on all three nodes: 3.4.14-4c25d480e66aadd371de8bd2fd8da255ac140bcf, built on 03/06/2019 16:18 GMT
>
> | |Zoo1|Zoo2|Zoo3|
> |Latency min/avg/max|0/14/33|0/0/0|0/0/0|
> |Received|5|2|2|
> |Sent|4|1|1|
> |Connections|2|1|1|
> |Outstanding|0|0|0|
> |Zxid|0x100000001|0x100000001|0x100000001|
> |Mode|follower|leader|follower|
> |Node count|4|4|4|
> |Proposal sizes last/min/max|n/a|36/36/36|n/a|
>  
> +Note: The Zxid is now the same on all nodes.+
>  
>  
> After the leader node Zoo2 was shut down, Zoo3 became the leader. Initially, the Zxid again differed between Zoo1 and Zoo3: Zoo3 had a new Zxid because a new epoch was created, but Zoo1 still had the old one.
>  
> Then closed the existing zkCli.sh session and started a new zkCli.sh session on the same node (Zoo1). *This time the session was established!*
>  
> |Zoo1|Zoo2|Zoo3|
> |Zookeeper version: 3.4.14-4c25d480e66aadd371de8bd2fd8da255ac140bcf, built on 03/06/2019 16:18 GMT
>  Latency min/avg/max: 0/1/4
>  Received: 8
>  Sent: 7
>  Connections: 2
>  *{color:#00875a}Outstanding: 0{color}*
>  *{color:#00875a}Zxid: 0x200000001{color}*
>  Mode: follower
>  Node count: 4|down
>   |Zookeeper version: 3.4.14-4c25d480e66aadd371de8bd2fd8da255ac140bcf, built on 03/06/2019 16:18 GMT
>  Latency min/avg/max: 0/0/0
>  Received: 3
>  Sent: 2
>  Connections: 1
>  *Outstanding: 0*
>  *Zxid: 0x200000001*
>  Mode: leader
>  Node count: 4
>  Proposal sizes last/min/max: 36/36/36|
>  
>  
>  
>  
>   
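
Decoding the Zxid values in the tables above makes the epoch mismatch concrete. In ZooKeeper, the high-order 32 bits of a zxid hold the leader epoch and the low-order 32 bits hold a counter within that epoch. The hypothetical Python helpers below are not part of ZooKeeper or any client library. They fetch a node's status with the `srvr` four-letter command, whose fields appear to match the tables, then parse the reply and split the Zxid. They assume the default client port 2181 and that `srvr` is allowed by the server's four-letter-word whitelist. The sample inputs are copied from the 3.5.8 report rather than fetched live.

```python
import socket


def fetch_srvr(host: str, port: int = 2181, timeout: float = 5.0) -> str:
    """Send the 'srvr' four-letter command to a ZooKeeper node and return its reply.

    Assumes the default client port 2181 and that 'srvr' is permitted by the
    server's four-letter-word whitelist."""
    chunks = []
    with socket.create_connection((host, port), timeout=timeout) as sock:
        sock.sendall(b"srvr")
        while True:
            data = sock.recv(4096)
            if not data:  # the server closes the connection after replying
                break
            chunks.append(data)
    return b"".join(chunks).decode("utf-8", errors="replace")


def parse_srvr(text: str) -> dict[str, str]:
    """Turn 'Key: value' lines of srvr output into a dict (split on the first colon)."""
    fields = {}
    for line in text.splitlines():
        key, sep, value = line.partition(":")
        if sep:
            fields[key.strip()] = value.strip()
    return fields


def split_zxid(zxid: str) -> tuple[int, int]:
    """Split a zxid such as '0x200000000' into (epoch, counter).

    The high-order 32 bits hold the leader epoch; the low-order 32 bits count
    transactions within that epoch."""
    value = int(zxid, 16)
    return value >> 32, value & 0xFFFFFFFF


# Values copied from the 3.5.8 report after Zoo2 (the old leader) was shut down.
zoo1 = parse_srvr("Outstanding: 6\nZxid: 0x100000001\nMode: follower")
zoo3 = parse_srvr("Outstanding: 0\nZxid: 0x200000000\nMode: leader")
print(split_zxid(zoo1["Zxid"]))  # (1, 1): still epoch 1
print(split_zxid(zoo3["Zxid"]))  # (2, 0): the new leader's epoch 2
```

In the 3.5.8 run, Zoo1 is still in epoch 1 while Zoo3 leads epoch 2. This is consistent with the report's observation that Zoo1 had not caught up with the new leader. In the 3.4.14 run, Zoo1 reports 0x200000001, which decodes to epoch 2 and counter 1.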



--
This message was sent by Atlassian Jira
(v8.3.4#803005)