
Posted to dev@zookeeper.apache.org by "Cesar Stuardo (JIRA)" <ji...@apache.org> on 2017/09/01 19:31:00 UTC

[jira] [Commented] (ZOOKEEPER-2865) Reconfig Causes Inconsistent Configuration file among the nodes

    [ https://issues.apache.org/jira/browse/ZOOKEEPER-2865?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16151051#comment-16151051 ] 

Cesar Stuardo commented on ZOOKEEPER-2865:
------------------------------------------

Hello Alexander,

In your first comment, you state that
----
But what's required is for the cluster to be able to recover from this state - the server that didn't get the commit in your scenario should find out about the new config and eventually join the cluster. If that doesn't happen then that potentially is a bug, but its not clear from the description here.
----

What do you mean by this? In our scenario, the node won't be able to recover, because the nodes it knows about at startup are no longer listening on the same ports, so it never receives the updated configuration. The only solution is admin intervention.
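
To make the scenario concrete, here is a toy sketch of what we mean. It is not ZooKeeper code: the server ids, the port numbers, and the reachable_peers helper are all made up for illustration. It only models the idea that node X can contact a peer if that peer is still listening on the port recorded in X's stale configuration.

```python
# Toy model only, not ZooKeeper code. Server ids and ports are hypothetical.

def reachable_peers(known_config, listening):
    """Return the ids of peers node X can contact using its own view.

    known_config: {server_id: port} as node X believes it (possibly stale).
    listening:    {server_id: port} the peers are actually listening on.
    """
    return sorted(
        sid for sid, port in known_config.items()
        if listening.get(sid) == port
    )


# Node X started with the old configuration and missed the commit.
stale_view = {1: 2888, 2: 2888}

# After the reconfig, the servers X knows about listen on different ports.
actual = {1: 2890, 2: 2890, 3: 2888, 4: 2888}

print(reachable_peers(stale_view, actual))  # []
```

With an empty result, node X has no peer it can ask for the new configuration, which is why we think it stays isolated until an admin steps in.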


> Reconfig Causes Inconsistent Configuration file among the nodes
> ---------------------------------------------------------------
>
>                 Key: ZOOKEEPER-2865
>                 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-2865
>             Project: ZooKeeper
>          Issue Type: Improvement
>          Components: documentation
>    Affects Versions: 3.5.3
>            Reporter: Jeffrey F. Lukman
>            Assignee: Alexander Shraer
>            Priority: Trivial
>             Fix For: 3.5.4, 3.6.0
>
>         Attachments: ZK-2865.pdf
>
>
> When we run our Distributed system Model Checking (DMCK) in ZooKeeper v3.5.3
> by following the workload in ZK-2778:
> - initially start 2 ZooKeeper nodes
> - start 3 new nodes
> - do a reconfiguration (the complete reconfiguration is attached in the document)
> We think our DMCK found this following bug:
> - while one of the just joined nodes has not received the latest configuration update 
> (called as node X), the initial leader node closed its port, 
> therefore causing the node X to be isolated.
> For complete information of the bug, please see the document that is attached.


