Posted to dev@zookeeper.apache.org by "Patrick Hunt (JIRA)" <ji...@apache.org> on 2008/07/29 23:03:31 UTC

[jira] Created: (ZOOKEEPER-107) Allow dynamic changes to server cluster membership

Allow dynamic changes to server cluster membership
--------------------------------------------------

                 Key: ZOOKEEPER-107
                 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-107
             Project: Zookeeper
          Issue Type: Improvement
          Components: server
            Reporter: Patrick Hunt


Currently cluster membership is statically defined, adding/removing hosts to/from the server cluster dynamically needs to be supported.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


Re: [jira] Commented: (ZOOKEEPER-107) Allow dynamic changes to server cluster membership

Posted by Vishal K <vi...@gmail.com>.
Hi Andraz,

I am quite keen to implement this myself. We need this for our project as
well.  Your environment certainly seems more dynamic.

Unfortunately, I haven't been able to find time for implementing this yet
due to some immediate project deadlines. While I won't be able to work on
this full-time, I am hoping that I will be able to invest a sizeable amount
of time in this after another 10-15 days or so. Thanks.

Regards,
-Vishal

On Thu, May 20, 2010 at 6:36 AM, Andraz Tori (JIRA) <ji...@apache.org> wrote:

>
>    [
> https://issues.apache.org/jira/browse/ZOOKEEPER-107?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12869554#action_12869554]
>
> Andraz Tori commented on ZOOKEEPER-107:
> ---------------------------------------
>
> Has anything happened with this feature?
>
> There was some talk about what the most important use cases are on the
> mailing list. We're thinking of migrating home-grown solution to Zookeeper,
> but can't do it without dynamic addition/removal of the servers. If it
> helps, here's the use case:
>
> We're having fully cloudy solution. Every server that we put into the
> cluster runs a set of services that make themselves available to a local
> "resource manager" that shares the list of resources with all other servers
> in the cluster. When we do upgrades we simply fire up new servers with new
> versions of the services and connect their resource managers to the old ones
> into the same cluster. Then we simply shut down the old servers. Beside
> adding/removing servers when upgrading, we also do the same thing when we
> need to temporarily scale - we fire up a few more servers and connect their
> resource managers to the cluster to make the services available to the
> cluster.
> We never know how many servers there are going to be in the cluster and we
> don't assign any dns entries to them (just another point of failure).
> The clients that need to know about resources connect to any of the
> "resource managers" and get a list of all resources available and also about
> other "resource managers". As servers move around they also can connect to
> different "resource manager".
>
> This is a bit unusual configuration since cloud practically lives on its
> own without any kind of static addresses. As long as you are able to connect
> to it at one point in time, you can keep up with it 'motion'.
>
> So the idea was to migrate the above system to Zookeeper. Every service
> would connect to local Zookeeper and create ephemeral node announcing it. So
> every server would run its own Zookeeper node connected to the Zookeeper
> cloud. However without dynamic addition/removal of the servers all this
> becomes infeasible.
>
> Ideally we'd like to have a situation where we just start a Zookeeper node
> by giving it a list of known other Zookeeper nodes in the cloud. And then it
> should take on to the life of its own.
>
> Hope that the use case helps. I am really looking forward to this!
>
>
>
>  > Allow dynamic changes to server cluster membership
> > --------------------------------------------------
> >
> >                 Key: ZOOKEEPER-107
> >                 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-107
> >             Project: Zookeeper
> >          Issue Type: Improvement
> >          Components: server
> >            Reporter: Patrick Hunt
> >            Assignee: Henry Robinson
> >         Attachments: SimpleAddition.rtf
> >
> >
> > Currently cluster membership is statically defined, adding/removing hosts
> to/from the server cluster dynamically needs to be supported.
>
> --
> This message is automatically generated by JIRA.
> -
> You can reply to this email to add a comment to the issue online.
>
>

[jira] Commented: (ZOOKEEPER-107) Allow dynamic changes to server cluster membership

Posted by "Henry Robinson (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/ZOOKEEPER-107?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12722368#action_12722368 ] 

Henry Robinson commented on ZOOKEEPER-107:
------------------------------------------

I think the issue of how to locate an ensemble whose makeup has changed needs to be discussed separately. I've got an idea for how I'd suggest doing it, but will leave that until I've got the view change stuff working. Once a new leader has been elected, it will need to publish this somewhere (probably both internal to ZK in /zookeeper/ensemble and externally). Observers can use one of those routes to find the leader.

At the moment, Observers are just followers that a) can't make most mutable proposals b) don't get either PROPOSE or COMMIT messages, just INFORM ones with the payload and c) can propose view changes, not necessarily to include themselves. So an Observer attaches to a leader, syncs and maybe listens in on the proposal stream for a while and then upgrades itself by issuing a view change request.



> Allow dynamic changes to server cluster membership
> --------------------------------------------------
>
>                 Key: ZOOKEEPER-107
>                 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-107
>             Project: Zookeeper
>          Issue Type: Improvement
>          Components: server
>            Reporter: Patrick Hunt
>            Assignee: Henry Robinson
>         Attachments: SimpleAddition.rtf
>
>
> Currently cluster membership is statically defined, adding/removing hosts to/from the server cluster dynamically needs to be supported.



[jira] Commented: (ZOOKEEPER-107) Allow dynamic changes to server cluster membership

Posted by "Quinton Hoole (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/ZOOKEEPER-107?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12851426#action_12851426 ] 

Quinton Hoole commented on ZOOKEEPER-107:
-----------------------------------------

OK, not to worry.  I just found the answer to my question here:

http://www.mail-archive.com:80/zookeeper-dev@hadoop.apache.org/msg07382.html




[jira] Commented: (ZOOKEEPER-107) Allow dynamic changes to server cluster membership

Posted by "Benjamin Reed (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/ZOOKEEPER-107?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12706097#action_12706097 ] 

Benjamin Reed commented on ZOOKEEPER-107:
-----------------------------------------

i agree with everything you are saying and yes to all the questions. it's not as strange as it sounds. today we have to pre-populate the cluster config. it would just be that now rather than creating a file with vi we would need to use a utility to create an initial snapshot that has the config in it. i think this would also help with some deployment errors by tightly tying the data with the cluster config. the previously mentioned utility would also allow you to avoid having to start with a single node cluster and growing from there.



[jira] Commented: (ZOOKEEPER-107) Allow dynamic changes to server cluster membership

Posted by "Andraz Tori (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/ZOOKEEPER-107?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12869554#action_12869554 ] 

Andraz Tori commented on ZOOKEEPER-107:
---------------------------------------

Has anything happened with this feature?

There was some talk about what the most important use cases are on the mailing list. We're thinking of migrating our home-grown solution to Zookeeper, but can't do it without dynamic addition/removal of servers. If it helps, here's the use case:

We have a fully cloud-based solution. Every server that we put into the cluster runs a set of services that make themselves available to a local "resource manager" that shares the list of resources with all other servers in the cluster. When we do upgrades we simply fire up new servers with new versions of the services and connect their resource managers to the old ones in the same cluster. Then we simply shut down the old servers. Besides adding/removing servers when upgrading, we also do the same thing when we need to temporarily scale - we fire up a few more servers and connect their resource managers to the cluster to make the services available to the cluster.
We never know how many servers there are going to be in the cluster and we don't assign any DNS entries to them (just another point of failure).
The clients that need to know about resources connect to any of the "resource managers" and get a list of all available resources and of the other "resource managers". As servers move around they can also connect to a different "resource manager".

This is a somewhat unusual configuration, since the cloud practically lives on its own without any kind of static addresses. As long as you are able to connect to it at one point in time, you can keep up with its 'motion'.

So the idea was to migrate the above system to Zookeeper. Every service would connect to the local Zookeeper and create an ephemeral node announcing itself. So every server would run its own Zookeeper node connected to the Zookeeper cloud. However, without dynamic addition/removal of servers, all this becomes infeasible.

Ideally we'd like to have a situation where we just start a Zookeeper node by giving it a list of other known Zookeeper nodes in the cloud, and it then takes on a life of its own.

Hope that the use case helps. I am really looking forward to this!





[jira] Commented: (ZOOKEEPER-107) Allow dynamic changes to server cluster membership

Posted by "Hiram Chirino (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/ZOOKEEPER-107?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12619106#action_12619106 ] 

Hiram Chirino commented on ZOOKEEPER-107:
-----------------------------------------

I personally think that this needs to stay decoupled so that group membership can be controlled via different implementations.  In other words, I think that the QuorumPeer should not need any constructor args to know its peers.  It should persistently store/remember the list of peers that are part of the group since it last started.

Not sure if it makes sense to keep that list in the ZK db or not.

When a node that is not part of a cluster first starts up, it needs to know if it's starting a new cluster or if it is joining an existing cluster.  Therefore, I think the QuorumPeer class needs methods like the following:

{code}
// Proposed additions to QuorumPeer, shown as declarations only.

/**
 * Contacts a ZK server in the cluster, adds this peer to the cluster and gets a listing of the rest of the peers in
 * the cluster.
 *
 * Optional: if slaveOnly is true, then this peer should never be elected master.
 *
 * Throws an error if this peer is already part of a cluster.
 */
void joinCluster(URI server, boolean slaveOnly);

/**
 * Starts this peer as the first node in the cluster and makes it the master.
 *
 * Throws an error if this peer is already part of a cluster.
 */
void createCluster();

/**
 * Removes this peer from the peer list maintained by the cluster.
 *
 * Throws an error if this peer is not part of a cluster.
 */
void leaveCluster();

/**
 * Gets a list of peers in the cluster.
 *
 * @return null if not part of a cluster yet.
 */
List<URI> getClusterPeers();
{code}

If methods like the above are available, then an administrator, or some automated agent, can dynamically manage adding/removing nodes on an existing ZooKeeper cluster.  Note that the peer list needs to get replicated to all cluster members and persisted to avoid split brain issues on peer restart.  Operations like joinCluster(), leaveCluster(), getClusterPeers() would block until a master is elected in the cluster.

Please note the 'nice to have feature' where you have the ability to designate some peers as NOT being eligible to become a master.  This would allow you to support using heterogeneous peers, and enforce only allowing the higher end machines to become the masters.





[jira] Commented: (ZOOKEEPER-107) Allow dynamic changes to server cluster membership

Posted by "Jakob Homan (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/ZOOKEEPER-107?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12618885#action_12618885 ] 

Jakob Homan commented on ZOOKEEPER-107:
---------------------------------------

One thought I had along this line was to create an additional constructor that takes a URL rather than a string for the host.  The constructor could then access that URL and get the list of servers from there.  So, assuming the URL pointed to an http page, the page returned would just be "hostA:port,hostB:port,etc" and the client could proceed with that information.  Or, the URL could point to a local file if the server membership wasn't expected to change often.  This would eliminate the need for the clients to have any idea ahead of time of where to get the hosts.  Particularly if the information were served via http, this would move server location to being a DNS/virtualhost problem - the DNS for the server-info location could be changed if the server providing the info died.  This would go a ways toward adding a restful interface to the configuration of the cluster.

So, the client would instantiate with zk = new ZooKeeper(new URL("http://zkhostinfo:7552"), 1000, this); then receive the list of hosts and work from there.

This would address (though not solve) the issue of adding servers, as new servers could be added to the list returned to new clients.
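
A minimal sketch of the client-side parsing this proposal implies, assuming the URL returns a single comma-separated line in the "hostA:port,hostB:port" form described above. The names HostListParser and parse are hypothetical and not part of ZooKeeper; fetching the URL is left out so the example stays self-contained.

```java
import java.net.InetSocketAddress;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper for illustration; not a ZooKeeper class.
final class HostListParser {
    /** Parses "hostA:2181,hostB:2181" into unresolved socket addresses. */
    static List<InetSocketAddress> parse(String body) {
        List<InetSocketAddress> servers = new ArrayList<>();
        for (String entry : body.trim().split(",")) {
            String hostPort = entry.trim();
            if (hostPort.isEmpty()) {
                continue; // tolerate empty entries such as a stray comma
            }
            int colon = hostPort.lastIndexOf(':');
            if (colon <= 0 || colon == hostPort.length() - 1) {
                throw new IllegalArgumentException("expected host:port, got: " + hostPort);
            }
            String host = hostPort.substring(0, colon);
            int port = Integer.parseInt(hostPort.substring(colon + 1));
            // createUnresolved avoids a DNS lookup at parse time.
            servers.add(InetSocketAddress.createUnresolved(host, port));
        }
        return servers;
    }
}
```

A client could re-fetch and re-parse the list whenever it fails to reach any known server, which is roughly how new servers would become visible to it.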

As a corollary, the zk servers could of course serve the list of hosts as an optional part of their operation, or this function could be provided by another application, directory-style.  This would allow clients to connect to different clusters on start-up, if the directory could identify and differentiate clients by ip addr or such, and direct them to the appropriate zk group.

If this sounds (reasonable && interesting), I can open a JIRA and work on a patch to add the new constructor and client functionality.



[jira] Commented: (ZOOKEEPER-107) Allow dynamic changes to server cluster membership

Posted by "Flavio Paiva Junqueira (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/ZOOKEEPER-107?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12728589#action_12728589 ] 

Flavio Paiva Junqueira commented on ZOOKEEPER-107:
--------------------------------------------------

I think this discussion is really interesting, but can we move the discussion on the behavior of the observer to ZOOKEEPER-368? I'll add my comments on the last set of comments there.



[jira] Commented: (ZOOKEEPER-107) Allow dynamic changes to server cluster membership

Posted by "Raghu S (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/ZOOKEEPER-107?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12720313#action_12720313 ] 

Raghu S commented on ZOOKEEPER-107:
-----------------------------------

Henry, et al. thanks for the feedback on my proposal.

To explain a bit more on the proposal -- I felt that keeping the data and configuration separate, like it is today, would keep the implementation simple and non-intrusive for core ZK code while ensuring correctness. The issue I see with storing config in a znode is that the joiner needs to participate in the NEWVIEW ZAB message even when it is not part of the cluster, and it needs to have the latest log before it can commit the NEWVIEW proposal, since that requires writing to the log. At the same time, the leader has to make sure that it blocks all proposals in between syncing the joiner and executing NEWVIEW. I felt this could be too intrusive (maybe I shouldn't have worried about this while thinking about a higher level proposal?).  Let me know if my understanding is incorrect.

I don't think my proposal would result in a split brain. I believe there is no need for two phases when changing the cluster configuration, as long as each attempt to modify the configuration generates a new version number and the cluster configuration divergence is reconciled during leader election (peers go with the configuration with the highest version number). Having two phases is no better, since there is no guarantee that all/majority peers have committed the new configuration and there could be a diverged view of the cluster configuration during election. Let me know if I am missing something.
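
The reconciliation rule described above (peers go with the configuration that has the highest version number) can be sketched as follows. This is only an illustration of the selection step; VersionedConfig and reconcile are hypothetical names, not ZooKeeper classes, and the sketch says nothing about whether the rule alone is enough to prevent split brain, which is the point under discussion.

```java
import java.util.Collection;
import java.util.Comparator;
import java.util.List;

// Hypothetical types for illustration; not ZooKeeper classes.
final class VersionedConfig {
    final long version;          // bumped on every attempted configuration change
    final List<String> members;  // server ids in this configuration

    VersionedConfig(long version, List<String> members) {
        this.version = version;
        this.members = members;
    }

    /** Returns the configuration with the highest version among those reported by peers. */
    static VersionedConfig reconcile(Collection<VersionedConfig> reported) {
        return reported.stream()
                .max(Comparator.comparingLong((VersionedConfig c) -> c.version))
                .orElseThrow(() -> new IllegalArgumentException("no configurations reported"));
    }
}
```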

I do believe Henry's proposal would work. I don't have a strong preference for how we would like to do this, as long as we get it right.





[jira] Commented: (ZOOKEEPER-107) Allow dynamic changes to server cluster membership

Posted by "Benjamin Reed (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/ZOOKEEPER-107?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12728300#action_12728300 ] 

Benjamin Reed commented on ZOOKEEPER-107:
-----------------------------------------

sorry to jump in late here. rather than adding the inform, why don't we just send the PROPOSE and COMMIT to the Observer as normal, and just make the Observer not send ACKs? That way we change as little code as possible with minimum overhead. It also makes switching from Observer to Follower as easy as turning on the ACKs. I also think Observers should be able to issue proposals. One use case for observers are remote data centers that basically proxy clients that connect to ZooKeeper. This means an Observer is just a Follower that doesn't vote (ACK).
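
One way to read "a Follower that doesn't vote" on the leader side is that only ACKs from voting members count toward the quorum. The sketch below assumes simple majority quorums and uses hypothetical names (AckCounter, hasQuorum); it is not ZooKeeper's actual quorum code.

```java
import java.util.Set;

// Hypothetical sketch; not ZooKeeper's actual quorum code.
final class AckCounter {
    /**
     * Returns true when a strict majority of voting members has acknowledged.
     * ACKs from servers outside votingMembers (e.g. Observers) are ignored.
     */
    static boolean hasQuorum(Set<String> votingMembers, Set<String> acked) {
        long votes = acked.stream().filter(votingMembers::contains).count();
        return votes > votingMembers.size() / 2;
    }
}
```

Under this reading, promoting an Observer to a Follower amounts to adding it to the voting set, which matches the "turning on the ACKs" description.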



[jira] Commented: (ZOOKEEPER-107) Allow dynamic changes to server cluster membership

Posted by "Raghu S (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/ZOOKEEPER-107?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12730384#action_12730384 ] 

Raghu S commented on ZOOKEEPER-107:
-----------------------------------

Sorry to jump around a bit; I thought I would mention this in case we haven't already talked about it. How do we plan to deal with a situation where a set of nodes can form a majority but can't form an ensemble because one or more peers have a grossly outdated configuration? Say an ensemble of ABCDE moved to EFGHI while E was offline, and only EFG are up. They form a majority but can't form an ensemble, since E doesn't know about any of the other servers yet.

One way to address this is to implement an out-of-band synchronization mechanism in which E realizes that the ensemble has changed when F and G try to connect to it, and one of them synchronizes E's logs, since their last known zxids are ahead of E's. E can then attempt to restart an election. Also, it is possible that F and G could see different ensembles (F is a bit outdated, G is the most up to date), in which case E might first sync up from F and then both E and F sync up from G if G comes online a bit later.

Any simpler solutions?



[jira] Commented: (ZOOKEEPER-107) Allow dynamic changes to server cluster membership

Posted by "Raghu S (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/ZOOKEEPER-107?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12720334#action_12720334 ] 

Raghu S commented on ZOOKEEPER-107:
-----------------------------------

Ben, to be honest, I wasn't thinking of batch addition/deletion. I was thinking we would allow only one node to join or leave the cluster at a time, in which case we won't end up in a split brain.

One thing I am still missing is, how do we plan to reconcile the divergence in configuration info during leader election if we use ZAB? With ZAB, we go ahead and write to the log as soon as a PROPOSAL is sent. COMMIT is used only to notify the servers that a majority have logged the update and the clients can start reading the new update. So I am not really seeing how this will help with configuration changes. Now in the example that you bring up, if D, E and F have logged the new view and all the nodes are brought up after a power cycle, a split brain could still occur, no? Should we allow only one node to be added/deleted at a time?



[jira] Commented: (ZOOKEEPER-107) Allow dynamic changes to server cluster membership

Posted by "Henry Robinson (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/ZOOKEEPER-107?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12719856#action_12719856 ] 

Henry Robinson commented on ZOOKEEPER-107:
------------------------------------------

Great comments, thanks very much.

@Benjamin:

You're right that JOIN and LEAVE are special cases of GETVIEW / NEWVIEW. They exist only to allow easy expression of what I reckon will be a common pattern: "add my node to whatever the current cluster is", otherwise as you say you need a GETVIEW for sanity. 

There's an optimisation that would allow JOIN requests to always be committed even though the joiner might have failed - the leader can take it as read that the joiner implicitly acknowledges the JOIN proposal and can therefore push through the commit without requiring an explicit ack from it (you can show that, at least for majority quorums, the only way to have a quorum in an old view but not in the new under a JOIN is for the joiner to fail). However, since the ensemble would be failed immediately afterwards it wouldn't be much of a win.
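
The parenthetical claim about majority quorums can be checked by brute force for small ensembles. The sketch below assumes the new view is the old view plus exactly one joiner and enumerates every possible set of live servers; the class and method names are made up for illustration.

```java
// Brute-force check of the majority-quorum claim for small ensembles.
final class JoinQuorumCheck {
    /**
     * Old view: servers 0..n-1. New view: old view plus the joiner (index n).
     * Returns true if every set of live servers that is a majority of the old
     * view but not of the new view excludes the joiner.
     */
    static boolean claimHolds(int n) {
        int joinerBit = 1 << n;
        int oldMask = joinerBit - 1;
        for (int live = 0; live < (1 << (n + 1)); live++) {
            int oldAlive = Integer.bitCount(live & oldMask);
            int newAlive = Integer.bitCount(live);
            boolean quorumInOld = oldAlive > n / 2;
            boolean quorumInNew = newAlive > (n + 1) / 2;
            if (quorumInOld && !quorumInNew && (live & joinerBit) != 0) {
                return false; // counterexample: joiner alive, yet new view lacks a quorum
            }
        }
        return true;
    }
}
```

For example, with an old view of three servers, two live old servers form a quorum of the old view but not of the four-server new view, and that can only happen when the joiner is down.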

I agree that simple is better; JOIN makes the client programming simpler at the expense of the server (and saves the client a round-trip), but functionally it's subsumed by the getview->newview pattern. 

I like erroring out when NEWVIEW won't result in a live ensemble, or at least having that as an option.

@Flavio:

I definitely agree that the persisted state should be in a znode (and should therefore be exposable through watches - important for externalising membership data for bootstrapping). However, since these operations require a very slightly different protocol to normal ZAB, I'm in favour of using the opcode to tell the leader to use a protocol variant rather than the identity of the znode - this means that the arguments to 'set' or whatever are kept opaque to the Leader. This is a matter of taste however - I'm not totally committed to JOIN et al.



[jira] Commented: (ZOOKEEPER-107) Allow dynamic changes to server cluster membership

Posted by "Raghu S (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/ZOOKEEPER-107?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12721554#action_12721554 ] 

Raghu S commented on ZOOKEEPER-107:
-----------------------------------

That sounds great! I know this is a complex task and lot of work, can live with the kinks in the beginning. 



[jira] Commented: (ZOOKEEPER-107) Allow dynamic changes to server cluster membership

Posted by "Mahadev konar (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/ZOOKEEPER-107?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12706122#action_12706122 ] 

Mahadev konar commented on ZOOKEEPER-107:
-----------------------------------------

Henry,
 one thing I would like to point out: please post a concrete proposal (since this involves the core internals of zookeeper) before you start working on this, so that there is agreement and no wasted effort...

> Allow dynamic changes to server cluster membership
> --------------------------------------------------
>
>                 Key: ZOOKEEPER-107
>                 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-107
>             Project: Zookeeper
>          Issue Type: Improvement
>          Components: server
>            Reporter: Patrick Hunt
>
> Currently cluster membership is statically defined, adding/removing hosts to/from the server cluster dynamically needs to be supported.



[jira] Commented: (ZOOKEEPER-107) Allow dynamic changes to server cluster membership

Posted by "Benjamin Reed (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/ZOOKEEPER-107?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12719823#action_12719823 ] 

Benjamin Reed commented on ZOOKEEPER-107:
-----------------------------------------

Raghu, i think henry is correct that you must get an ack from quorums in both the old and new views before committing the change. otherwise you get split brain which could result in multiple leaders. henry, i think we are thinking along the same lines, but i'm a bit skeptical of JOIN and LEAVE. in some sense they are a bit of an optimization that can be implemented with GETVIEW and NEWVIEW. it would be nice to make the mechanism as simple as possible. it also seems like you would also require a GETVIEW to be done before doing a NEWVIEW, just for sanity. (require an expected version on NEWVIEW and not allow a -1.) i was thinking that we would just push NEWVIEW through Zab making sure we get acks from quorums in both the old and new views.

to help mitigate the case where proposing the NEWVIEW leads to a case where the system freezes up when the NEWVIEW proposal goes out and there isn't a quorum in the new view, the leader should probably make sure that it currently has quorum of followers in the new view before proposing the request. if it doesn't, it should error out the request. even with this we can still freeze up if we lose quorum in the new view after issuing the proposal, but that would happen anyway (as you point out), but it would prevent us from doing something that has no chance of working.
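
A rough sketch of the two checks discussed above, in Python. Nothing here is ZooKeeper code: the function names are made up for illustration, a view is modelled as a plain set of server ids, and a quorum is assumed to be a simple majority.

```python
def has_quorum(acks, view):
    """True if the servers in `acks` form a strict majority of `view`."""
    return len(set(acks) & set(view)) > len(view) // 2


def can_propose_new_view(leader, live_followers, new_view):
    """Leader-side pre-check (hypothetical): refuse a NEWVIEW that has no
    chance of reaching quorum in the new view right now."""
    reachable = set(live_followers) | {leader}
    return has_quorum(reachable, new_view)


def is_committed(acks, old_view, new_view):
    """A NEWVIEW commits only with acks from quorums of BOTH views."""
    return has_quorum(acks, old_view) and has_quorum(acks, new_view)
```

The pre-check only narrows the window: quorum in the new view can still be lost after the proposal goes out, exactly as noted above.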

> Allow dynamic changes to server cluster membership
> --------------------------------------------------
>
>                 Key: ZOOKEEPER-107
>                 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-107
>             Project: Zookeeper
>          Issue Type: Improvement
>          Components: server
>            Reporter: Patrick Hunt
>         Attachments: SimpleAddition.rtf
>
>
> Currently cluster membership is statically defined, adding/removing hosts to/from the server cluster dynamically needs to be supported.



[jira] Commented: (ZOOKEEPER-107) Allow dynamic changes to server cluster membership

Posted by "Patrick Hunt (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/ZOOKEEPER-107?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12619119#action_12619119 ] 

Patrick Hunt commented on ZOOKEEPER-107:
----------------------------------------

In my comment "URI rather than a host/port list" I was specifically referring to the client's host/port list used to specify the servers to which the client should connect. Probably a good idea to use something like this on the servers as well.

Regarding the idea of join/leave a cluster, this sounds good. How does this mesh with the common case of starting up a set of 5 servers forming a new cluster? Specifically the idea of operations blocking (hiram's comment) until master is elected. Not sure I see how this works...


> Allow dynamic changes to server cluster membership
> --------------------------------------------------
>
>                 Key: ZOOKEEPER-107
>                 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-107
>             Project: Zookeeper
>          Issue Type: Improvement
>          Components: server
>            Reporter: Patrick Hunt
>
> Currently cluster membership is statically defined, adding/removing hosts to/from the server cluster dynamically needs to be supported.



[jira] Commented: (ZOOKEEPER-107) Allow dynamic changes to server cluster membership

Posted by "Flavio Paiva Junqueira (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/ZOOKEEPER-107?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12719831#action_12719831 ] 

Flavio Paiva Junqueira commented on ZOOKEEPER-107:
--------------------------------------------------

In general I like Henry's solution, and I think it works. However, I'm not entirely convinced that we need to augment the protocol with messages such as JOIN and LEAVE. I believe we can make it work by simply writing to a special znode and reading from it, which we need to do anyway if we want to use the mechanisms we have in place for durability. Of course, the leader has to follow changes to this znode and adapt its behavior accordingly (e.g., when sending proposals and commits). Followers, as far as I can tell, only need to register the changes to the znode, as they make no use of such information except for leader election.

I also agree that there is an authentication problem as we don't want some arbitrary machine trying to join an ensemble.

If you're willing to share your proof sketches, I would be pleased to take a look at them. 

> Allow dynamic changes to server cluster membership
> --------------------------------------------------
>
>                 Key: ZOOKEEPER-107
>                 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-107
>             Project: Zookeeper
>          Issue Type: Improvement
>          Components: server
>            Reporter: Patrick Hunt
>         Attachments: SimpleAddition.rtf
>
>
> Currently cluster membership is statically defined, adding/removing hosts to/from the server cluster dynamically needs to be supported.



[jira] Commented: (ZOOKEEPER-107) Allow dynamic changes to server cluster membership

Posted by "Henry Robinson (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/ZOOKEEPER-107?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12706219#action_12706219 ] 

Henry Robinson commented on ZOOKEEPER-107:
------------------------------------------

I agree with pretty much everything I've read here, (in particular, the importance of getting consensus!), but wanted to clarify my initial comment a bit.

Rather than choose between strategies 1 and 2 as outlined by Benjamin, I think there's a hybrid approach needed. 

If a node is a member of a quorate cluster, then the most up to date membership information should be available to it in a znode. I think this is the most elegant approach, and is trivially achieved by pushing join/leave requests through the atomic broadcast pipeline.

If a node is joining the cluster, it needs to be able to bootstrap the location of the cluster from somewhere. There therefore needs to be an externally available resource containing a list of machines in the cluster that is at least accurate for one machine (as a joining node will try all servers in that list in turn). When I say available at some URI, this is what I mean. Currently, this information is kept statically at a URI that addresses conf/zoo.cfg on the local filesystem. I suggest generalising that to a general URI. One nice property is that it then does not tie a cluster to a particular machine, as the URI provides a level of indirection.

It is then the cluster administrator's responsibility to keep this URI up-to-date (although of course this should be automated), possibly via a client that just pulls membership information from the cluster periodically. As I said earlier, it's only important for the contents of this list to have one node in common with the true membership of the cluster, so it's allowed to get a bit out of sync. We can certainly easily imagine ways that ZK can help here. Of course the URI must be highly available, but it also has to exist, otherwise we could have 'orphaned' clusters that are running on machines whose identity we don't necessarily know. The URI can be a front for almost any scheme we like - periodic heartbeating of live nodes is one. 

The format of this file can be anything at all - from a serialised snapshot to a list of ip:port pairs, as long as it contains enough information for a client to find the cluster. Personally I would prefer human readable, simple formats.

To talk about recovery for a moment: when a node recovers from a crash and rejoins the cluster, it can help the cluster elect a master if the cluster is currently non-quorate. This is because it was originally part of the cluster, and therefore the protocol guarantees that a quorum of nodes including the recovering one will have seen all committed proposals (this is important to correctness).

If the node was not originally a member of the cluster, it must not help get a master elected as it cannot be part of a quorum. Similarly, a node cannot query the cluster to find out if it was originally a member because the quorum required to do so might not exist. Therefore every node that ever successfully joins a cluster must store this fact in its own persistent storage, as only it can know whether it is permitted to help run the election. 

Finally, the startup problem. Given a URI, nodes can bootstrap themselves onto a cluster simply by being told to start in startup mode. Alternatively, a single node can be distinguished (again, in the URI contents perhaps) which will start in single-node mode and process join requests one-by-one. 
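
To make the bootstrap idea a bit more concrete, here is a small Python sketch of a joining node walking the list fetched from the URI. The file format (one host:port per line, '#' comments) is just one of the simple human-readable options mentioned above, and `try_connect` is a hypothetical callback standing in for whatever handshake a real server would perform.

```python
def parse_bootstrap_list(text):
    """Parse one 'host:port' entry per line, skipping blanks and '#' comments."""
    servers = []
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        host, _, port = line.rpartition(":")
        servers.append((host, int(port)))
    return servers


def find_live_member(servers, try_connect):
    """Try each listed server in turn; return the first that answers, else None.

    The list may be stale: it only needs one entry in common with the
    true membership for the joining node to find the cluster.
    """
    for host, port in servers:
        if try_connect(host, port):
            return (host, port)
    return None
```

If `find_live_member` returns None, the list shares no live member with the cluster and the node cannot join until the list is refreshed.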


> Allow dynamic changes to server cluster membership
> --------------------------------------------------
>
>                 Key: ZOOKEEPER-107
>                 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-107
>             Project: Zookeeper
>          Issue Type: Improvement
>          Components: server
>            Reporter: Patrick Hunt
>
> Currently cluster membership is statically defined, adding/removing hosts to/from the server cluster dynamically needs to be supported.



[jira] Commented: (ZOOKEEPER-107) Allow dynamic changes to server cluster membership

Posted by "Flavio Paiva Junqueira (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/ZOOKEEPER-107?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12722314#action_12722314 ] 

Flavio Paiva Junqueira commented on ZOOKEEPER-107:
--------------------------------------------------

That's a great catch, Henry, the one related to having any new (perhaps invalid) follower being able to submit requests. When you start a new follower not in the configuration, do you run it as a regular replica and let it find its way, or do you explicitly tell the follower to connect to the leader?

I'm not sure if we should discuss details of the observer here or in the other jira, but I'm wondering how an observer is able to find the leader to connect. The default leader election uses identifiers to connect and form quorums, so I'm not sure a server not in the configuration would be able to determine which replica is the leader. I think we can do it with leader election 0, though, if a leader has been elected and is running.

Are you planning on having observers as a separate feature, as per ZK-368? It would be great to have it, since you are going through the effort of implementing it already.

As for the message to observers containing the transaction, the advantage of having a special message (e.g., INFORM) is that we cut down the number of messages to observers: INFORM is essentially a COMMIT containing the request. If we don't change the protocol, then we can just have the leader sending a PROPOSAL to everyone, including the observers. As observers will receive the COMMIT as well, we have higher message complexity. For now, I'm good either way. 
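
The message-count argument can be put as a back-of-the-envelope calculation. This Python sketch counts only leader-to-observer messages per transaction and ignores any traffic in the other direction; the function name is made up for illustration.

```python
def leader_to_observer_messages(num_observers, use_inform):
    """Messages the leader sends to observers for one committed transaction.

    With INFORM, each observer gets a single message carrying the request.
    Without it, each observer gets a PROPOSAL and later a COMMIT.
    """
    per_observer = 1 if use_inform else 2
    return num_observers * per_observer
```

So with three observers the leader sends three messages per transaction with INFORM and six without it.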


> Allow dynamic changes to server cluster membership
> --------------------------------------------------
>
>                 Key: ZOOKEEPER-107
>                 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-107
>             Project: Zookeeper
>          Issue Type: Improvement
>          Components: server
>            Reporter: Patrick Hunt
>            Assignee: Henry Robinson
>         Attachments: SimpleAddition.rtf
>
>
> Currently cluster membership is statically defined, adding/removing hosts to/from the server cluster dynamically needs to be supported.



[jira] Commented: (ZOOKEEPER-107) Allow dynamic changes to server cluster membership

Posted by "Raghu S (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/ZOOKEEPER-107?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12720411#action_12720411 ] 

Raghu S commented on ZOOKEEPER-107:
-----------------------------------

Ben, I still believe the split brain won't occur:

A. After (2), A and C have config version X + 1, B and D are at X
B. After A dies, a leader election is not possible without C. During LE, B and D discover that C is at X + 1. This will force B and D to update their configuration to X + 1 and restart the election. This is what I refer to when I say "reconciling configuration divergence" in my write up. D now leaves the cluster since it just learnt that it was deleted.
C. A new quorum is formed with B and C.
D. When A comes back, config version of A B and C are the same. A will simply join the leader. If A were still at X, then it will first update its configuration to X + 1 when it starts an election and then restart the election.
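
The reconciliation in steps A to D can be sketched in a few lines of Python. This is only an illustration under assumptions: the function name is made up, a configuration is a (version, members) pair, and the memberships used in the example below (X = {A, B, C, D}, X + 1 = {A, B, C}) are guesses consistent with D having been deleted.

```python
def reconcile(peers):
    """peers maps server id -> (config_version, members).

    Every peer adopts the highest-versioned configuration seen during the
    election; a peer that is not a member of that configuration leaves.
    Returns (updated_peers, departed_ids).
    """
    latest_version, latest_members = max(peers.values(), key=lambda cfg: cfg[0])
    updated = {}
    departed = set()
    for sid in peers:
        if sid in latest_members:
            updated[sid] = (latest_version, latest_members)
        else:
            departed.add(sid)
    return updated, departed
```

With A down, B and D at X and C at X + 1, this leaves B and C at X + 1 and D departed, and B plus C are a majority of {A, B, C}.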

> Allow dynamic changes to server cluster membership
> --------------------------------------------------
>
>                 Key: ZOOKEEPER-107
>                 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-107
>             Project: Zookeeper
>          Issue Type: Improvement
>          Components: server
>            Reporter: Patrick Hunt
>         Attachments: SimpleAddition.rtf
>
>
> Currently cluster membership is statically defined, adding/removing hosts to/from the server cluster dynamically needs to be supported.



[jira] Commented: (ZOOKEEPER-107) Allow dynamic changes to server cluster membership

Posted by "Henry Robinson (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/ZOOKEEPER-107?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12705192#action_12705192 ] 

Henry Robinson commented on ZOOKEEPER-107:
------------------------------------------

This is something I'd be willing to work on.

Just to sum up my current understanding of the requirements:

1. Must support off-cluster getPeers operation for a recovering peer to bootstrap itself (can cache in its own persistent storage, but that could potentially be out of date by recovery time). This is probably best realised with the URI idea as before.

2. Support for join and leave operation. With a quiescent cluster, join is probably as simple as a sync followed by a commit of the new peer's id to all followers (if nothing else, this ensures that if one of them should be elected the master, they know how big the quorum should be). Leaves are similar, without the sync obviously. If a peer leaves before the Leave( ) operation completes, it will look like a crash. 

3. If joining / leaving a cluster that doesn't have a currently elected master, block until one exists. If the cluster is currently failed due to f+1 failures, it might be necessary to timeout in order to prevent being permanently blocked if this is in the middle of a code path.

4. However, if joining / leaving a cluster that has never bootstrapped it's important to do something different so as to allow the cluster to achieve a quorum. One solution is for a node to check if its id is in the list of peers at the cluster URI which will tell it if it was ever a member of the cluster previously (or part of the initial membership) and then participate in master elections. This places a requirement on the peer list to be kept reasonably accurate (but this could only affect liveness, not safety, I think).

Please chime in with comments / stuff that I've missed / bugs, otherwise I'll work on fleshing this out.



> Allow dynamic changes to server cluster membership
> --------------------------------------------------
>
>                 Key: ZOOKEEPER-107
>                 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-107
>             Project: Zookeeper
>          Issue Type: Improvement
>          Components: server
>            Reporter: Patrick Hunt
>
> Currently cluster membership is statically defined, adding/removing hosts to/from the server cluster dynamically needs to be supported.



[jira] Commented: (ZOOKEEPER-107) Allow dynamic changes to server cluster membership

Posted by "Henry Robinson (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/ZOOKEEPER-107?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12730414#action_12730414 ] 

Henry Robinson commented on ZOOKEEPER-107:
------------------------------------------

Raghu:

Your solution is essentially what will happen. F and G will contact E while they are trying to elect a leader. During this process they can all exchange the most recent view that they saw so that E realises the current view. If EFG form a quorum in any view then we can see that either a) it is the latest view or b) at least one of them will know about a later view. Therefore there's also no concern about resurrecting old views.





> Allow dynamic changes to server cluster membership
> --------------------------------------------------
>
>                 Key: ZOOKEEPER-107
>                 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-107
>             Project: Zookeeper
>          Issue Type: Improvement
>          Components: server
>            Reporter: Patrick Hunt
>            Assignee: Henry Robinson
>         Attachments: SimpleAddition.rtf
>
>
> Currently cluster membership is statically defined, adding/removing hosts to/from the server cluster dynamically needs to be supported.



[jira] Commented: (ZOOKEEPER-107) Allow dynamic changes to server cluster membership

Posted by "Vishal K (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/ZOOKEEPER-107?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12863359#action_12863359 ] 

Vishal K commented on ZOOKEEPER-107:
------------------------------------

Hi Henry,

We are using ZK for one of the projects at VMware. We are very much interested in having dynamic membership management. I went through the dev mailing list discussion above. I would like to contribute and develop this feature. It sounds like a fun project.

Can you please provide an update regarding how far we are with this and any documentation that you may have? I will start off a separate discussion thread regarding this on the dev mailing list instead of having it over the jira.

Thanks.

Regards,
-Vishal

> Allow dynamic changes to server cluster membership
> --------------------------------------------------
>
>                 Key: ZOOKEEPER-107
>                 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-107
>             Project: Zookeeper
>          Issue Type: Improvement
>          Components: server
>            Reporter: Patrick Hunt
>            Assignee: Henry Robinson
>         Attachments: SimpleAddition.rtf
>
>
> Currently cluster membership is statically defined, adding/removing hosts to/from the server cluster dynamically needs to be supported.



[jira] Commented: (ZOOKEEPER-107) Allow dynamic changes to server cluster membership

Posted by "Benjamin Reed (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/ZOOKEEPER-107?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12706320#action_12706320 ] 

Benjamin Reed commented on ZOOKEEPER-107:
-----------------------------------------

the information needed for bootstrapping is the same as the information needed for a normal zookeeper client, so it could either use the standard string that is a list of host:port pairs, or it could use the scheme proposed in ZOOKEEPER-390. with that URL it could fetch /.zookeeper/ensemble and grab the configuration information that it needs. conf/zoo.cfg isn't really a good URI for this purpose since it doesn't really have the needed client ports. plus there is information in zoo.cfg that is particular to a given server. for example, the data and log directories may be different on all the machines. the client port should also probably stay in the zoo.cfg. the server lists and probably the timing variables should probably be stored in a znode and maintained with the atomic broadcast.

recovery is a bit more than you mention, but at the same time simpler. first off, to change quorum configuration you must commit the change in both the old quorum configuration and in the new quorum configuration. for example, if you have the configuration A, B, C and you are changing to A, B, C, D, E you must be able to get quorum in both the old and new configuration for the change to work. if only A and B are up or A, D, and E are up you cannot commit the change. this means that the leader should check the new configuration carefully before proposing it, because we always roll the proposals forward, we never rollback.

so really a zookeeper server doesn't know whether he is able to participate or not, the election will sort it out. a simple example is an ensemble A, B, C, D, E. E goes down. the last zxid it saw was <57,3>. while it is down the quorum configuration gets changed to A, B, C by <57,52>. let's say there is a leadership change and at <58,6> the power goes out and comes back on. E now tries to vote (it thinks it is permitted to participate), but it won't win any election since its zxid is too low. A, B, and C will ignore E's votes anyway because they know that E has been removed from the ensemble.
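
a small Python sketch of why E cannot win in the example above. zxids are modelled as (epoch, counter) tuples so that ordinary tuple comparison gives the ordering; `pick_leader` is a made-up name, and breaking ties by the higher server id is an assumption of this sketch.

```python
def pick_leader(votes, current_view):
    """votes maps server id -> last zxid seen, as an (epoch, counter) tuple.

    Votes from servers outside the current view are ignored; among the
    rest, the highest zxid wins, with ties broken by server id.
    """
    eligible = {sid: zxid for sid, zxid in votes.items() if sid in current_view}
    return max(eligible, key=lambda sid: (eligible[sid], sid))
```

E's <57,3> sorts below the view change at <57,52>, so even before A, B and C discard its vote it could never carry the election.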

> Allow dynamic changes to server cluster membership
> --------------------------------------------------
>
>                 Key: ZOOKEEPER-107
>                 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-107
>             Project: Zookeeper
>          Issue Type: Improvement
>          Components: server
>            Reporter: Patrick Hunt
>
> Currently cluster membership is statically defined, adding/removing hosts to/from the server cluster dynamically needs to be supported.



[jira] Commented: (ZOOKEEPER-107) Allow dynamic changes to server cluster membership

Posted by "Raghu S (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/ZOOKEEPER-107?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12721608#action_12721608 ] 

Raghu S commented on ZOOKEEPER-107:
-----------------------------------

Henry, just a question out of curiosity - how do you think the cluster will be created on day one? Would a 2/3 node cluster be built initially by powering on the servers using pre-populated configuration and then nodes are added/deleted dynamically? Or a cluster will be built incrementally using dynamic configuration change, starting with a single node?

> Allow dynamic changes to server cluster membership
> --------------------------------------------------
>
>                 Key: ZOOKEEPER-107
>                 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-107
>             Project: Zookeeper
>          Issue Type: Improvement
>          Components: server
>            Reporter: Patrick Hunt
>         Attachments: SimpleAddition.rtf
>
>
> Currently cluster membership is statically defined, adding/removing hosts to/from the server cluster dynamically needs to be supported.



[jira] Commented: (ZOOKEEPER-107) Allow dynamic changes to server cluster membership

Posted by "Raghu S (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/ZOOKEEPER-107?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12721566#action_12721566 ] 

Raghu S commented on ZOOKEEPER-107:
-----------------------------------

Henry, the JIRA is unassigned. You might want to assign it to yourself.

> Allow dynamic changes to server cluster membership
> --------------------------------------------------
>
>                 Key: ZOOKEEPER-107
>                 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-107
>             Project: Zookeeper
>          Issue Type: Improvement
>          Components: server
>            Reporter: Patrick Hunt
>         Attachments: SimpleAddition.rtf
>
>
> Currently cluster membership is statically defined, adding/removing hosts to/from the server cluster dynamically needs to be supported.



[jira] Commented: (ZOOKEEPER-107) Allow dynamic changes to server cluster membership

Posted by "Quinton Hoole (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/ZOOKEEPER-107?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12851420#action_12851420 ] 

Quinton Hoole commented on ZOOKEEPER-107:
-----------------------------------------

Any progress on this issue?  It seems to have gone very quiet.  We have this precise requirement, and will have to solve it one way or another in the coming months.

> Allow dynamic changes to server cluster membership
> --------------------------------------------------
>
>                 Key: ZOOKEEPER-107
>                 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-107
>             Project: Zookeeper
>          Issue Type: Improvement
>          Components: server
>            Reporter: Patrick Hunt
>            Assignee: Henry Robinson
>         Attachments: SimpleAddition.rtf
>
>
> Currently cluster membership is statically defined, adding/removing hosts to/from the server cluster dynamically needs to be supported.



[jira] Updated: (ZOOKEEPER-107) Allow dynamic changes to server cluster membership

Posted by "Mahadev konar (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/ZOOKEEPER-107?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Mahadev konar updated ZOOKEEPER-107:
------------------------------------

    Fix Version/s: 3.2.0
         Assignee: Patrick Hunt

> Allow dynamic changes to server cluster membership
> --------------------------------------------------
>
>                 Key: ZOOKEEPER-107
>                 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-107
>             Project: Zookeeper
>          Issue Type: Improvement
>          Components: server
>            Reporter: Patrick Hunt
>            Assignee: Patrick Hunt
>             Fix For: 3.2.0
>
>         Attachments: SimpleAddition.rtf
>
>
> Currently cluster membership is statically defined, adding/removing hosts to/from the server cluster dynamically needs to be supported.



[jira] Commented: (ZOOKEEPER-107) Allow dynamic changes to server cluster membership

Posted by "Benjamin Reed (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/ZOOKEEPER-107?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12619035#action_12619035 ] 

Benjamin Reed commented on ZOOKEEPER-107:
-----------------------------------------

+1 I like the idea. You can currently use DNS for this functionality: make zookeeper.acme.com resolve to 5 different IP addresses and then specify new ZooKeeper("zookeeper.acme.com:3233", 1000, this), but DNS is hard to modify. A replicated web server would be much easier to update.
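
For illustration, this is roughly what the DNS approach amounts to, sketched in Python rather than in the Java client: one name is expanded into the list of host:port pairs behind it. The function name is hypothetical, and the resolver is passed in as a parameter so the sketch can be exercised without a real DNS entry.

```python
import socket


def resolve_ensemble(hostname, port, resolver=socket.getaddrinfo):
    """Expand one DNS name into a comma-separated 'ip:port' connect string.

    Uses IPv4 TCP lookups only and de-duplicates the addresses returned.
    """
    infos = resolver(hostname, port, socket.AF_INET, socket.SOCK_STREAM)
    # Each entry is (family, type, proto, canonname, sockaddr); for IPv4
    # the sockaddr is an (ip, port) pair.
    addresses = sorted({info[4][0] for info in infos})
    return ",".join(f"{ip}:{port}" for ip in addresses)
```

A membership change then means editing the DNS record, which is the part that is hard to do quickly.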

> Allow dynamic changes to server cluster membership
> --------------------------------------------------
>
>                 Key: ZOOKEEPER-107
>                 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-107
>             Project: Zookeeper
>          Issue Type: Improvement
>          Components: server
>            Reporter: Patrick Hunt
>
> Currently cluster membership is statically defined, adding/removing hosts to/from the server cluster dynamically needs to be supported.



[jira] Commented: (ZOOKEEPER-107) Allow dynamic changes to server cluster membership

Posted by "Flavio Paiva Junqueira (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/ZOOKEEPER-107?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12720174#action_12720174 ] 

Flavio Paiva Junqueira commented on ZOOKEEPER-107:
--------------------------------------------------

I think having the messages explicitly in the protocol helps to convey the implemented abstraction, making it easier to read and understand. However, it is bad for backward compatibility, although it might be the case that we silently ignore unknown messages. 

> Allow dynamic changes to server cluster membership
> --------------------------------------------------
>
>                 Key: ZOOKEEPER-107
>                 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-107
>             Project: Zookeeper
>          Issue Type: Improvement
>          Components: server
>            Reporter: Patrick Hunt
>         Attachments: SimpleAddition.rtf
>
>
> Currently cluster membership is statically defined, adding/removing hosts to/from the server cluster dynamically needs to be supported.



[jira] Commented: (ZOOKEEPER-107) Allow dynamic changes to server cluster membership

Posted by "Raghu S (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/ZOOKEEPER-107?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12731776#action_12731776 ] 

Raghu S commented on ZOOKEEPER-107:
-----------------------------------

@henry,

Sorry if this sounds like a repeat; I thought I would summarize the error handling during a view change. Could you comment on whether this makes sense?

----------

1. Configuration change succeeds if the change is successfully committed in both the old view and the new view. An observer is promoted to a follower only after it receives a COMMIT for the new view.

2. Each peer could have two views of the cluster -- the last committed view and the last proposed view (which is created after a VIEWCHANGE proposal is received). The latter can be NULL if there is no view change attempt in progress.
    2.A. Each peer will always attempt an election with the last committed view. Proposed views will be converted to committed views (or deleted) post leader election.
    2.B. The proposal record of a peer contains (in addition to last logged ZXID and server ID) the last committed view of the peer

3. During election, if the last committed view of the peer with the smaller ZXID (P(ZXLOW)) is different from the last committed view of the peer with the higher ZXID (P(ZXHIGH)), then P(ZXLOW) adopts P(ZXHIGH)'s last committed view and broadcasts the adopted view to all other peers.
	3.A. Two nodes with the same ZXID should have the same committed view.
	3.B. If the last committed views of P(ZXLOW) and P(ZXHIGH) are the same, but P(ZXHIGH) has a proposed new view (not yet committed), that view will not be considered by either peer during election. Similarly, if P(ZXLOW) has a proposed view, that will not be considered either.
	3.C. If P(ZXLOW) adopts P(ZXHIGH)'s last committed view and that view doesn't include P(ZXLOW), P(ZXLOW) drops out of the election (should it self-destruct??)

4. Once a leader is elected, it will sync up the logs of the followers that are lagging behind just like it's done today:
	- If there is a follower whose last committed view is different from the leader's, log synchronization will make sure the follower's last committed view gets updated to be in sync with the leader's. The follower doesn't do anything when its last committed view changes (the new view MUST include the follower, since 3.C prevents a follower that is not in the leading candidate's committed view from successfully completing an election)
	- If there is an observer that learns upon log synchronization that the committed view includes the observer, the observer will promote itself to a follower
	- If a follower with a proposed view joins an already established leader who doesn't know about that proposed view, the follower's proposed view will be erased when the leader synchronizes the follower's log
	- If the leader has a proposed new view in its log, the leader will send a COMMIT for the new view after a majority of peers in both the old view and the new view have synced their logs to the leader's log
		4.A. The view change COMMIT doesn't mean much for the followers that are not impacted by the view change
		4.B. The observer that gets view change COMMIT will promote itself to a follower if the new view includes the observer
		4.C. The follower that gets the view change will drop out of the cluster if the new view doesn't include the follower 
		4.D. The leader will drop out of the cluster once COMMIT is delivered locally if the new view doesn't include the leader. This will result in a new election.
		4.E. The leader will adjust the quorum size as per the new view otherwise.
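
Rules 3 and 3.C above can be sketched as follows. This is only an illustration under stated assumptions: the ViewElection and Peer types and the reconcile method are hypothetical names, views are modelled as plain sets of server IDs, and proposed (uncommitted) views are deliberately left out, as in 3.B.

```java
import java.util.HashSet;
import java.util.Set;

// Hypothetical model of rule 3; none of these names exist in ZooKeeper.
final class ViewElection {
    static final class Peer {
        final String id;
        final long zxid;            // last logged ZXID
        Set<String> committedView;  // last committed view

        Peer(String id, long zxid, Set<String> committedView) {
            this.id = id;
            this.zxid = zxid;
            this.committedView = new HashSet<>(committedView);
        }
    }

    // The peer with the lower ZXID adopts the committed view of the peer
    // with the higher ZXID. Returns true if the lower peer may stay in the
    // election, false if rule 3.C says it must drop out.
    static boolean reconcile(Peer low, Peer high) {
        if (low.zxid > high.zxid) {
            throw new IllegalArgumentException("low must not be ahead of high");
        }
        if (!low.committedView.equals(high.committedView)) {
            low.committedView = new HashSet<>(high.committedView);
        }
        return low.committedView.contains(low.id);
    }
}
```

For example, a peer C at ZXID 5 with committed view {A,B,C} that meets A at ZXID 7 with committed view {A,B,D} ends up holding {A,B,D} and drops out.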



> Allow dynamic changes to server cluster membership
> --------------------------------------------------
>
>                 Key: ZOOKEEPER-107
>                 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-107
>             Project: Zookeeper
>          Issue Type: Improvement
>          Components: server
>            Reporter: Patrick Hunt
>            Assignee: Henry Robinson
>         Attachments: SimpleAddition.rtf
>
>
> Currently cluster membership is statically defined, adding/removing hosts to/from the server cluster dynamically needs to be supported.



[jira] Commented: (ZOOKEEPER-107) Allow dynamic changes to server cluster membership

Posted by "Benjamin Reed (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/ZOOKEEPER-107?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12720323#action_12720323 ] 

Benjamin Reed commented on ZOOKEEPER-107:
-----------------------------------------

just a caveat to my last comment. for point 1) we actually do need to touch the protocol code a bit to ensure that the setData that changes the view commits in both the old and new views.

> Allow dynamic changes to server cluster membership
> --------------------------------------------------
>
>                 Key: ZOOKEEPER-107
>                 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-107
>             Project: Zookeeper
>          Issue Type: Improvement
>          Components: server
>            Reporter: Patrick Hunt
>         Attachments: SimpleAddition.rtf
>
>
> Currently cluster membership is statically defined, adding/removing hosts to/from the server cluster dynamically needs to be supported.



[jira] Commented: (ZOOKEEPER-107) Allow dynamic changes to server cluster membership

Posted by "Raghu S (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/ZOOKEEPER-107?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12706697#action_12706697 ] 

Raghu S commented on ZOOKEEPER-107:
-----------------------------------

I think there are some corner cases that may make the leader election impossible during a node addition. Say the current config is A,B,C and the new config is A,B,C,D. When the leader is trying to commit the new configuration, the power goes out and comes back on when only A and B have logged the new configuration. Peer count in <A,B,C,D> = <4,4,3,3> now. An election is not possible if C is down because A and B think the majority is 3 peers and D can't participate in the election since it hasn't joined the cluster yet. 

It sounds like some out-of-band communication between an existing peer and a new peer is needed to make this work. If a peer restarts or notices quorum loss, and the last logged update is a node addition, the peer should try to contact the newly added server so that it can push its log to the new peer (if the new peer doesn't already have an up-to-date log) and ask the new peer to restart. Until A or B does that in the above case, an election may not be possible.
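
The arithmetic behind this corner case can be checked with a small sketch. The class and method names are hypothetical, and "reachable voters" simply means peers that are up and already participating in elections.

```java
import java.util.HashSet;
import java.util.Set;

// Hypothetical illustration of the A,B,C -> A,B,C,D corner case.
final class AdditionCornerCase {
    // Strict majority of a view with n voting members.
    static int majority(int n) {
        return n / 2 + 1;
    }

    // Can a peer that has logged loggedView gather enough votes from the
    // peers that are currently up and allowed to vote?
    static boolean canElect(Set<String> loggedView, Set<String> reachableVoters) {
        Set<String> votes = new HashSet<>(loggedView);
        votes.retainAll(reachableVoters);
        return votes.size() >= majority(loggedView.size());
    }
}
```

With C down and D not yet participating, only A and B can vote. Under the logged view {A,B,C,D} they need 3 votes and have 2, so canElect returns false; under the old view {A,B,C} the same two votes would have been enough.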

> Allow dynamic changes to server cluster membership
> --------------------------------------------------
>
>                 Key: ZOOKEEPER-107
>                 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-107
>             Project: Zookeeper
>          Issue Type: Improvement
>          Components: server
>            Reporter: Patrick Hunt
>
> Currently cluster membership is statically defined, adding/removing hosts to/from the server cluster dynamically needs to be supported.



[jira] Commented: (ZOOKEEPER-107) Allow dynamic changes to server cluster membership

Posted by "Flavio Paiva Junqueira (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/ZOOKEEPER-107?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12874234#action_12874234 ] 

Flavio Paiva Junqueira commented on ZOOKEEPER-107:
--------------------------------------------------

I have started a wiki page to discuss the design of this feature: http://wiki.apache.org/hadoop/ZooKeeper/ClusterMembership

It is not supposed to replace this jira, but instead to have a more organized way of working on the design of this feature. Please consider contributing. 

> Allow dynamic changes to server cluster membership
> --------------------------------------------------
>
>                 Key: ZOOKEEPER-107
>                 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-107
>             Project: Zookeeper
>          Issue Type: Improvement
>          Components: server
>            Reporter: Patrick Hunt
>            Assignee: Henry Robinson
>         Attachments: SimpleAddition.rtf
>
>
> Currently cluster membership is statically defined, adding/removing hosts to/from the server cluster dynamically needs to be supported.



[jira] Commented: (ZOOKEEPER-107) Allow dynamic changes to server cluster membership

Posted by "Benjamin Reed (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/ZOOKEEPER-107?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12619109#action_12619109 ] 

Benjamin Reed commented on ZOOKEEPER-107:
-----------------------------------------

I think there are two issues here: 1) adding/removing servers to a ZooKeeper cluster and 2) letting clients know about the change. We should probably separate them. I like the URL idea for dealing with 1) (especially when used in conjunction with the other idea in this Jira of defining a URL scheme for ZooKeeper). For 2) I agree with Hiram that it should be stored persistently at each replica and changed via the replication protocol.

> Allow dynamic changes to server cluster membership
> --------------------------------------------------
>
>                 Key: ZOOKEEPER-107
>                 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-107
>             Project: Zookeeper
>          Issue Type: Improvement
>          Components: server
>            Reporter: Patrick Hunt
>
> Currently cluster membership is statically defined, adding/removing hosts to/from the server cluster dynamically needs to be supported.



[jira] Updated: (ZOOKEEPER-107) Allow dynamic changes to server cluster membership

Posted by "Mahadev konar (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/ZOOKEEPER-107?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Mahadev konar updated ZOOKEEPER-107:
------------------------------------

    Fix Version/s:     (was: 3.2.0)
         Assignee:     (was: Patrick Hunt)

updated the wrong jira! :) 

> Allow dynamic changes to server cluster membership
> --------------------------------------------------
>
>                 Key: ZOOKEEPER-107
>                 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-107
>             Project: Zookeeper
>          Issue Type: Improvement
>          Components: server
>            Reporter: Patrick Hunt
>         Attachments: SimpleAddition.rtf
>
>
> Currently cluster membership is statically defined, adding/removing hosts to/from the server cluster dynamically needs to be supported.



[jira] Commented: (ZOOKEEPER-107) Allow dynamic changes to server cluster membership

Posted by "Patrick Hunt (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/ZOOKEEPER-107?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12731056#action_12731056 ] 

Patrick Hunt commented on ZOOKEEPER-107:
----------------------------------------

I've only been following this a bit, and I see bits/pieces in the comments but not sure I follow it all -- some questions around the plan wrt manageability:

1) adding/removing servers: the server itself needs to be configured; are any changes needed to the config on the existing ensemble? I see Raghu has a similar comment on this
2) JMX - what's the plan? what additional properties/actions will be supported?
3) 4letter words - same issues as jmx
4) debug-ability - ensure adequate logging (log4j) on ensemble

5) security - will an ensemble allow any server to connect to it? today we have ensemble participants hardwired into the config of each of the servers right?

testing and b/w compat -- are we ensuring b/w compat btw this version and previous versions? (I'm probably going to look at beefing up unit & systest next, esp around b/w compat, so would be good to have a better idea where this is headed). IMO this patch must include unit as well as systest before it is committed.

documentation will be needed as well.


Perhaps a wiki "proposal" page should be created that will capture the "current proposal" for easy review of this feature? This JIRA can capture ongoing discussion, with agreed-upon results captured in the wiki design/functional document. I know it would help me a lot.


> Allow dynamic changes to server cluster membership
> --------------------------------------------------
>
>                 Key: ZOOKEEPER-107
>                 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-107
>             Project: Zookeeper
>          Issue Type: Improvement
>          Components: server
>            Reporter: Patrick Hunt
>            Assignee: Henry Robinson
>         Attachments: SimpleAddition.rtf
>
>
> Currently cluster membership is statically defined, adding/removing hosts to/from the server cluster dynamically needs to be supported.



[jira] Commented: (ZOOKEEPER-107) Allow dynamic changes to server cluster membership

Posted by "Mahadev konar (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/ZOOKEEPER-107?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12619118#action_12619118 ] 

Mahadev konar commented on ZOOKEEPER-107:
-----------------------------------------

+1 for using URIs on the client side to get a list of ZooKeeper servers. We can always update the ZooKeeper client periodically by fetching from the URI...

> Allow dynamic changes to server cluster membership
> --------------------------------------------------
>
>                 Key: ZOOKEEPER-107
>                 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-107
>             Project: Zookeeper
>          Issue Type: Improvement
>          Components: server
>            Reporter: Patrick Hunt
>
> Currently cluster membership is statically defined, adding/removing hosts to/from the server cluster dynamically needs to be supported.



[jira] Updated: (ZOOKEEPER-107) Allow dynamic changes to server cluster membership

Posted by "Raghu S (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/ZOOKEEPER-107?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Raghu S updated ZOOKEEPER-107:
------------------------------

    Attachment: SimpleAddition.rtf

Folks, I have attached a high level write up on how we can implement dynamic configuration changes in a ZK ensemble. Please comment on this and let me know if you think this approach makes sense. Since I just came up with this, there could be corner cases that I may need to handle.

Please note that this write-up doesn't focus on how we can make the configuration details available to the clients using a URI. I believe that can be dealt with separately.

> Allow dynamic changes to server cluster membership
> --------------------------------------------------
>
>                 Key: ZOOKEEPER-107
>                 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-107
>             Project: Zookeeper
>          Issue Type: Improvement
>          Components: server
>            Reporter: Patrick Hunt
>         Attachments: SimpleAddition.rtf
>
>
> Currently cluster membership is statically defined, adding/removing hosts to/from the server cluster dynamically needs to be supported.



[jira] Assigned: (ZOOKEEPER-107) Allow dynamic changes to server cluster membership

Posted by "Henry Robinson (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/ZOOKEEPER-107?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Henry Robinson reassigned ZOOKEEPER-107:
----------------------------------------

    Assignee: Henry Robinson

> Allow dynamic changes to server cluster membership
> --------------------------------------------------
>
>                 Key: ZOOKEEPER-107
>                 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-107
>             Project: Zookeeper
>          Issue Type: Improvement
>          Components: server
>            Reporter: Patrick Hunt
>            Assignee: Henry Robinson
>         Attachments: SimpleAddition.rtf
>
>
> Currently cluster membership is statically defined, adding/removing hosts to/from the server cluster dynamically needs to be supported.



[jira] Commented: (ZOOKEEPER-107) Allow dynamic changes to server cluster membership

Posted by "Henry Robinson (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/ZOOKEEPER-107?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12728429#action_12728429 ] 

Henry Robinson commented on ZOOKEEPER-107:
------------------------------------------

That's definitely one way to do it. The other side to that argument is to keep the message complexity down, especially if we can envisage use cases with lots of Observers. A connection to a remote Observer might be more likely to violate the FIFO requirement of ZK connections; having a single-message protocol makes it easier to deal with this case (not a correctness issue of Observers, just annoying if PROPOSALs arrive after COMMITs for some reason). I think that's a marginal issue though. My preference is for INFORM messages as this completely separates Observer logic from Follower logic and doesn't add much complexity to the code. 

The Observer also has to take care not to participate in leader elections. I think Observers also need to announce themselves as such to the Leader, to enable the case where a Follower wishes to connect as an Observer temporarily (otherwise the Leader will think the Observer to be a Follower and use it as part of a quorum). Also if the leader can distinguish between followers and observers then it can treat both differently (e.g. through batching multiple INFORMs or allowing observers to lag by prioritising follower traffic). 

Keeping Observers as special-case Followers would simplify the code for the observers patch (I've got a new version nearly ready to submit, just fixing some tests). However, it would mean that Observers are harder to customise - for example, there's no persistence requirement for an Observer and so some of the RequestProcessors can be optionally removed or replaced by something that only asynchronously writes to disk. Keeping them lightweight has been a goal. My feeling was that I was introducing too many 'if (amObserver()) {...}' branches to an already fairly hard to follow bit of code (in particular Follower.followLeader). Breaking the functionality into two separate classes seems to have made things cleaner.
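
To make the contrast concrete, here is a minimal sketch of the two message paths as separate classes. It is not the code from the observers patch: MsgType, Msg, FollowerSketch and ObserverSketch are hypothetical names, and persistence, networking and the actual sending of ACKs are omitted.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical message kinds for this sketch only.
enum MsgType { PROPOSAL, COMMIT, INFORM }

final class Msg {
    final MsgType type;
    final long zxid;
    final String data; // null for COMMIT, which only names a zxid

    Msg(MsgType type, long zxid, String data) {
        this.type = type;
        this.zxid = zxid;
        this.data = data;
    }
}

// Follower path: remember the PROPOSAL, record an ACK, apply on COMMIT.
final class FollowerSketch {
    private final Map<Long, String> pending = new HashMap<>();
    final List<Long> acked = new ArrayList<>();
    final List<String> applied = new ArrayList<>();

    void handle(Msg m) {
        switch (m.type) {
            case PROPOSAL:
                pending.put(m.zxid, m.data);
                acked.add(m.zxid); // stands in for sending an ACK
                break;
            case COMMIT: {
                String data = pending.remove(m.zxid);
                if (data != null) {
                    applied.add(data);
                }
                break;
            }
            default:
                break; // followers ignore INFORM in this sketch
        }
    }
}

// Observer path: a single INFORM carries the already-committed update,
// so there is no pending state and nothing to acknowledge.
final class ObserverSketch {
    final List<String> applied = new ArrayList<>();

    void handle(Msg m) {
        if (m.type == MsgType.INFORM) {
            applied.add(m.data);
        }
    }
}
```

Because the observer never holds a pending proposal, a PROPOSAL arriving after its COMMIT cannot occur on that path, which is the ordering annoyance mentioned above.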

Regarding Observers being able to issue proposals; I don't have a problem with that, should be reasonably easy to add. 

> Allow dynamic changes to server cluster membership
> --------------------------------------------------
>
>                 Key: ZOOKEEPER-107
>                 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-107
>             Project: Zookeeper
>          Issue Type: Improvement
>          Components: server
>            Reporter: Patrick Hunt
>            Assignee: Henry Robinson
>         Attachments: SimpleAddition.rtf
>
>
> Currently cluster membership is statically defined, adding/removing hosts to/from the server cluster dynamically needs to be supported.



[jira] Commented: (ZOOKEEPER-107) Allow dynamic changes to server cluster membership

Posted by "Raghu S (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/ZOOKEEPER-107?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12721543#action_12721543 ] 

Raghu S commented on ZOOKEEPER-107:
-----------------------------------

Right, that's a problem, that rules out using my proposal.

Now some logistics related:

@henry,

We need this feature pretty desperately, and it would be great if you could tell me how we want to go about designing and implementing this. IOW, would you like to own the design and implementation of this? If yes, it would be really great if you could let me know the timeframe. Please don't get me wrong, I'm just trying to figure out the timeframe here. If you can't get to this in the short term, I can chip in. Thanks.




> Allow dynamic changes to server cluster membership
> --------------------------------------------------
>
>                 Key: ZOOKEEPER-107
>                 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-107
>             Project: Zookeeper
>          Issue Type: Improvement
>          Components: server
>            Reporter: Patrick Hunt
>         Attachments: SimpleAddition.rtf
>
>
> Currently cluster membership is statically defined, adding/removing hosts to/from the server cluster dynamically needs to be supported.



[jira] Commented: (ZOOKEEPER-107) Allow dynamic changes to server cluster membership

Posted by "Henry Robinson (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/ZOOKEEPER-107?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12721547#action_12721547 ] 

Henry Robinson commented on ZOOKEEPER-107:
------------------------------------------

Yes, I'd like to take ownership of implementing this. 

I'd like to have a patch available within one to two weeks. There are some implementation issues to work through which might take time (for example, how do we manage the connections between joining followers and the current leader - who connects to whom?). I see the initial version of the patch simply as adding functionality to the core protocol. Adding any extensions to the client APIs would come in a second revision. Ironing out the kinks in the first patch will also doubtless take some time.

Does that sound ok? You can go with an unstable implementation as soon as the patch is released.

> Allow dynamic changes to server cluster membership
> --------------------------------------------------
>
>                 Key: ZOOKEEPER-107
>                 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-107
>             Project: Zookeeper
>          Issue Type: Improvement
>          Components: server
>            Reporter: Patrick Hunt
>         Attachments: SimpleAddition.rtf
>
>
> Currently cluster membership is statically defined, adding/removing hosts to/from the server cluster dynamically needs to be supported.



[jira] Commented: (ZOOKEEPER-107) Allow dynamic changes to server cluster membership

Posted by "Flavio Paiva Junqueira (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/ZOOKEEPER-107?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12721697#action_12721697 ] 

Flavio Paiva Junqueira commented on ZOOKEEPER-107:
--------------------------------------------------

I suggest that the system starts as a standalone instance and the other replicas join by contacting the standalone replica using the new dynamic membership mechanism. This way we avoid pre-loading a configuration. An important observation is that there will be a transition from standalone to ensemble, which I think won't be difficult to deal with in the code, but we have to make sure that this observation is correct. 

> Allow dynamic changes to server cluster membership
> --------------------------------------------------
>
>                 Key: ZOOKEEPER-107
>                 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-107
>             Project: Zookeeper
>          Issue Type: Improvement
>          Components: server
>            Reporter: Patrick Hunt
>         Attachments: SimpleAddition.rtf
>
>
> Currently cluster membership is statically defined, adding/removing hosts to/from the server cluster dynamically needs to be supported.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


[jira] Commented: (ZOOKEEPER-107) Allow dynamic changes to server cluster membership

Posted by "Flavio Paiva Junqueira (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/ZOOKEEPER-107?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12722153#action_12722153 ] 

Flavio Paiva Junqueira commented on ZOOKEEPER-107:
--------------------------------------------------

+1, I think it is a good idea to use observers (ZOOKEEPER-368). This way we make sure that once the new configuration is committed the new active member is in sync with the leader. 

I have a slightly different idea of how to make it work, though. I was thinking that once the observer finishes synchronizing with the leader, it can simply submit a setData. This way we have no special code path for this operation. Only when finalizing the setData operation do we have to update all appropriate data structures.

> Allow dynamic changes to server cluster membership
> --------------------------------------------------
>
>                 Key: ZOOKEEPER-107
>                 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-107
>             Project: Zookeeper
>          Issue Type: Improvement
>          Components: server
>            Reporter: Patrick Hunt
>            Assignee: Henry Robinson
>         Attachments: SimpleAddition.rtf
>
>
> Currently cluster membership is statically defined, adding/removing hosts to/from the server cluster dynamically needs to be supported.



[jira] Commented: (ZOOKEEPER-107) Allow dynamic changes to server cluster membership

Posted by "Benjamin Reed (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/ZOOKEEPER-107?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12720380#action_12720380 ] 

Benjamin Reed commented on ZOOKEEPER-107:
-----------------------------------------

so if you do one at a time without using Zab, without working through the details
1) start with A, B, C, D
2) A is the leader and proposes LEAVE D and fails where only A and C get it.
3) B is the leader and proposes LEAVE C and fails where only B and D get it because of a complete power outage.
4) everything comes back up
5) A is elected leader by C
6) B is elected leader by D

if we use ZAB split brain will not occur because we do not use the configuration until it has been committed. since it has been accepted by both the old and new quorums, we will eventually converge on the new configuration. (that is my conjecture, still needs to be proven)
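
The scenario in steps 1) to 6) can be checked mechanically. In the sketch below (hypothetical names, views modelled as sets of server IDs), A and C hold the view {A,B,C} after the first failed change, while B and D hold {A,B,D} after the second.

```java
import java.util.Set;

// Hypothetical check for the split-brain scenario above.
final class SplitBrainDemo {
    // Do the given voters form a strict majority of the view?
    static boolean isQuorum(Set<String> view, Set<String> voters) {
        int count = 0;
        for (String voter : voters) {
            if (view.contains(voter)) {
                count++;
            }
        }
        return count > view.size() / 2;
    }

    // Two groups with no member in common.
    static boolean disjoint(Set<String> a, Set<String> b) {
        for (String x : a) {
            if (b.contains(x)) {
                return false;
            }
        }
        return true;
    }

    // Split brain is possible when two disjoint groups each believe they
    // hold a quorum under their own view.
    static boolean splitBrainPossible(Set<String> view1, Set<String> group1,
                                      Set<String> view2, Set<String> group2) {
        return disjoint(group1, group2)
                && isQuorum(view1, group1)
                && isQuorum(view2, group2);
    }
}
```

With view1 = {A,B,C}, group1 = {A,C}, view2 = {A,B,D} and group2 = {B,D}, each group has 2 of 3 votes under its own view and the groups share no member, so both elections in steps 5) and 6) can succeed.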

> Allow dynamic changes to server cluster membership
> --------------------------------------------------
>
>                 Key: ZOOKEEPER-107
>                 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-107
>             Project: Zookeeper
>          Issue Type: Improvement
>          Components: server
>            Reporter: Patrick Hunt
>         Attachments: SimpleAddition.rtf
>
>
> Currently cluster membership is statically defined, adding/removing hosts to/from the server cluster dynamically needs to be supported.



[jira] Commented: (ZOOKEEPER-107) Allow dynamic changes to server cluster membership

Posted by "Raghu S (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/ZOOKEEPER-107?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12705822#action_12705822 ] 

Raghu S commented on ZOOKEEPER-107:
-----------------------------------

Using a URL for obtaining a list of servers may not be acceptable in all environments. The URL will have to be highly available; otherwise the clients won't be able to connect after a restart. This applies to ZK peers as well: a peer won't be able to restart if the URL is down at the time of the peer restart. So I believe (1) is a better choice here -- the cluster needs to be self-contained.

IIUC, using a ZK node to expose the cluster config would mean that the cluster can't be created unless the cluster config is pre-populated? We need to first create a log file that contains a config node that has the initial cluster members, copy them to all the nodes and power them on to create the cluster? Once the cluster is formed, new nodes can be admitted to the cluster by having a quorum commit the new server name in the cluster config. 

Another alternative would be to have a single node come up first (with the log file pre-populated to contain a single server in the cluster config node) and then have all other servers join one at a time? This sounds a bit weird to me in that the first node will have to form a single node cluster in the beginning, which is an oxymoron :)



> Allow dynamic changes to server cluster membership
> --------------------------------------------------
>
>                 Key: ZOOKEEPER-107
>                 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-107
>             Project: Zookeeper
>          Issue Type: Improvement
>          Components: server
>            Reporter: Patrick Hunt
>
> Currently cluster membership is statically defined, adding/removing hosts to/from the server cluster dynamically needs to be supported.



[jira] Commented: (ZOOKEEPER-107) Allow dynamic changes to server cluster membership

Posted by "Benjamin Reed (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/ZOOKEEPER-107?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12722063#action_12722063 ] 

Benjamin Reed commented on ZOOKEEPER-107:
-----------------------------------------

i think if we use the notion of observers it helps: an observer can sync with a leader, but it doesn't get to vote. i think this makes it easy because the leader can then determine that it can commit with both the active followers and active observers if needed: for example start with A, B, C and move to A, B, D, E, F. if A and C are active followers and E and F are observers then the leader will propose the new configuration.
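
a small sketch of that commit condition, assuming the rule is "a majority of the old view and a majority of the new view have synced/acked". the class and method names are hypothetical and views are just sets of server ids.

```java
import java.util.Set;

// Hypothetical check: a view change needs a majority in both views.
final class JointQuorum {
    // Strict majority of the view among the servers that acked.
    static boolean hasMajority(Set<String> view, Set<String> acks) {
        int count = 0;
        for (String member : view) {
            if (acks.contains(member)) {
                count++;
            }
        }
        return count > view.size() / 2;
    }

    // acks holds every server in sync with the leader, whether it is
    // currently a follower or an observer.
    static boolean canCommit(Set<String> oldView, Set<String> newView,
                             Set<String> acks) {
        return hasMajority(oldView, acks) && hasMajority(newView, acks);
    }
}
```

with old view {A,B,C}, new view {A,B,D,E,F} and A, C, E, F in sync, the old view has 2 of 3 (A, C) and the new view has 3 of 5 (A, E, F), so canCommit is true. with only A and C in sync it is false, since the new view has 1 of 5.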

> Allow dynamic changes to server cluster membership
> --------------------------------------------------
>
>                 Key: ZOOKEEPER-107
>                 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-107
>             Project: Zookeeper
>          Issue Type: Improvement
>          Components: server
>            Reporter: Patrick Hunt
>            Assignee: Henry Robinson
>         Attachments: SimpleAddition.rtf
>
>
> Currently cluster membership is statically defined, adding/removing hosts to/from the server cluster dynamically needs to be supported.



[jira] Issue Comment Edited: (ZOOKEEPER-107) Allow dynamic changes to server cluster membership

Posted by "Jakob Homan (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/ZOOKEEPER-107?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12618885#action_12618885 ] 

jghoman edited comment on ZOOKEEPER-107 at 7/31/08 3:30 PM:
----------------------------------------------------------------

A thought I had along this line was to create an additional constructor that takes a URL rather than a string for the host.  The constructor could then access that URL and get the list of servers from there.  So, assuming the URL pointed to an http page, the page returned would just be "hostA:port,hostB:port,etc" and the client could proceed with that information.  Or, the URL could point to a local file if the server membership wasn't expected to change often.  This would eliminate the need for the clients to have any idea ahead of time of where to get the hosts.  Particularly if the information were served via http, this would move server location to being a DNS/virtualhost problem - the DNS for the server-info location could be changed if the server providing the info died.  This would go some way toward adding a RESTful interface to the configuration of the cluster.

So, the client would instantiate with
zk = new ZooKeeper(new URL("http://zkhostinfo:7552"), 1000, this);
receive the list of hosts and work from there.

This would address (though not solve) the issue of adding servers, as new servers could be added to the list returned to new clients.

As a corollary, the zk servers could of course serve the list of hosts as an optional part of their operation, or this function could be provided by another application, directory-style.  This would allow clients to connect to different clusters on start-up, if the directory could identify and differentiate clients by ip addr or such, and direct them to the appropriate zk group.

If this sounds (reasonable && interesting), I can open a JIRA and work on a patch to add the new constructor and client functionality.
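
A minimal sketch of the client-side part of this idea, assuming the page served at the URL is a single line of the form "hostA:port,hostB:port". The class and method names (HostListFetcher, parse, fetch) are hypothetical and are not part of the ZooKeeper client; only the JDK calls are real.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.net.InetSocketAddress;
import java.net.URL;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

/** Hypothetical helper; not part of the ZooKeeper client API. */
final class HostListFetcher {

    /** Parses "hostA:2181,hostB:2181" into unresolved socket addresses. */
    static List<InetSocketAddress> parse(String hostList) {
        List<InetSocketAddress> servers = new ArrayList<>();
        for (String entry : hostList.trim().split(",")) {
            String hostPort = entry.trim();
            if (hostPort.isEmpty()) {
                continue; // tolerate stray commas
            }
            int colon = hostPort.lastIndexOf(':');
            if (colon <= 0 || colon == hostPort.length() - 1) {
                throw new IllegalArgumentException("expected host:port, got '" + hostPort + "'");
            }
            String host = hostPort.substring(0, colon);
            int port = Integer.parseInt(hostPort.substring(colon + 1));
            // createUnresolved avoids a DNS lookup at parse time.
            servers.add(InetSocketAddress.createUnresolved(host, port));
        }
        if (servers.isEmpty()) {
            throw new IllegalArgumentException("empty host list");
        }
        return servers;
    }

    /** Reads the first line served at the URL (http:, file:, ...) and parses it. */
    static List<InetSocketAddress> fetch(URL source) throws IOException {
        try (BufferedReader in = new BufferedReader(
                new InputStreamReader(source.openStream(), StandardCharsets.UTF_8))) {
            String line = in.readLine();
            if (line == null) {
                throw new IOException("no host list at " + source);
            }
            return parse(line);
        }
    }
}
```

Because the fetched line is already in the comma-separated host:port form, it could also be passed straight to the existing ZooKeeper(String connectString, int sessionTimeout, Watcher watcher) constructor, which would keep the change small.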



[jira] Issue Comment Edited: (ZOOKEEPER-107) Allow dynamic changes to server cluster membership

Posted by "Henry Robinson (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/ZOOKEEPER-107?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12722245#action_12722245 ] 

Henry Robinson edited comment on ZOOKEEPER-107 at 6/20/09 12:09 PM:
--------------------------------------------------------------------

As it turns out, I've pretty much implemented Observers in all but name already - they go through the same connection logic as normal followers, and therefore sync, but are disbarred from sending Leader.REQUEST packets to the leader. Similarly, when a leader is sending a proposal packet it only gets sent to those followers which are in the current view. Since the logic is very similar, and we will be able to distinguish observers from followers by whether they are members of the current view, I haven't duplicated code into Observer* classes.

I added this when finding that any new follower can join an existing ensemble and issue proposals to it, even if the static configuration of the ensemble does not contain it. This seemed to deadlock the ensemble pretty quickly :)

Edit: of course, this means that Observers can't actually see the payload of a transaction, as per the note on ZK-368. Either the leader sends special packets (INFORM, perhaps) to Observers containing the transaction payload, or the Observers must know not to participate in voting. That said, the Leader will ignore the votes of Observers, but we want to cut down on traffic. 





[jira] Commented: (ZOOKEEPER-107) Allow dynamic changes to server cluster membership

Posted by "Patrick Hunt (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/ZOOKEEPER-107?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12619070#action_12619070 ] 

Patrick Hunt commented on ZOOKEEPER-107:
----------------------------------------

Obviously it would be great if we supported reading from a ZooKeeper cluster!

This just reminded me of another comment I got recently on this. The suggestion was to use a URI (similar to JDBC, for example) rather than a host/port list.

Perhaps we should have some sort of plugin architecture here, where the URI would be provided and the plugin registered for the URI's scheme would produce the host/port list.
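
One way such a plugin architecture might look is sketched below. All names (HostResolver, StaticListResolver, HostResolverRegistry) and the "static:" scheme are assumptions made for illustration; nothing like this exists in the ZooKeeper client described here.

```java
import java.net.URI;
import java.util.Arrays;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

/** Hypothetical plugin interface: turns a URI into a list of "host:port" strings. */
interface HostResolver {
    List<String> resolve(URI uri);
}

/** Example plugin for an assumed "static:" scheme, e.g. static:hostA:2181,hostB:2181 */
final class StaticListResolver implements HostResolver {
    @Override
    public List<String> resolve(URI uri) {
        // For an opaque URI the scheme-specific part is everything after "static:".
        return Arrays.asList(uri.getSchemeSpecificPart().split(","));
    }
}

/** Hypothetical registry that picks a plugin by URI scheme. */
final class HostResolverRegistry {
    private final Map<String, HostResolver> byScheme = new ConcurrentHashMap<>();

    void register(String scheme, HostResolver resolver) {
        byScheme.put(scheme.toLowerCase(Locale.ROOT), resolver);
    }

    List<String> resolve(URI uri) {
        String scheme = uri.getScheme();
        HostResolver resolver =
                scheme == null ? null : byScheme.get(scheme.toLowerCase(Locale.ROOT));
        if (resolver == null) {
            throw new IllegalArgumentException("no resolver registered for " + uri);
        }
        return resolver.resolve(uri);
    }
}
```

An http or file plugin would be registered the same way under its own scheme, so the client only ever sees the resulting host/port list.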




[jira] Commented: (ZOOKEEPER-107) Allow dynamic changes to server cluster membership

Posted by "Patrick Hunt (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/ZOOKEEPER-107?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12617955#action_12617955 ] 

Patrick Hunt commented on ZOOKEEPER-107:
----------------------------------------

Submitted to me by a user. This describes a change both for servers and for clients. Currently servers 
share a configuration file that statically defines the cluster members (servers). Additionally clients are statically 
configured with a list of accessible servers.

"Instead of every client
maintaining a list of zookeeper servers, the servers should maintain
that info (e.g in a special 'node') and handle updates via the
server-to-server protocol. Then the client just needs to know the
server:port of *one* zookeeper server (or a bunch of 'forwarding
zookeepers' for redundancy) that it talks to and the servers take it
from there. 

If one server gets added to the collective, the server-to-server
protocol should propagate it among all servers and all servers update
their maps. Same if a zookeeper server gets moved out of rotation, there
should be an internal protocol to handle this and have all servers
update their maps. "




[jira] Commented: (ZOOKEEPER-107) Allow dynamic changes to server cluster membership

Posted by "Benjamin Reed (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/ZOOKEEPER-107?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12720790#action_12720790 ] 

Benjamin Reed commented on ZOOKEEPER-107:
-----------------------------------------

oh right. you are correct. i guess it is more of a liveness/correctness issue:

1) start with A, B, C, D
2) B is down and A is the leader and proposes LEAVE C and fails where only D gets it.
3) C and D cannot get quorum since C has an older view.
4) D fails
5) A and B come back up and B is elected leader.
6) B proposes LEAVE A and C gets it before B fails.

Now what happens? we cannot get quorum with just A and C since A has the old view. even if D comes up it will not elect C because it does not believe C is part of the ensemble. if they all come up either C or D can be elected leader, but if C is elected you end up with conflicting views: A thinks (B, C, D), B thinks (B, C, D), C thinks (B, C, D), and D thinks (A, B, D), so both A and D will effectively be out of the ensemble and you can't tolerate any failures.





[jira] Commented: (ZOOKEEPER-107) Allow dynamic changes to server cluster membership

Posted by "Henry Robinson (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/ZOOKEEPER-107?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12719641#action_12719641 ] 

Henry Robinson commented on ZOOKEEPER-107:
------------------------------------------

Hi - 

Thanks for the proposal - it does a really good job of framing the important questions. 

I am in favour of a solution that uses ZAB and the existing consensus framework for dynamic group membership. I believe this can be achieved without an out-of-band protocol or significant changes to the way the current protocols work; this has the advantage of keeping things simple. 

I'm not certain I've read your proposal correctly, but it seems that step 6 has followers commit the CONFIGCHANGE proposal on receipt, rather than waiting for a COMMIT message. By my understanding of ZAB, this means that if the leader fails halfway through sending the proposal messages, fewer than a quorum of followers may commit this proposal, which can lead to divergent histories at followers.

The tool approach is one way of wrapping up the authentication required if an ensemble wishes to restrict those nodes that can join it. Currently there is some implicit authentication done as the leader only establishes connections with followers that belong to the static membership. However there's certainly a need, as a result of this JIRA, for a better authentication mechanism inside ZK. I see this as orthogonal to the mechanisms required to do dynamic membership.

I suggest that we simply augment the current ZooKeeper protocol with four new proposals: NEWVIEW, GETVIEW, JOIN and LEAVE. NEWVIEW proposes an entirely new view, and may aggregate many JOIN or LEAVE proposals into one. Since NEWVIEW likely requires knowledge of the current view, GETVIEW returns the current view and its version. JOIN and LEAVE incrementally change the current view, whatever it is, and so do not require a GETVIEW call to establish the current view. 
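
To make the four operations concrete, here is a rough sketch of a versioned view. The View class and its methods are hypothetical, and rejecting a NEWVIEW that was built on a stale version is an assumption about how the version returned by GETVIEW would be used; the comment above does not specify it.

```java
import java.util.Collections;
import java.util.Set;
import java.util.TreeSet;

/** Immutable ensemble view: a member set plus a version. Names are illustrative only. */
final class View {
    final long version;
    final Set<String> members;

    View(long version, Set<String> members) {
        this.version = version;
        this.members = Collections.unmodifiableSet(new TreeSet<>(members));
    }

    /** JOIN: incremental change, needs no knowledge of the current view. */
    View join(String server) {
        Set<String> next = new TreeSet<>(members);
        next.add(server);
        return new View(version + 1, next);
    }

    /** LEAVE: incremental change, needs no knowledge of the current view. */
    View leave(String server) {
        Set<String> next = new TreeSet<>(members);
        next.remove(server);
        return new View(version + 1, next);
    }

    /**
     * NEWVIEW: replaces the whole member set. Assumption: the caller passes the
     * version it obtained from GETVIEW, and a mismatch means the proposal is stale.
     */
    View newView(long basedOnVersion, Set<String> proposed) {
        if (basedOnVersion != version) {
            throw new IllegalStateException(
                    "stale view: based on " + basedOnVersion + ", current is " + version);
        }
        return new View(version + 1, proposed);
    }
}
```

GETVIEW corresponds to simply reading the version and members fields of the current View.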

All proposals go through the usual ZAB two-phase protocol, except for the fact that the leader coordinating the current ZAB instance must wait for acknowledgements from quorums in both the current and new view before committing the change. 
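
The commit condition can be sketched as follows. This assumes plain majority quorums (the flexible quorum configuration would need a different check) and uses hypothetical names; it is not code from the ZooKeeper leader.

```java
import java.util.HashSet;
import java.util.Set;

/** Illustrative check for committing a view change; assumes simple majority quorums. */
final class DualQuorum {

    /** True if the acknowledgements include a strict majority of the given view. */
    static boolean hasMajority(Set<String> view, Set<String> acks) {
        Set<String> ackedMembers = new HashSet<>(acks);
        ackedMembers.retainAll(view); // acks from servers outside the view do not count
        return ackedMembers.size() > view.size() / 2;
    }

    /** A view change commits only once quorums of BOTH views have acknowledged it. */
    static boolean canCommitViewChange(Set<String> current, Set<String> proposed,
                                       Set<String> acks) {
        return hasMajority(current, acks) && hasMajority(proposed, acks);
    }
}
```

For example, moving from (A, B, C) to (A, B, D, E, F) with acknowledgements from only A and C satisfies the current view but not the proposed one, so the proposal would stay pending.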

It's possible that this can lead to the proposal blocking if a quorum cannot be assembled in either view. Although it might seem an error that the proposal will block even if a quorum in the current view can be established, the same behaviour would be observed even if the proposal could be committed - all subsequent proposals would require a quorum from the new view and would block. 

If an ensemble is currently blocked due to the failure of n/2 + 1 nodes, it is not possible to resume progress by issuing a LEAVE on behalf of the failed nodes; however in general failed nodes may both JOIN and LEAVE the ensemble. 

If a leader election is required during a proposal, there are no correctness issues assuming the current required invariants of ZAB leader election hold. In particular, as long as the new leader has seen the most recent proposals then the view change proposal will be committed once the new leader is elected. This property will be maintained without changes to the current leader election protocols - as the view change proposal will have been seen by a quorum from the current view, the new leader is guaranteed to have a record of the proposal. 

A node that fails after it has issued a join proposal, but before it hears of its success must be able to find the status of the proposal once it recovers. There are several ways to do this. 

I have some sketches of correctness proofs for this and could produce a more detailed design document if required - however, if there's consensus that this is the right approach I'd rather get coding :) It turns out after much agonising that ZK's existing invariants are already pretty much strong enough to build this protocol. The only extension is the requirement to listen for two different sets of quorum acknowledgements.

I've deliberately avoided the issue of exposing the view to the outside world (although this requires attention, as new nodes need to be able to find the ensemble!) - I have outlined some ideas earlier in this JIRA and I know other people have good suggestions, but I think we can solve both issues independently. 

Would love to hear comments, things that I've missed, errors in logic etc.



[jira] Commented: (ZOOKEEPER-107) Allow dynamic changes to server cluster membership

Posted by "Benjamin Reed (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/ZOOKEEPER-107?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12720321#action_12720321 ] 

Benjamin Reed commented on ZOOKEEPER-107:
-----------------------------------------

i think i agree with flavio about using setData instead of NEWVIEW. (to be honest we had talked about this approach a while back in private.) we have already reserved the /zookeeper namespace (by convention). so if we use /zookeeper/ensemble to store the ensembles, then we can just use setData and getData to implement NEWVIEW and GETVIEW. This has two nice properties: 1) we don't need to touch the protocol code at all. 2) we can use standard clients to administer the view changes. for example, you could use the cli to get and manipulate the views.
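
A rough sketch of why setData is enough for NEWVIEW: the version check on the znode makes the update conditional. The class below is an in-memory stand-in written for illustration (VersionedNode is a hypothetical name, and storing the ensemble as "A,B,C" is an assumption); it only mimics the compare-version-then-write behaviour so the example runs without a server.

```java
import java.nio.charset.StandardCharsets;

/** In-memory stand-in for a single znode such as /zookeeper/ensemble. */
final class VersionedNode {
    private byte[] data;
    private int version = 0;

    VersionedNode(String initial) {
        this.data = initial.getBytes(StandardCharsets.UTF_8);
    }

    /** GETVIEW: read the current contents. */
    synchronized String getData() {
        return new String(data, StandardCharsets.UTF_8);
    }

    /** The version a writer must present to update the node. */
    synchronized int getVersion() {
        return version;
    }

    /** NEWVIEW: succeeds only if the caller saw the latest version; returns the new version. */
    synchronized int setData(String value, int expectedVersion) {
        if (expectedVersion != version) {
            throw new IllegalStateException(
                    "bad version: expected " + expectedVersion + ", current is " + version);
        }
        data = value.getBytes(StandardCharsets.UTF_8);
        return ++version;
    }
}
```

With a real client the same pattern would read the data and its version through ZooKeeper.getData(path, watch, stat) and write through ZooKeeper.setData(path, data, version), which fails with a bad-version error if another administrator changed the view in between.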

@raghu: here is the scenario for split brain: you have an ensemble of A, B, C, D, and E. Your new view will be made up of B, C, D, E, and F. if your update only hits D, E, and F, you can have two working ZooKeeper instances: A, B, C and D, E, F, thus giving you split brain.
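
The arithmetic behind this scenario can be checked mechanically. The sketch below assumes simple majority quorums and uses hypothetical names; it only tests whether two disjoint groups can each form a quorum in the view they believe in.

```java
import java.util.Collections;
import java.util.HashSet;
import java.util.Set;

/** Illustrative split-brain check; assumes simple majority quorums. */
final class SplitBrainCheck {

    /** True if the group contains a strict majority of the view's members. */
    static boolean isMajority(Set<String> group, Set<String> view) {
        Set<String> inView = new HashSet<>(group);
        inView.retainAll(view);
        return inView.size() > view.size() / 2;
    }

    /**
     * True if the two groups share no server, yet each is a quorum of the view it
     * holds. In that case both groups can make progress independently.
     */
    static boolean canSplit(Set<String> groupWithOldView, Set<String> oldView,
                            Set<String> groupWithNewView, Set<String> newView) {
        return Collections.disjoint(groupWithOldView, groupWithNewView)
                && isMajority(groupWithOldView, oldView)
                && isMajority(groupWithNewView, newView);
    }
}
```

Here A, B, C are 3 of the 5 members of the old view and D, E, F are 3 of the 5 members of the new view, so the check reports that a split is possible unless the change also waits for a quorum of the old view.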



[jira] Commented: (ZOOKEEPER-107) Allow dynamic changes to server cluster membership

Posted by "Benjamin Reed (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/ZOOKEEPER-107?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12705794#action_12705794 ] 

Benjamin Reed commented on ZOOKEEPER-107:
-----------------------------------------

sounds great henry! it would be great if you could work on this.

i think we have two strategies:

1) have the cluster agree on a list of servers and use the atomic broadcast to agree on changes. (this might be a bit more difficult with the flexible quorum configuration. right flavio?) this is mostly in line with your first three points. btw, i don't think you need to quiesce for this or even do the sync. i think you can do a conditional update.

2) use some external resource file indicated by a URL to define the machines that make up a cluster. this is in line with your last point and you hint at this with your first point.

i think the first approach is safer and more reliable. the second is easier to implement and easier to see what is going on, but during the transition you have a problem as the resource file propagates through the cluster. (you could have different members with different views.)

the thing i was thinking of for the first option is exposing the cluster config through a znode '/.zookeeper/ensemble' or something like that. then changing the configuration would be as "simple" as conditionally setting a new version of that file. the tricky part is that you could only commit the change if you have a quorum of followers in both the old and the new configuration. this seems to be in line with what you are thinking, correct?



[jira] Commented: (ZOOKEEPER-107) Allow dynamic changes to server cluster membership

Posted by "Henry Robinson (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/ZOOKEEPER-107?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12722245#action_12722245 ] 

Henry Robinson commented on ZOOKEEPER-107:
------------------------------------------

As it turns out, I've pretty much implemented Observers in all but name already - they go through the same connection logic as normal followers, and therefore sync, but are disbarred from sending Leader.REQUEST packets to the leader. Similarly, when a leader is sending a proposal packet it only gets sent to those followers which are in the current view. Since the logic is very similar, and we will be able to distinguish observers from followers by whether they are members of the current view, I haven't duplicated code into Observer* classes.

I added this when finding that any new follower can join an existing ensemble and issue proposals to it, even if the static configuration of the ensemble does not contain it. This seemed to deadlock the ensemble pretty quickly :)


