Posted to user@zookeeper.apache.org by Benjamin Reed <br...@yahoo.com> on 2008/07/06 18:53:57 UTC

Re: [Zookeeper-user] Recipes for dealing with disconnection and connection expiration

This is a great FAQ topic!

There are two kinds of connection problems:

1) Disconnections: this callback (KeeperStateDisconnected) says that the client has lost its connection to the server. It is usually due to a server failure or a transient communication error, and will hopefully be followed by a reconnected callback. While disconnected from ZooKeeper, the process has no clear view of the changes that are happening, so it should be conservative and assume the worst.

2) Expired session: this callback says that there was a problem, usually a network outage, that prevented the client from keeping its session alive, so the session timed out. This state is not recoverable. It is game over: a new ZooKeeper object needs to be created, and the state stored in ZooKeeper needs to be re-queried and set up again.

Here is the best practice for handling these two states:

1) For disconnections, the server should suspend operations that rely on information in ZooKeeper. For example, a leader should suspend operations that assume it is still the leader. Operations resume once the connection is reestablished.

2) For expired sessions, the server should relinquish any rights it received from ZooKeeper and rerun the ZooKeeper initialization operations. For example, a leader will need to give up leadership, create a new ZooKeeper object and rerun the leader election protocol. Restarting the application is a very easy way to do this.
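
As a rough illustration of the two practices above, here is a minimal Java sketch. It deliberately does not use the real ZooKeeper client API: ConnState, Session and LeaderGuard are hypothetical names made up for this example, standing in for the connection states a watcher reports, for a ZooKeeper handle, and for whatever bookkeeping your application does. Treat it as one possible shape under those assumptions, not as the recommended implementation.

```java
import java.util.function.Supplier;

/** Hypothetical stand-in for the connection states a watcher would report. */
enum ConnState { CONNECTED, DISCONNECTED, EXPIRED }

/** Hypothetical stand-in for a ZooKeeper client handle. */
interface Session {
    void close();
}

/** Tracks whether this process may currently act as leader. */
class LeaderGuard {
    private final Supplier<Session> sessionFactory; // creates a fresh handle
    private Session session;
    private boolean leader;           // did we win the last election?
    private boolean suspended = true; // conservative until we are connected

    LeaderGuard(Supplier<Session> sessionFactory) {
        this.sessionFactory = sessionFactory;
        this.session = sessionFactory.get();
    }

    /** Called by the application when it wins the leader election. */
    synchronized void becomeLeader() {
        leader = true;
    }

    /** Leader-only operations should check this before doing any work. */
    synchronized boolean mayActAsLeader() {
        return leader && !suspended;
    }

    /** Feed connection events from the watcher callback into this method. */
    synchronized void onStateChange(ConnState state) {
        switch (state) {
            case DISCONNECTED:
                // Assume the worst: stop acting on what ZooKeeper told us.
                suspended = true;
                break;
            case CONNECTED:
                // Connection is back: resume suspended operations.
                suspended = false;
                break;
            case EXPIRED:
                // Game over for this session: relinquish leadership and
                // start again with a fresh handle. The caller must rerun
                // the election and call becomeLeader() if it wins.
                leader = false;
                suspended = true;
                session.close();
                session = sessionFactory.get();
                break;
        }
    }
}
```

The idea is that every leader-only operation calls mayActAsLeader() first, and that the process calls becomeLeader() again only after it has rerun the election on the new session. Restarting the application on expiration is still the simpler option if you can afford it.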

Of course there are always exceptions to these practices. For example, given a leader that is established with ZooKeeper and behaves conservatively by suspending operations on disconnects, a process that is itself disconnected from ZooKeeper could still send requests to the leader process. (A partial network partition may leave one process unable to connect to ZooKeeper but still able to connect to another process that can connect to ZooKeeper.) Personally, I would still write my applications to behave conservatively in these situations, since these kinds of partial partitions are difficult to test.

ben




----- Original Message ----
From: Anthony Urso <an...@gmail.com>
To: zookeeper-user@lists.sourceforge.net; zookeeper-user@hadoop.apache.org
Sent: Thursday, July 3, 2008 7:17:32 PM
Subject: [Zookeeper-user] Recipes for dealing with disconnection and connection expiration

Anyone have examples of the right way to deal with ZooKeeper
disconnection or connection expiration?

Currently I am exiting and starting fresh, but hopefully there is a
more efficient pattern.

Cheers,
Anthony
