Posted to dev@zookeeper.apache.org by James Strachan <ja...@gmail.com> on 2008/07/18 18:45:07 UTC

auto-reconnection ZooKeeper proxy?

<background>
I work on the ActiveMQ project which implements the JMS API - which is
a kinda complex thing but it involves a number of objects
(Connections, Sessions, Producers, Consumers). In some JMS providers
it's the end user's responsibility to detect a connection failure
(as distinct from any other kind of error) and then recreate all the
dependent objects.

We added support for auto-reconnection which greatly simplifies the
developer's life; it lets the JMS client automatically deal with any
socket failures, reconnecting to a broker for you and re-establishing
all of those in-flight operations (subscriptions, in-progress sends
and so forth).
http://activemq.apache.org/how-can-i-support-auto-reconnection.html

Having seen the value of wrapping up the auto-reconnection within a
proxy, I'm thinking it's also got merits on ZK.
</background>


As we start creating protocols/recipes that implement higher order
features like locks, leader elections and so forth we could probably
do with some kinda auto-reconnecting facade to ZooKeeper just to
simplify the implementation code of protocols/recipes. It's a kinda
complex area though and I'm sure different protocols will want
different things; but even for something so simple as a lock - I can
see the value in an auto-reconnecting proxy.

e.g. there are already 5 different method calls in the current
WriteLock implementation which all really need a custom try/catch
around them to detect loss of the connection, and each of those then
needs wrapping in reconnect-retry logic.

What to do about watches is interesting; for now the current
behaviour seems fine (fire them all, forcing a re-watch), though in
the future we could, as an option, re-register the watches on the new
server connection.

All I'm thinking about for now is a kinda ReconnectingZooKeeper which
looks like a ZooKeeper object but which internally catches dead
connections and then internally tries to reconnect to one of the ZK
servers under the covers - retrying the current read/write operation
until the ReconnectPolicy says to fail. e.g. some folks might wanna
retry connecting forever; others for a certain amount of time or
certain number of attempts etc.

So something like...

public class ReconnectingZooKeeper extends ZooKeeper {
  ...
  // for each method that reads/writes synchronously
  public Stat exists(String path) {...
     // loop until the call succeeds or the retry policy gives up
     for (int count = 0; ; count++) {
       try {

          // really do the method call!
          return super.exists(path);

       } catch (ConnectionClosedException e) {

          // let any watches or listeners respond to connection
          // loss first before we retry
          fireAnyWatchesAndStuff();

          if (!shouldRetry(count)) {
             throw e;
          }
       }
     }
  }
}
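
The ReconnectPolicy mentioned above could be sketched roughly as
follows. This is only an illustration: ReconnectPolicy is named in this
mail but its shape is not, so the interface, the elapsedMillis
parameter and the three implementations (RetryForever, MaxAttempts,
MaxElapsedTime) are all assumptions, and none of them are part of the
ZooKeeper API.

```java
// Hypothetical names throughout; nothing here is part of the ZooKeeper API.
interface ReconnectPolicy {
    // attempt is zero-based (0 = deciding after the first failure);
    // elapsedMillis is the time since that first failure.
    boolean shouldRetry(int attempt, long elapsedMillis);
}

// "some folks might wanna retry connecting forever"
final class RetryForever implements ReconnectPolicy {
    public boolean shouldRetry(int attempt, long elapsedMillis) {
        return true;
    }
}

// "...or certain number of attempts"
final class MaxAttempts implements ReconnectPolicy {
    private final int maxRetries;

    MaxAttempts(int maxRetries) {
        this.maxRetries = maxRetries;
    }

    public boolean shouldRetry(int attempt, long elapsedMillis) {
        return attempt < maxRetries;
    }
}

// "...others for a certain amount of time"
final class MaxElapsedTime implements ReconnectPolicy {
    private final long maxMillis;

    MaxElapsedTime(long maxMillis) {
        this.maxMillis = maxMillis;
    }

    public boolean shouldRetry(int attempt, long elapsedMillis) {
        return elapsedMillis < maxMillis;
    }
}

class ReconnectPolicyDemo {
    public static void main(String[] args) {
        ReconnectPolicy policy = new MaxAttempts(3);
        for (int attempt = 0; attempt < 5; attempt++) {
            System.out.println("attempt " + attempt
                    + " retry=" + policy.shouldRetry(attempt, 0L));
        }
    }
}
```

With a policy like this the shouldRetry(count) call in the loop above
would just delegate to whichever policy the client was configured with,
passing along the time elapsed since the first failure.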


Any watches should fire when a connection is lost - and all writes
should be replicated to the new server we connect to, right? So I'm
thinking, if we had a ReconnectingZooKeeper implementation, we could
use it with the current WriteLock implementation so that the protocol
could survive ZK server loss & reconnection while still working.

e.g. on connection loss the leader/lock owner needs to lose the lock
until it gets it back just in case; but other than that I think it
should work.

Am sure there's some gremlins somewhere in automatically reconnecting;
though provided the watch mechanism works, clients will be able to do
the right thing I think.

Thoughts?

-- 
James
-------
http://macstrac.blogspot.com/

Open Source Integration
http://open.iona.com

Re: auto-reconnection ZooKeeper proxy?

Posted by James Strachan <ja...@gmail.com>.
I've been experimenting with the WriteLock implementation to deal with
server failure; I've found that it's maybe too simplistic to create a
reconnecting ZooKeeper proxy; instead I'm just making it easy to retry
operations (or arbitrary ZK code blocks) using a helper class
(currently called ProtocolSupport but am open to suggestions for a
better class name for a base class for higher level protocol
implementations).

Using the WriteLock as an example; it seems you often want the retry
logic to include a number of calls to ZooKeeper; (e.g. check if a
znode exists, if it doesn't try to create it - retrying the whole
thing when ZK exceptions like connection loss occur etc).
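
As a rough sketch of what retrying a whole block of calls might look
like: the class name ProtocolSupport comes from this mail, but
everything else here is an assumption made for illustration
(ZooKeeperOperation, retryOperation, the stand-in
ConnectionLossException and the in-memory FlakyStore that plays the
part of a ZooKeeper client). It is not the code in the patch.

```java
import java.util.HashSet;
import java.util.Set;

// Stand-in for a ZooKeeper connection-loss error (hypothetical).
class ConnectionLossException extends Exception {}

// A block of ZooKeeper calls that gets retried as a single unit.
interface ZooKeeperOperation<T> {
    T execute() throws ConnectionLossException;
}

class ProtocolSupport {
    private final int maxRetries;

    ProtocolSupport(int maxRetries) {
        this.maxRetries = maxRetries;
    }

    // Runs the block; on connection loss re-runs it from the start
    // until it succeeds or maxRetries retries have been used up.
    <T> T retryOperation(ZooKeeperOperation<T> operation)
            throws ConnectionLossException {
        for (int attempt = 0; ; attempt++) {
            try {
                return operation.execute();
            } catch (ConnectionLossException e) {
                if (attempt >= maxRetries) {
                    throw e;
                }
            }
        }
    }
}

// In-memory stub in place of a real client; its first N calls fail.
class FlakyStore {
    private final Set<String> nodes = new HashSet<>();
    private int failuresLeft;

    FlakyStore(int failures) {
        this.failuresLeft = failures;
    }

    private void maybeFail() throws ConnectionLossException {
        if (failuresLeft > 0) {
            failuresLeft--;
            throw new ConnectionLossException();
        }
    }

    boolean exists(String path) throws ConnectionLossException {
        maybeFail();
        return nodes.contains(path);
    }

    void create(String path) throws ConnectionLossException {
        maybeFail();
        nodes.add(path);
    }
}

class RetryDemo {
    public static void main(String[] args) throws Exception {
        FlakyStore zk = new FlakyStore(2);
        ProtocolSupport support = new ProtocolSupport(5);
        // exists + create are retried together, not one call at a time
        Boolean created = support.retryOperation(() -> {
            if (zk.exists("/locks")) {
                return false;
            }
            zk.create("/locks");
            return true;
        });
        System.out.println("created=" + created);
    }
}
```

Re-running the exists check on every attempt is the point of retrying
the block as a whole: if an earlier attempt lost its connection after
the create had already gone through, the next attempt notices the znode
is there rather than trying to create it again.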

I'll submit the patch soon to ZOOKEEPER-78 including this...
https://issues.apache.org/jira/browse/ZOOKEEPER-78

One thing I have found is I've managed to get a
SessionExpiredException in my test case (not sure why though; I
thought ZooKeeper automatically kept sending keep alive pings?). I
just wondered what a client should do if that happens; I didn't see
any easy way to effectively disconnect and reconnect a ZooKeeper
client in this case.

I'm assuming that the SessionExpiredException is always gonna be
possible; so I've patched ZooKeeper to allow clients to handle a
SessionExpiredException and force a reconnection (to get a new
session).

So I've created a small patch to add a reconnect() method to ZooKeeper
which just closes and recreates the cnxn object...
https://issues.apache.org/jira/browse/ZOOKEEPER-84

(I also added a toString() method for easier debugging when running
test cases with multiple clients in the same jvm).

There's maybe a less drastic way to force the re-connection of a
ZooKeeper client; but I figured trashing and recreating the cnxn
object is at least the lowest-risk option and a simple patch :) and
the code should only be executed rarely so performance isn't such an
issue.
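
To make the idea concrete, here is a toy model of handling session
expiry by closing and recreating the connection object. It is a sketch
under assumptions, not the ZOOKEEPER-84 patch: Connection, Client,
readWithReconnect and the stand-in SessionExpiredException are
hypothetical types written for this example, and only the reconnect()
name and the "close and recreate cnxn" behaviour come from this mail.

```java
// Hypothetical stand-ins; none of these are the real ZooKeeper classes.
class SessionExpiredException extends Exception {}

// Plays the role of the internal connection ("cnxn") object.
class Connection {
    private static int nextSessionId = 1;

    final int sessionId = nextSessionId++;
    private boolean expired;
    private boolean closed;

    void expire() {
        expired = true;
    }

    void close() {
        closed = true;
    }

    boolean isClosed() {
        return closed;
    }

    String read(String path) throws SessionExpiredException {
        if (expired) {
            throw new SessionExpiredException();
        }
        return "data@" + path;
    }
}

// Single-threaded toy client that owns one connection at a time.
class Client {
    private Connection cnxn = new Connection();

    Connection connection() {
        return cnxn;
    }

    // Close the old connection and swap in a fresh one (new session).
    void reconnect() {
        cnxn.close();
        cnxn = new Connection();
    }

    // On session expiry, force a new session and retry the read once.
    String readWithReconnect(String path) throws SessionExpiredException {
        try {
            return cnxn.read(path);
        } catch (SessionExpiredException e) {
            reconnect();
            return cnxn.read(path);
        }
    }
}

class ReconnectDemo {
    public static void main(String[] args) throws Exception {
        Client client = new Client();
        client.connection().expire();
        System.out.println(client.readWithReconnect("/config"));
    }
}
```

One thing a real caller has to remember is that a new session is not a
continuation of the old one: ephemeral znodes created under the expired
session are removed by the server, so something like a lock holder has
to treat the lock as lost and re-acquire it after reconnecting.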

Thoughts?
