
Posted to user@zookeeper.apache.org by Alexis Midon <al...@gmail.com> on 2010/06/13 07:07:26 UTC

Watchers & error handling

Hi all,

I implemented queues and locks on top of ZooKeeper, and I'm pretty happy so
far. Thanks for the nice work. Tests look good; so good that we can now focus
on exception/error handling, and I have a couple of questions.

#1. Regarding the use of the default watcher: a ZooKeeper instance has a
default watcher, and most operations can also specify a watcher. When both are
set, does the operation watcher override the default watcher, or will both
watchers be invoked? If so, in which order? Does each watcher receive all
types of events?
I had a look at the code, and my understanding is that the default watcher
will always receive the type-NONE events, even if an "operation" watcher is
set. There is no guarantee on the order of invocation, though. Could you
confirm and/or fill in the details, please?

#2. After a connection loss, the client will eventually reconnect to the ZK
cluster, so I guess I can keep using the same client instance. But are there
cases where it is necessary to re-instantiate a ZooKeeper client? As a first
recovery strategy, is it OK to always recreate the client so that any
ephemeral nodes it previously owned disappear?
The case I struggle with is the following:
Let's say I've acquired a lock (i.e. an ephemeral lock node has been created).
Some application logic fails due to a connection loss. At this stage I'd
like to give up/roll back. Here I would typically throw an exception, with the
lock being released in a finally block. But I can't release the lock since the
connection is down. Later the client eventually reconnects; the session
didn't expire, so the lock node still exists. Now no one else can acquire this
lock until my session expires.

#3. Could you describe the recommended actions for each exception code?

I hope my questions make sense. Thanks in advance for your
answers,

Alexis

Re: Watchers & error handling

Posted by Alexis Midon <al...@gmail.com>.

Hi guys,

Could you confirm that the default watcher is always invoked for None-type
events, even if the operation set a different watcher?

thanks,

Alexis

Re: Watchers & error handling

Posted by Patrick Hunt <ph...@apache.org>.

On 06/25/2010 02:47 PM, Alexis Midon wrote:
> 1. Session events i.e. Type-None events are sent to all outstanding
> watch handlers. So if you do get(path, watcherX), both the default
> listener and watcherX will receive the session events.

That's true. This enables the watcher to handle the case (for example)
when the client has become disconnected from the cluster. Per-operation
watchers were specifically added to support the "zk library" case, where
more than a single consumer would be using the client connection. This makes
it a lot easier to add libraries that depend on zk.

>   2. Watchers are one-time triggers; however, session events do NOT
> remove a watcher.
>   In other words, if we're listening for a NodeCreated event and a
> disconnection occurs, we will eventually be notified of a Disconnected,
> then a SyncConnected and finally a NodeCreated without having to set any
> new watcher.

Correct.

>   3. If the invocation of a (synchronous or asynchronous) method fails,
> the watcher is not set. For instance, if getChildren("/foo", mywatcher)
> fails because the client is disconnected, mywatcher won't be notified of
> future events.

Correct, a watch is only valid if the operation was successful.

>
> I apologize in advance if I'm stating the obvious but the differences
> between "path" events and "session" events were not clear to me.
>

No, this is great. Feel free to enter a JIRA if this is not clear enough.

> <http://hadoop.apache.org/zookeeper/docs/r3.1.1/zookeeperProgrammers.html#ch_zkWatches>
> Alexis
>

This (3.1.1) is a pretty old version of the docs; I'd suggest that you
look at the most recent version before entering JIRAs:

http://hadoop.apache.org/zookeeper/docs/current/zookeeperProgrammers.html#ch_zkWatches

Regards,

Patrick

Re: Watchers & error handling

Posted by Alexis Midon <al...@gmail.com>.

Hi Patrick,

thanks for your answers. I did some tests yesterday and observed the
following behaviors:

1. Session events, i.e. type-None events, are sent to all outstanding watch
handlers. So if you do get(path, watcherX), both the default listener and
watcherX will receive the session events.
2. Watchers are one-time triggers; however, session events do NOT remove a
watcher.
In other words, if we're listening for a NodeCreated event and a
disconnection occurs, we will eventually be notified of a Disconnected, then
a SyncConnected and finally a NodeCreated, without having to set any new
watcher.
3. If the invocation of a (synchronous or asynchronous) method fails, the
watcher is not set. For instance, if getChildren("/foo", mywatcher) fails
because the client is disconnected, mywatcher won't be notified of future
events.

I apologize in advance if I'm stating the obvious but the differences
between "path" events and "session" events were not clear to me.
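To make the path-event versus session-event distinction concrete, here is a toy Python model of the three observations above (which Patrick confirms in his reply). It is not ZooKeeper's implementation and not a real client library: the class `ToyWatchManager`, its methods, and the `(event_type, detail)` watcher signature are all hypothetical. Only the event and state names (`None`, `NodeCreated`, `Disconnected`, `SyncConnected`) are borrowed from the Java client.

```python
from typing import Callable, Dict, List

# Toy watcher signature (hypothetical): receives (event_type, detail), where
# detail is a path for node events and a state name for session events.
Watcher = Callable[[str, str], None]


class ToyWatchManager:
    """Toy model of the three observations above; not ZooKeeper's implementation."""

    def __init__(self, default_watcher: Watcher) -> None:
        self.default_watcher = default_watcher
        self.connected = True
        self.node_watches: Dict[str, List[Watcher]] = {}

    def read_with_watch(self, path: str, watcher: Watcher) -> None:
        # Observation 3: if the call fails, no watch is left behind.
        if not self.connected:
            raise ConnectionError("connection loss: watch not set")
        self.node_watches.setdefault(path, []).append(watcher)

    def session_event(self, state: str) -> None:
        # Observation 1: type-None events go to the default watcher and to
        # every watcher with an outstanding watch (each once in this toy).
        self.connected = state == "SyncConnected"
        targets: List[Watcher] = [self.default_watcher]
        for watchers in self.node_watches.values():
            for watcher in watchers:
                if watcher not in targets:
                    targets.append(watcher)
        # Observation 2: delivering a session event does not consume watches.
        for watcher in targets:
            watcher("None", state)

    def node_event(self, event_type: str, path: str) -> None:
        # Node events are one-time triggers: delivered once, then removed.
        for watcher in self.node_watches.pop(path, []):
            watcher(event_type, path)
```

Replaying the scenario from point 2: register a watcher on `/q`, deliver `Disconnected` and then `SyncConnected`, and that watcher still receives the later `NodeCreated`, exactly once. The toy delivers session events in a fixed order, which the real client does not promise.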

<http://hadoop.apache.org/zookeeper/docs/r3.1.1/zookeeperProgrammers.html#ch_zkWatches>
Alexis

On Fri, Jun 25, 2010 at 12:36 PM, Patrick Hunt <ph...@apache.org> wrote:

>
> On 06/12/2010 10:07 PM, Alexis Midon wrote:
>
>> I implemented queues and locks on top of ZooKeeper, and I'm pretty happy so
>> far. Thanks for the nice work. Tests look good; so good that we can now
>> focus on exception/error handling, and I have a couple of questions.
>>
>> #1. Regarding the use of the default watcher: a ZooKeeper instance has a
>> default watcher, and most operations can also specify a watcher. When both
>> are set, does the operation watcher override the default watcher?
>
> if you use the get(path, bool) then the default watcher is notified, if you
> use get(path, watcherX) then only "watcherX" is notified.
>
>> Or will both watchers be invoked? If so, in which order? Does each watcher
>> receive all types of events?
>
> no, the two watchers are not both invoked.
>
>> I had a look at the code, and my understanding is that the default watcher
>> will always receive the type-NONE events, even if an "operation" watcher
>> is set. There is no guarantee on the order of invocation, though. Could
>> you confirm and/or fill in the details, please?
>
> The watcher gets both state change notifications and watch events. You can
> register multiple watchers for the same path (including the default); there
> is no guarantee on ordering at all.
>
>> #2. After a connection loss, the client will eventually reconnect to the ZK
>> cluster, so I guess I can keep using the same client instance. But are
>> there
>
> right
>
>> cases where it is necessary to re-instantiate a ZooKeeper client? As a
>> first recovery strategy, is it OK to always recreate the client so that
>> any ephemeral nodes it previously owned disappear?
>
> if the session has expired, you need to recreate the session object
> (likewise if you explicitly close it).
>
> Yes, this is a fine strategy if your application domain "fits". If you have
> a very expensive "recovery" or "bootstrap" process then recreating the
> session on every disconnect would be a bad idea.
>
>> The case I struggle with is the following:
>> Let's say I've acquired a lock (i.e. an ephemeral lock node has been
>> created). Some application logic fails due to a connection loss. At this
>> stage I'd like to give up/roll back. Here I would typically throw an
>> exception, with the lock being released in a finally block. But I can't
>> release the lock since the connection is down. Later the client eventually
>> reconnects; the session didn't expire, so the lock node still exists. Now
>> no one else can acquire this lock until my session expires.
>
> Yes, you are reading the situation correctly. In this case you either have
> to take the easy route (close the session and create a new one, again if
> your app domain supports this), or your client needs to check whether it
> still holds the lock (i.e. is still the owner) once it has reconnected.
> You can verify this for an ephemeral node by looking at the "ephemeralOwner"
> field of the Stat object. If this matches your session id then you are the
> owner and still hold the lock. This is a bit tricky to get right though, so
> in some cases clients just close the session and recreate.
>
>> #3. Could you describe the recommended actions for each exception code?
>
> this is highly dependent on your application requirements. See above for my
> general guidance. Feel free to ask more questions.
>
> Regards,
>
> Patrick
>

Re: Watchers & error handling

Posted by Patrick Hunt <ph...@apache.org>.

On 06/12/2010 10:07 PM, Alexis Midon wrote:
> I implemented queues and locks on top of ZooKeeper, and I'm pretty happy so
> far. Thanks for the nice work. Tests look good; so good that we can now focus
> on exception/error handling, and I have a couple of questions.
>
> #1. Regarding the use of the default watcher: a ZooKeeper instance has a
> default watcher, and most operations can also specify a watcher. When both are
> set, does the operation watcher override the default watcher?

if you use the get(path, bool) then the default watcher is notified, if 
you use get(path, watcherX) then only "watcherX" is notified.
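A toy Python model of the registration rule described here may help; it is not the ZooKeeper client or any real library. The class `ToyClient`, its `get`/`fire` methods, and the `(event_type, path)` watcher signature are hypothetical. Passing `True` stands in for the boolean overloads of the Java client (e.g. `exists(path, true)`), and passing a callable stands in for the `Watcher` overloads (e.g. `exists(path, watcherX)`). Session events are deliberately left out of this sketch.

```python
from typing import Callable, Dict, List, Optional, Union

# Toy watcher signature (hypothetical): receives (event_type, path).
Watcher = Callable[[str, str], None]


class ToyClient:
    """Toy model of which watcher a read registers; not a real ZooKeeper client."""

    def __init__(self, default_watcher: Watcher) -> None:
        self.default_watcher = default_watcher
        self.node_watches: Dict[str, List[Watcher]] = {}

    def get(self, path: str, watch: Union[bool, Watcher]) -> None:
        chosen: Optional[Watcher]
        if watch is True:
            # Like get(path, true): the client's default watcher is used.
            chosen = self.default_watcher
        elif watch is False:
            # Like get(path, false): no watch is left on the path.
            chosen = None
        else:
            # Like get(path, watcherX): only watcherX is registered.
            chosen = watch
        if chosen is not None:
            self.node_watches.setdefault(path, []).append(chosen)

    def fire(self, event_type: str, path: str) -> None:
        # A node event reaches only the watchers registered on that path;
        # pop() removes them, mirroring one-time triggers.
        for watcher in self.node_watches.pop(path, []):
            watcher(event_type, path)
```

Registering several watchers on one path simply appends to the list; the toy then fires them in registration order, an ordering the real client does not guarantee.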

> Or will both watchers be invoked? If so, in which order? Does each watcher
> receive all types of events?

no, the two watchers are not both invoked.

> I had a look at the code, and my understanding is that the default watcher
> will always receive the type-NONE events, even if an "operation" watcher is
> set. There is no guarantee on the order of invocation, though. Could you
> confirm and/or fill in the details, please?
>

The watcher gets both state change notifications and watch events. You
can register multiple watchers for the same path (including the default);
there is no guarantee on ordering at all.

> #2. After a connection loss, the client will eventually reconnect to the ZK
> cluster, so I guess I can keep using the same client instance. But are there

right

> cases where it is necessary to re-instantiate a ZooKeeper client? As a first
> recovery strategy, is it OK to always recreate the client so that any
> ephemeral nodes it previously owned disappear?

if the session has expired, you need to recreate the session object
(likewise if you explicitly close it).

Yes, this is a fine strategy if your application domain "fits". If you 
have a very expensive "recovery" or "bootstrap" process then recreating 
the session on every disconnect would be a bad idea.
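One way to encode this advice, sketched in Python under stated assumptions: the `KeeperState` enum below only mirrors three of the state names used by the Java client's `Watcher.Event.KeeperState`, and `action_for` with its `recreate_on_disconnect` flag is a hypothetical helper, not part of any ZooKeeper API.

```python
from enum import Enum


class KeeperState(Enum):
    # Mirrors a subset of the Java client's KeeperState names (toy enum).
    SyncConnected = "SyncConnected"
    Disconnected = "Disconnected"
    Expired = "Expired"


def action_for(state: KeeperState, recreate_on_disconnect: bool = False) -> str:
    """Hypothetical recovery policy following the reply above."""
    if state is KeeperState.Expired:
        # The session object is dead: only a new client/session can recover.
        return "recreate"
    if state is KeeperState.Disconnected:
        # The same client reconnects on its own; recreating here only makes
        # sense when the application's recovery/bootstrap step is cheap.
        return "recreate" if recreate_on_disconnect else "wait"
    # Connected again: carry on with the same client instance.
    return "resume"
```

Keeping `recreate_on_disconnect` off by default reflects the point that recreating the session on every disconnect is a bad idea when bootstrap is expensive.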

> The case I struggle with is the following:
> Let's say I've acquired a lock (i.e. an ephemeral lock node has been created).
> Some application logic fails due to a connection loss. At this stage I'd
> like to give up/roll back. Here I would typically throw an exception, with the
> lock being released in a finally block. But I can't release the lock since the
> connection is down. Later the client eventually reconnects; the session
> didn't expire, so the lock node still exists. Now no one else can acquire this
> lock until my session expires.

Yes, you are reading the situation correctly. In this case you either
have to take the easy route (close the session and create a new one,
again if your app domain supports this), or your client needs to check
whether it still holds the lock (i.e. is still the owner) once it has
reconnected. You can verify this for an ephemeral node by
looking at the "ephemeralOwner" field of the Stat object. If this 
matches your session id then you are the owner and still hold the lock. 
This is a bit tricky to get right though, so in some cases clients just 
close the session and recreate.
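The ownership check can be sketched as a small Python toy. In the Java client the inputs would come from `exists(path, false)` (which returns a `Stat`, or null if the node is absent), `Stat.getEphemeralOwner()` and `ZooKeeper.getSessionId()`; here `ToyStat`, `still_holds_lock`, `release_after_reconnect` and the dict standing in for the server are hypothetical names. The toy ignores the race between the check and the delete, which is part of what makes this tricky in practice.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass(frozen=True)
class ToyStat:
    """Hypothetical stand-in holding the one Stat field this check needs."""
    ephemeral_owner: int  # session id of the creating session (0 if not ephemeral)


def still_holds_lock(stat: Optional[ToyStat], my_session_id: int) -> bool:
    """True only if the lock node exists and this session created it."""
    if stat is None:
        # The node is gone, so this session no longer holds the lock.
        return False
    return stat.ephemeral_owner == my_session_id


def release_after_reconnect(nodes: Dict[str, ToyStat], lock_path: str,
                            my_session_id: int) -> bool:
    """Delete the lock node if this session still owns it; True if released."""
    if still_holds_lock(nodes.get(lock_path), my_session_id):
        # In a real client this would be a delete call, which can itself fail.
        del nodes[lock_path]
        return True
    return False
```

Run after the client reports `SyncConnected`, this lets the deferred "release in finally" happen once the connection is back, instead of leaving the lock held until the session expires.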

>
> #3. Could you describe the recommended actions for each exception code?

this is highly dependent on your application requirements. See above for
my general guidance. Feel free to ask more questions.

Regards,

Patrick