Posted to dev@zookeeper.apache.org by Ted Dunning <te...@gmail.com> on 2019/08/16 18:51:37 UTC

thoughts about extension to multi semantics

The recent discussion about if/then/else idioms in ZK has raised the
thought that it might be nice to have some extended semantics.

One version that I could see would be to extend the current multi-op to
allow multiple alternatives. The idea would be that there would effectively
be multiple branches to be tried. The first one that succeeds atomically
(all or nothing) would be used. The returned value would need to somehow
indicate which alternative succeeded and would need to return any data
accessed. The testing of alternatives would also be atomic so it wouldn't
be possible for things to change within a single operation.

This extension would allow the previous question to be answered like this:

           pick_first {
                 create(...)
           } {
                 set(...)
           }

(the syntax here is just made up and wouldn't actually be supported ... it
is just for pseudo code purposes).
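
To make the proposed semantics concrete, here is a minimal Python sketch of
pick_first over an in-memory tree. It is only a model, not ZooKeeper code: the
names OpFailed, create, set_data and pick_first are invented for illustration,
and a plain dict stands in for the znode tree. Each alternative runs against a
scratch copy, so a failed branch leaves nothing behind, and the return value
says which branch won.

```python
import copy


class OpFailed(Exception):
    """Raised when a single operation cannot be applied (hypothetical)."""


def create(path, data):
    # Build an operation that fails if the node already exists.
    def op(tree):
        if path in tree:
            raise OpFailed(f"node exists: {path}")
        tree[path] = {"data": data, "version": 0}
        return ("created", path)
    return op


def set_data(path, data, expected_version=-1):
    # Build an operation that fails if the node is missing or the version
    # does not match; -1 means "any version".
    def op(tree):
        node = tree.get(path)
        if node is None:
            raise OpFailed(f"no node: {path}")
        if expected_version not in (-1, node["version"]):
            raise OpFailed(f"bad version: {path}")
        node["data"] = data
        node["version"] += 1
        return ("set", path, node["version"])
    return op


def pick_first(tree, *branches):
    """Commit the first branch whose operations all succeed.

    Returns (index_of_winning_branch, results); raises OpFailed if no
    branch can be applied.
    """
    for index, branch in enumerate(branches):
        scratch = copy.deepcopy(tree)
        try:
            results = [op(scratch) for op in branch]
        except OpFailed:
            continue  # all or nothing: drop the scratch copy
        tree.clear()
        tree.update(scratch)
        return index, results
    raise OpFailed("no alternative succeeded")


tree = {}
first = pick_first(tree, [create("/cfg", b"a")], [set_data("/cfg", b"a")])
second = pick_first(tree, [create("/cfg", b"b")], [set_data("/cfg", b"b")])
```

On the empty tree the create branch wins (index 0). On the second call the
create fails because the node exists, so the set branch wins (index 1).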


My theory is that this would be relatively easy to implement based on the
current multi operation. Risk due to the change is pretty low given that
there is code to copy.

My question is whether this would actually have all that much benefit.

Does anybody have an opinion on that?

Re: thoughts about extension to multi semantics

Posted by Jordan Zimmerman <jo...@jordanzimmerman.com>.

> On Aug 17, 2019, at 4:41 PM, Ted Dunning <te...@gmail.com> wrote:
> 
> On Sat, Aug 17, 2019 at 4:01 PM Jordan Zimmerman <jo...@jordanzimmerman.com>
> wrote:
> 
>> 
>> 
>> ...
>>> I don't understand that. Watches can be set in a multi.
>> 
>> Not in the public API:
>> https://github.com/apache/zookeeper/blob/master/zookeeper-server/src/main/java/org/apache/zookeeper/Op.java
>> - is it supported in the back-end?
>> 
> 
> Who designed that mess?!?

lol


> 
>> ...>
>>> I don't understand that, either. But this time I just don't understand
>> what you are suggesting and how it helps.
>> 
>> The standard lock recipe creates an ephemeral-sequential node. Once your
>> node (with its sequence number) is returned you call getChildren() to see
>> if you have the lowest numbered node. The lowest numbered node is defined
>> to be the lock holder (or leader, etc.). This requires two round trips.
> 
> 
> Hmm... well looking at the directory in the same operation as the create
> sequential should be easy.

... big snip ...

It seems like the majority of ZK client use cases are a variant of: a) set an ephemeral node; b) query the children of the parent; c) watch for some changes; d) act and reset. It would be nice if the server provided something more than primitives. Of course, we now have Curator to mitigate the difficulty but when you need something that Curator doesn't provide you're faced with the complexity. 

-JZ

Re: thoughts about extension to multi semantics

Posted by Ted Dunning <te...@gmail.com>.
On Sat, Aug 17, 2019 at 4:01 PM Jordan Zimmerman <jo...@jordanzimmerman.com>
wrote:

>
>
> ...
> > I don't understand that. Watches can be set in a multi.
>
> Not in the public API:
> https://github.com/apache/zookeeper/blob/master/zookeeper-server/src/main/java/org/apache/zookeeper/Op.java
> - is it supported in the back-end?
>

Who designed that mess?!?

Seriously, looking back at the API, I vaguely remember punting the question
of watches based on the complexity of setting them cleanly.

Getting watches out of a multi would be a worthy improvement.

> ...>
> > I don't understand that, either. But this time I just don't understand
> what you are suggesting and how it helps.
>
> The standard lock recipe creates an ephemeral-sequential node. Once your
> node (with its sequence number) is returned you call getChildren() to see
> if you have the lowest numbered node. The lowest numbered node is defined
> to be the lock holder (or leader, etc.). This requires two round trips.


Hmm... well looking at the directory in the same operation as the create
sequential should be easy.


> It would be nice to consolidate this into 1 API call. Further, if you're
> not the lowest numbered node, you must set a watch on the node that
> precedes you so you know when to check again.


Setting the watch on the preceding node will require the second call no
matter what because we won't know which node to watch until we get the list
back again.


> This is all very cumbersome to do in client code (thus Curator). Maybe
> there's a way to specify this entire behavior in a multi call.
>

Not all of it. At least not that I understand. And I don't see a big win
for the multi in the first operation other than saving a round-trip which
is nice, but not a massive win.

Re: thoughts about extension to multi semantics

Posted by Jordan Zimmerman <jo...@jordanzimmerman.com>.

> On Aug 17, 2019, at 2:50 PM, Ted Dunning <te...@gmail.com> wrote:
> 
> 
> 
> On Sat, Aug 17, 2019 at 10:19 AM Jordan Zimmerman <jordan@jordanzimmerman.com> wrote:
> Some thoughts:
> 
> It doesn't really help with any of the "standard" recipes as they all need to set watches.
> 
> I don't understand that. Watches can be set in a multi.

Not in the public API: https://github.com/apache/zookeeper/blob/master/zookeeper-server/src/main/java/org/apache/zookeeper/Op.java - is it supported in the back-end?

>  
> Not to open a can of worms, but if there were a firehose version of watches that could be set independently, this type of multi-op could radically simplify some of the recipes. i.e. one could imagine a multi-op that creates an ephemeral node and then returns a sorted list of child node names so that leader election and locks can be done in one shot. 
> 
> I don't understand that, either. But this time I just don't understand what you are suggesting and how it helps.

The standard lock recipe creates an ephemeral-sequential node. Once your node (with its sequence number) is returned you call getChildren() to see if you have the lowest numbered node. The lowest numbered node is defined to be the lock holder (or leader, etc.). This requires two round trips. It would be nice to consolidate this into 1 API call. Further, if you're not the lowest numbered node, you must set a watch on the node that precedes you so you know when to check again. This is all very cumbersome to do in client code (thus Curator). Maybe there's a way to specify this entire behavior in a multi call. 
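
The client-side decision in that recipe can be sketched as a small pure function. This is a hedged illustration, not Curator or ZooKeeper code: sequence_of and lock_state are hypothetical names, and the sketch assumes child names end in the 10-digit zero-padded sequence number that ZooKeeper appends to sequential nodes. It takes the child list returned by the second round trip and says whether we hold the lock and, if not, which node to watch.

```python
def sequence_of(name):
    """Return the numeric suffix of a sequential node name (assumed 10 digits)."""
    return int(name[-10:])


def lock_state(children, my_name):
    """Return (holds_lock, node_to_watch) for my_name among children."""
    ordered = sorted(children, key=sequence_of)
    position = ordered.index(my_name)
    if position == 0:
        # Lowest sequence number: we are the lock holder (or leader).
        return True, None
    # Otherwise watch only the node immediately ahead of us.
    return False, ordered[position - 1]


children = ["lock-0000000007", "lock-0000000003", "lock-0000000005"]
holder = lock_state(children, "lock-0000000003")
waiter = lock_state(children, "lock-0000000007")
```

Here the node ending in 3 holds the lock, while the node ending in 7 has to watch the node ending in 5. The watch itself still needs a separate call, since the predecessor is only known once the child list is back.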

---

I'll read/review the queue idea separately.

-Jordan


Re: thoughts about extension to multi semantics

Posted by Ted Dunning <te...@gmail.com>.
On Sat, Aug 17, 2019 at 10:19 AM Jordan Zimmerman <
jordan@jordanzimmerman.com> wrote:

> Some thoughts:
>
> It doesn't really help with any of the "standard" recipes as they all need
> to set watches.


I don't understand that. Watches can be set in a multi.


> Not to open a can of worms, but if there were a firehose version of
> watches that could be set independently, this type of multi-op could
> radically simplify some of the recipes. i.e. one could imagine a multi-op
> that creates an ephemeral node and then returns a sorted list of child node
> names so that leader election and locks can be done in one shot.
>

I don't understand that, either. But this time I just don't understand what
you are suggesting and how it helps.


> An atomic counter could be done much more simply than how Curator does it
> now as the test/increment could be done server side
>

I don't think so. No arithmetic is included in the current multi.
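
Without server-side arithmetic, an increment is a client-side read followed by
a version-checked write, retried on conflict. The sketch below models that loop
against an in-memory stand-in. Znode, BadVersion, increment and the interfere
hook are all hypothetical names used only to simulate a concurrent writer; none
of this is the ZooKeeper or Curator API.

```python
class BadVersion(Exception):
    """The expected version did not match the stored version."""


class Znode:
    """In-memory stand-in for a single znode with a data version."""

    def __init__(self, data):
        self.data = data
        self.version = 0

    def get(self):
        return self.data, self.version

    def set(self, data, expected_version):
        if expected_version != self.version:
            raise BadVersion()
        self.data = data
        self.version += 1


def increment(node, interfere=None):
    """Optimistic increment; returns (new_value, attempts)."""
    attempts = 0
    while True:
        attempts += 1
        data, version = node.get()
        if interfere is not None:
            interfere(node)  # hypothetical hook: a rival may write here
        new_value = int(data.decode()) + 1
        try:
            node.set(str(new_value).encode(), version)
            return new_value, attempts
        except BadVersion:
            continue  # someone else wrote first; re-read and retry


counter = Znode(b"41")
calls = []


def rival(node):
    # Write once, between our read and our write, to force one retry.
    if not calls:
        calls.append(1)
        value, version = node.get()
        node.set(str(int(value.decode()) + 1).encode(), version)


result = increment(counter, interfere=rival)
```

The rival bumps the counter to 42 after our first read, so the first write is
rejected and the second attempt stores 43.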


> Queues would be easier (possibly - I need to think about this some more).
> Curator's queue code is very complex.
>

I could imagine some simplification. Suppose that our queue is either an
empty directory or it looks like this:

[image: image.png]
(figure also at https://www.dropbox.com/s/qwwn9ahgxqh9iyf/queue.png?dl=0)

The idea is that the master znode is used to coordinate directory updates
and each running or pending task has an ephemeral znode. Whenever the
currently running task finishes or crashes, the corresponding task znode
will disappear and wake up the next pending task. If a task in the middle
of the queue disappears, the next task in line will wake up and should
determine what it should start watching.
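
The wake-up behavior in that paragraph can be modeled in a few lines. This is
only a sketch under simplifying assumptions: tasks is an ordered list of task
znode names, watches maps each waiting task to the task it watches, and
remove_task is a hypothetical helper. The master znode and its version check
are left out.

```python
def remove_task(tasks, watches, gone):
    """Remove task `gone`; return (woken_task, new_watch_target).

    new_watch_target is None when the woken task is now the head of the
    queue and should start work. Returns (None, None) if nobody watched
    the departed task.
    """
    tasks.remove(gone)
    watches.pop(gone, None)  # the departed task's own watch goes away
    woken = next((t for t, target in watches.items() if target == gone), None)
    if woken is None:
        return None, None
    position = tasks.index(woken)
    new_target = tasks[position - 1] if position > 0 else None
    if new_target is None:
        del watches[woken]  # head of the queue watches nothing
    else:
        watches[woken] = new_target
    return woken, new_target


tasks = ["t1", "t2", "t3"]
watches = {"t2": "t1", "t3": "t2"}
middle = remove_task(tasks, watches, "t2")  # a task in mid-queue disappears
head = remove_task(tasks, watches, "t1")    # the running task finishes
```

When t2 vanishes from the middle, t3 wakes up and re-targets its watch at t1.
When t1 then finishes, t3 wakes up again, finds itself at the head, and should
start work.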

Some issues occur when we would like to be sure that we either create the
master znode (because we have an empty queue) or that we read the master to
find out what the last task should be. Multi and first_multi both can help
with this.

repeat {
     one_of {
           create leader znode
           create ephemeral task znode
     } or {
           get leader znode version
           get directory contents
     }

   // if we didn't create new leader node, look at directory and pick a
   // task node name to create and a task node to watch. if that doesn't
   // succeed, something changed and we repeat
     multi {
           create ephemeral task znode
           write to master znode and verify version of master znode
           put watch on previous znode in queue (if any)
     }
  } until success

When a task finishes or crashes, it can simply delete its own task znode
(or let the znode evaporate on its own). If there is another task pending,
it will be notified. Whenever a task is notified, it should get the master
version and the directory contents and decide who it should watch (if it
isn't the head of the queue) or that it should start work (if it is the
head of the queue). Either way, after making such a decision, it should
verify the master version, write to the master and set a watch using a
multi.


> Anyway - I'll try to spend some time in Curator's various recipes to see
> how they would be simplified if this server-side feature was available.


Very cool. Very interested in hearing more thoughts on this.

Re: thoughts about extension to multi semantics

Posted by Jordan Zimmerman <jo...@jordanzimmerman.com>.
Some thoughts:

- It doesn't really help with any of the "standard" recipes as they all need to set watches. Not to open a can of worms, but if there were a firehose version of watches that could be set independently, this type of multi-op could radically simplify some of the recipes. i.e. one could imagine a multi-op that creates an ephemeral node and then returns a sorted list of child node names so that leader election and locks can be done in one shot.
- An atomic counter could be done much more simply than how Curator does it now as the test/increment could be done server side.
- Queues would be easier (possibly - I need to think about this some more). Curator's queue code is very complex.

Anyway - I'll try to spend some time in Curator's various recipes to see how they would be simplified if this server-side feature was available.

-Jordan



Re: thoughts about extension to multi semantics

Posted by Ted Dunning <te...@gmail.com>.
Being lazy, I would suggest only the tests we already have. Existence and
version.


On Sat, Aug 17, 2019, 5:30 AM Enrico Olivelli <eo...@gmail.com> wrote:

> On Sat, Aug 17, 2019, 08:01 Ted Dunning <te...@gmail.com> wrote:
>
> > It definitely sounds like a nice feature.
> >
> > The important question is what is the actual importance after you
> multiply
> > it by the amount of usage it gets.
> >
> > For instance, I know that multi gets a bit of usage, but I would guess
> that
> > it actually gets very, very little. It might even most of the cases that
> > you have in mind.
> >
> > If that is so, how much would an extension to multi actually be used?
> >
>
> I am adding a question: what will the 'test' look like? I imagine these cases:
> - test whether the node exists
> - test the version
> - test the content of the znode? (This will be trickier, as ZK knows
> nothing about the format of the content)
>
>
> Enrico

Re: thoughts about extension to multi semantics

Posted by Enrico Olivelli <eo...@gmail.com>.
On Sat, Aug 17, 2019, 08:01 Ted Dunning <te...@gmail.com> wrote:

> It definitely sounds like a nice feature.
>
> The important question is what is the actual importance after you multiply
> it by the amount of usage it gets.
>
> For instance, I know that multi gets a bit of usage, but I would guess that
> it actually gets very, very little. It might even most of the cases that
> you have in mind.
>
> If that is so, how much would an extension to multi actually be used?
>

I am adding a question: what will the 'test' look like? I imagine these cases:
- test whether the node exists
- test the version
- test the content of the znode? (This will be trickier, as ZK knows
nothing about the format of the content)


Enrico




Re: thoughts about extension to multi semantics

Posted by Ted Dunning <te...@gmail.com>.
On Sat, Aug 17, 2019, 8:33 AM Michael Han <ha...@apache.org> wrote:

> >> I would guess that it actually gets very, very little
>
> Can't speak for others, but for the zookeeper clusters I maintain
> internally, multi was used extensively in certain use cases.
>

Cool!

Very glad to hear it.

>

Re: thoughts about extension to multi semantics

Posted by Michael Han <ha...@apache.org>.
>> I would guess that it actually gets very, very little

Can't speak for others, but for the zookeeper clusters I maintain
internally, multi was used extensively in certain use cases.

>> how much would an extension to multi actually be used

This looks like a chicken-and-egg problem to me. I feel the improved
expressiveness of multi might lead to new use cases and thus promote multi
usage in a positive feedback loop.


Re: thoughts about extension to multi semantics

Posted by Ted Dunning <te...@gmail.com>.
It definitely sounds like a nice feature.

The important question is what is the actual importance after you multiply
it by the amount of usage it gets.

For instance, I know that multi gets a bit of usage, but I would guess that
it actually gets very, very little. It might even cover most of the cases
that you have in mind.

If that is so, how much would an extension to multi actually be used?



On Fri, Aug 16, 2019 at 8:28 PM Michael Han <ha...@apache.org> wrote:

> This sounds a nice feature to me as it enables user to do more without
> obvious downside. It could be useful in cases like state management where
> the state is stored in a fine grained approach across multiple zNode,
> instead of in a single zNode.

Re: thoughts about extension to multi semantics

Posted by Michael Han <ha...@apache.org>.
This sounds like a nice feature to me, as it enables users to do more without
an obvious downside. It could be useful in cases like state management where
the state is stored in a fine-grained way across multiple znodes instead of
in a single znode.
