Posted to dev@zookeeper.apache.org by Qian Ye <ye...@gmail.com> on 2010/12/09 13:19:41 UTC

need for more conditional write support

Hi all:

I'm working on a distributed system these days and need more conditional
write support in ZooKeeper. Currently ZooKeeper only supports modifying a
node (delete or set data) conditioned on a version number that represents
the current version of that node. I need modifications conditioned on the
state of other nodes. For example, I want to set the data of /node to A if
the data of /node1 is B and the data of /node2 is C. Should we support this
kind of interface?

thanks
-- 
With Regards!

Ye, Qian
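
To make the request above concrete, here is a minimal sketch of the semantics being asked for, written against a hypothetical in-memory map rather than the real ZooKeeper client. The class ConditionalWriteSketch and the method setIfAllMatch are illustrative names only, and the sketch compares node data directly, as in the /node, /node1, /node2 example.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical in-memory stand-in for a znode tree; this is not the ZooKeeper client API.
class ConditionalWriteSketch {
    private final Map<String, String> data = new HashMap<>();

    synchronized void put(String path, String value) {
        data.put(path, value);
    }

    synchronized String get(String path) {
        return data.get(path);
    }

    // Writes newValue to target only if every znode named in expected currently
    // holds the expected value. The synchronized keyword stands in for the
    // server applying the whole request as a single step.
    synchronized boolean setIfAllMatch(Map<String, String> expected, String target, String newValue) {
        for (Map.Entry<String, String> e : expected.entrySet()) {
            if (!e.getValue().equals(data.get(e.getKey()))) {
                return false; // a condition failed, so nothing is written
            }
        }
        data.put(target, newValue);
        return true;
    }
}
```

With /node1 holding B and /node2 holding C, a call such as setIfAllMatch(Map.of("/node1", "B", "/node2", "C"), "/node", "A") writes A to /node; if either condition no longer holds, the call returns false and /node is left alone.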

Re: need for more conditional write support

Posted by Dave Wright <wr...@gmail.com>.
If this work were to be done, I would much prefer the ability
to set multiple values at the same time, as this enables a whole host
of possibilities, including certain transactional updating capabilities
that aren't possible now (it is far too easy to get into an inconsistent
state if you fail partway through updating multiple nodes).


-Dave Wright

On Wed, Dec 15, 2010 at 2:21 AM, Qian Ye <ye...@gmail.com> wrote:
> I have read the mails on
> http://www.mail-archive.com/zookeeper-dev@hadoop.apache.org/msg08315.html,
> and here is some further thinking about this issue.
>
> The interface like
>
> zoo_multi_test_and_set(List<int> versions, List<string> znodes,
> List<byte[]> data)
>
> can solve the problem I mentioned before. Some relevant issues, such as
> being hard for programmers to use, as mentioned in the mail archive, should
> be paid attention to. I think we can take a small step first, that is,
> provide an interface like
>
> zoo_multi_test_and_set(List<int> versions, List<string> znodes, byte[]
> data, string znode)
>
>
> This API tests the versions of several different znodes before setting one
> znode, and if the client wants to set other znodes, it can call this API
> repeatedly. Because we only set one node with this API, the result is
> straightforward: success or failure. We need not handle a half-success
> result.
>
> What do you guys think about this API?
>
> Thanks~
>
> On Fri, Dec 10, 2010 at 11:57 PM, Ted Dunning <te...@gmail.com> wrote:
>
>> Yes.  I can imagine that being able to write several variables, specifying
>> the version of each would
>> be useful.  But not yet supported.
>>
>> On Fri, Dec 10, 2010 at 12:01 AM, Qian Ye <ye...@gmail.com> wrote:
>>
>> > What's more, I think this kind of conditional write support is simpler
>> than
>> > multiple transactions. Multiple transactions can be built with this kind
>> of
>> > support. The link is broken?
>> >
>>
>
>
>
> --
> With Regards!
>
> Ye, Qian
>

Re: need for more conditional write support

Posted by Ted Dunning <te...@gmail.com>.
The way that ZooKeeper works internally probably makes these two options
about equally difficult.  The master has to examine the queue augmented by
the versions in the multi-update in either case.  The only difference is
whether, after qualifying the update, it copies one unconditional update
or several.

On Wed, Dec 15, 2010 at 7:20 PM, Qian Ye <ye...@gmail.com> wrote:

> I think the second is easier to implement because it only updates data on
> one node, and need not handle rollback in the case of some update failure
> after some update success.
>

Re: need for more conditional write support

Posted by Qian Ye <ye...@gmail.com>.
I think the second is easier to implement because it only updates data on
one node, and need not handle rollback in the case of some update failure
after some update success.

Well, I agree that the API
zoo_multi_test_and_set(List<int> versions, List<string> znodes_test,
List<byte[]> data, List<string> znodes_set);

is more powerful. If we decide to do this, I think we could start to
discuss some implementation details.
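
As a rough illustration of the four-list form, the following sketch models versioned znodes in memory and applies the writes only when every version test passes. MultiTestAndSetSketch, Znode, and the helper methods are hypothetical names for this example, and the rule that each write bumps the target's version by one is an assumption of the model, not a statement about the eventual implementation.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical in-memory model of versioned znodes; not the ZooKeeper C or Java client.
class MultiTestAndSetSketch {
    static final class Znode {
        byte[] data;
        int version;
        Znode(byte[] data) { this.data = data; }
    }

    private final Map<String, Znode> tree = new HashMap<>();

    synchronized void create(String path, byte[] data) {
        tree.put(path, new Znode(data));
    }

    synchronized int versionOf(String path) {
        return tree.get(path).version;
    }

    synchronized byte[] dataOf(String path) {
        return tree.get(path).data;
    }

    // versions/znodesTest and data/znodesSet are two pairs of parallel lists.
    // Returns true only if every test passed, in which case every write is
    // applied; otherwise nothing changes.
    synchronized boolean multiTestAndSet(List<Integer> versions, List<String> znodesTest,
                                         List<byte[]> data, List<String> znodesSet) {
        if (versions.size() != znodesTest.size() || data.size() != znodesSet.size()) {
            throw new IllegalArgumentException("parallel lists must have equal length");
        }
        // Phase 1: qualify the request without modifying anything.
        for (int i = 0; i < znodesTest.size(); i++) {
            Znode z = tree.get(znodesTest.get(i));
            if (z == null || z.version != versions.get(i)) {
                return false;
            }
        }
        for (String path : znodesSet) {
            if (!tree.containsKey(path)) {
                return false;
            }
        }
        // Phase 2: apply every write; each write bumps that znode's version.
        for (int i = 0; i < znodesSet.size(); i++) {
            Znode z = tree.get(znodesSet.get(i));
            z.data = data.get(i);
            z.version++;
        }
        return true;
    }
}
```

Because all checks run before any write, there is no half-success result to report, even when several znodes are set in one call.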


On Wed, Dec 15, 2010 at 11:50 PM, Ted Dunning <te...@gmail.com> wrote:

> Well, I would just call the first method set.
>
> And I think that the second method is no easier to implement and probably a
> bit less useful.
>
> The idea that the second might be almost as useful as the first is
> interesting however.  It probably
> means that we should allow some of the data elements to be null or
> something
> to allow for testing
> versions but not setting data.
>
> On Tue, Dec 14, 2010 at 11:21 PM, Qian Ye <ye...@gmail.com> wrote:
>
> > zoo_multi_test_and_set(List<int> versions, List<string> znodes,
> > List<byte[]> data)
> >
> > can solve the problem I mentioned before. Some relevant issues, such as
> > being hard for programmers to use, as mentioned in the mail archive,
> > should be paid attention to. I think we can take a small step first, that
> > is, provide an interface like
> >
> > zoo_multi_test_and_set(List<int> versions, List<string> znodes, byte[]
> > data, string znode)
> >
> >
> > This API tests the versions of several different znodes before setting
> > one znode, and if the client wants to set other znodes, it can call this
> > API repeatedly. Because we only set one node with this API, the result is
> > straightforward: success or failure. We need not handle a half-success
> > result.
> >
> > What do you guys think about this API?
> >
>



-- 
With Regards!

Ye, Qian

Re: need for more conditional write support

Posted by Ted Dunning <te...@gmail.com>.
I don't like limiting the operations at the top level.  Here is an
alternative:

    zk.multiOp(List<Operation> operations)

Where we have a polymorphic operation:

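public abstract class Operation {
  // Path of the znode this operation applies to.
  private final String path;

  protected Operation(String path) {
    this.path = path;
  }

  // Passes only if the znode is at the given version; modifies nothing.
  public static class Check extends Operation {
    private final int version;
    public Check(String path, int version) {
      super(path);
      this.version = version;
    }
  }

  public static class Create extends Operation {
    private final byte[] data;
    public Create(String path, byte[] data) {
      super(path);
      this.data = data;
    }
  }

  public static class Update extends Operation {
    private final byte[] data;
    private final int version;
    public Update(String path, byte[] data, int version) {
      super(path);
      this.data = data;
      this.version = version;
    }
  }

  public static class Delete extends Operation {
    private final int version;
    public Delete(String path, int version) {
      super(path);
      this.version = version;
    }
  }
}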
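
One way the multiOp idea could behave is sketched below, assuming an in-memory tree and all-or-nothing semantics. Everything here (MultiOpSketch, the Operation interface, the factory methods) is a hypothetical illustration; it uses lambdas in place of subclasses for brevity, and it achieves atomicity by running the operations against a scratch copy, which is a simplification chosen for the example rather than a claim about how the server would do it.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch of the multiOp idea over an in-memory tree; all names are illustrative.
class MultiOpSketch {
    // Immutable, so a shallow copy of the tree is a safe scratch area.
    static final class Znode {
        final byte[] data;
        final int version;
        Znode(byte[] data, int version) {
            this.data = data;
            this.version = version;
        }
    }

    // Each operation validates itself against the given tree and then applies its change.
    interface Operation {
        boolean applyTo(Map<String, Znode> t);
    }

    static Operation check(String path, int version) {
        return t -> t.containsKey(path) && t.get(path).version == version;
    }

    static Operation create(String path, byte[] data) {
        // Creation is conditional on the znode not existing yet.
        return t -> t.putIfAbsent(path, new Znode(data, 0)) == null;
    }

    static Operation update(String path, byte[] data, int version) {
        return t -> {
            Znode z = t.get(path);
            if (z == null || z.version != version) return false;
            t.put(path, new Znode(data, version + 1));
            return true;
        };
    }

    static Operation delete(String path, int version) {
        return t -> {
            Znode z = t.get(path);
            if (z == null || z.version != version) return false;
            t.remove(path);
            return true;
        };
    }

    private Map<String, Znode> tree = new HashMap<>();

    // Runs the operations in order against a scratch copy and publishes the copy
    // only if every operation succeeded, so a failure leaves the tree untouched.
    synchronized boolean multiOp(List<Operation> operations) {
        Map<String, Znode> scratch = new HashMap<>(tree);
        for (Operation op : operations) {
            if (!op.applyTo(scratch)) return false;
        }
        tree = scratch;
        return true;
    }

    synchronized Znode get(String path) {
        return tree.get(path);
    }
}
```

A call such as multiOp(List.of(check("/pig", 0), update("/pig", data, 0), delete("/dog", 0))) then either applies every change or none of them.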


On Tue, Dec 21, 2010 at 5:44 PM, Jared Cantwell <ja...@gmail.com>wrote:

> I think this is the interface Ted alluded to, but I wanted to throw it out
> there more concretely:
>
> zoo_multi_test_and_set(List<string> znodesToTest, List<int>
> versions, List<string> znodesToModify, List<OperationType>
> typeOfModifications, List<byte[]> data)
>
> You can still keep it a subset by requiring that creations explicitly set
> the test version to -1.  This type of interface would really open up the
> capability of multi_test_and_set().  I have no idea how much more
> complicated it would be to implement, but I can imagine each operation type
> having Test() and Commit() operations.  I think that by limiting it to only
> updates from the start we may do work that will need to be reversed to make
> this extension of the interface functional.
>
> ~Jared
>
> On Tue, Dec 21, 2010 at 4:48 PM, Ted Dunning <te...@gmail.com>
> wrote:
>
> > On Tue, Dec 21, 2010 at 1:30 PM, Henry Robinson <he...@cloudera.com>
> > wrote:
> >
> > > This is a more complicated requirement, and not something that can be
> > done
> > > right now with Zookeeper even in a single operation on a single znode.
> > The
> > > only conditional operations are updates.
> >
> >
> > Actually, creates and deletes are conditional also.  Creates are
> > conditional
> > on the file
> > not existing previously and deletes can accept a version number.
> >
> >
> > > Designing an API to support three
> > > different kinds of operation would also be more complicated, and the
> > > implementation would be trickier.
> > >
> >
> > This is definitely true.  Conceptually it is the same thing, but
> > practically
> > building a usable API is
> > a bit tricky.  Using a builder style would work, though.  For example:
> >
> >       Zookeeper zk = ...;
> >
> >
> >       zk.transaction()
> >                .create("/foobar")
> >                .update("/pig", data, version)
> >                .delete("/dog", otherVersion)
> >                .commit()
> >
> > Another option would be to pass a list of operations which can be one of
> > Create, Update or delete.
> >
>

Re: need for more conditional write support

Posted by Jared Cantwell <ja...@gmail.com>.
I think this is the interface Ted alluded to, but I wanted to throw it out
there more concretely:

zoo_multi_test_and_set(List<string> znodesToTest, List<int>
versions, List<string> znodesToModify, List<OperationType>
typeOfModifications, List<byte[]> data)

You can still keep it a subset by requiring that creations explicitly set
the test version to -1.  This type of interface would really open up the
capability of multi_test_and_set().  I have no idea how much more
complicated it would be to implement, but I can imagine each operation type
having Test() and Commit() operations.  I think that by limiting it to only
updates from the start we may do work that will need to be reversed to make
this extension of the interface functional.

~Jared

On Tue, Dec 21, 2010 at 4:48 PM, Ted Dunning <te...@gmail.com> wrote:

> On Tue, Dec 21, 2010 at 1:30 PM, Henry Robinson <he...@cloudera.com>
> wrote:
>
> > This is a more complicated requirement, and not something that can be
> done
> > right now with Zookeeper even in a single operation on a single znode.
> The
> > only conditional operations are updates.
>
>
> Actually, creates and deletes are conditional also.  Creates are
> conditional
> on the file
> not existing previously and deletes can accept a version number.
>
>
> > Designing an API to support three
> > different kinds of operation would also be more complicated, and the
> > implementation would be trickier.
> >
>
> This is definitely true.  Conceptually it is the same thing, but
> practically
> building a usable API is
> a bit tricky.  Using a builder style would work, though.  For example:
>
>       Zookeeper zk = ...;
>
>
>       zk.transaction()
>                .create("/foobar")
>                .update("/pig", data, version)
>                .delete("/dog", otherVersion)
>                .commit()
>
> Another option would be to pass a list of operations which can be one of
> Create, Update or delete.
>

Re: need for more conditional write support

Posted by Ted Dunning <te...@gmail.com>.
This is quite similar to my other proposal except that I used polymorphism
instead of an enum.

I would expect the wire format to look like what you specify.

On Wed, Dec 22, 2010 at 1:00 AM, Henry Robinson <he...@cloudera.com> wrote:

> A builder API is a bit of a departure from the core ZK API as it stands.
> I'd
> rather see something like:
>
> zk_multi_op(List<String> create_paths, List<String> delete_paths,
> List<Version> delete_versions, List<String> update_paths, List<String>
> update_versions, List<String> update_data);
>

Re: need for more conditional write support

Posted by Henry Robinson <he...@cloudera.com>.
On 21 December 2010 13:48, Ted Dunning <te...@gmail.com> wrote:

> On Tue, Dec 21, 2010 at 1:30 PM, Henry Robinson <he...@cloudera.com>
> wrote:
>
> > This is a more complicated requirement, and not something that can be
> done
> > right now with Zookeeper even in a single operation on a single znode.
> The
> > only conditional operations are updates.
>
>

> Actually, creates and deletes are conditional also.  Creates are
> conditional
> on the file
> not existing previously and deletes can accept a version number.
>
>
Thanks Ted - yes, I misspoke. I was thinking of conditional creates (on
inspection of some znode's version) and extrapolated to deletes.


>
> > Designing an API to support three
> > different kinds of operation would also be more complicated, and the
> > implementation would be trickier.
> >
>
> This is definitely true.  Conceptually it is the same thing, but
> practically
> building a usable API is
> a bit tricky.  Using a builder style would work, though.  For example:
>
>       Zookeeper zk = ...;
>
>
>       zk.transaction()
>                .create("/foobar")
>                .update("/pig", data, version)
>                .delete("/dog", otherVersion)
>                .commit()
>
> Another option would be to pass a list of operations which can be one of
> Create, Update or delete.
>

A builder API is a bit of a departure from the core ZK API as it stands. I'd
rather see something like:

zk_multi_op(List<String> create_paths, List<String> delete_paths,
List<Version> delete_versions, List<String> update_paths, List<Version>
update_versions, List<String> update_data);

or

struct op {
  int type; // OP_CREATE, OP_DELETE, OP_UPDATE
  String path;
  // ignored for OP_CREATE; must be -1 to be ignored for OP_DELETE | OP_UPDATE
  long version;
  char *data; // null for OP_DELETE
}

int zk_multi_op(List<op> operations);

in the core API, wrapped with a builder in the Java client.

Henry
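
To show how a builder could sit on top of the flat operation list, here is a small sketch in which the builder does nothing except collect op records. OpListBuilderSketch, OpType, Op, and ANY_VERSION are hypothetical names for this example; the field layout mirrors the struct above, including -1 as the "ignore the version" value.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Hypothetical Java rendering of the flat "struct op" plus a builder on top of it.
class OpListBuilderSketch {
    enum OpType { CREATE, DELETE, UPDATE }

    // -1 means "skip the version check", as in the struct comment.
    static final long ANY_VERSION = -1L;

    static final class Op {
        final OpType type;
        final String path;
        final long version; // ignored for CREATE
        final byte[] data;  // null for DELETE
        Op(OpType type, String path, long version, byte[] data) {
            this.type = type;
            this.path = path;
            this.version = version;
            this.data = data;
        }
    }

    private final List<Op> ops = new ArrayList<>();

    OpListBuilderSketch create(String path, byte[] data) {
        ops.add(new Op(OpType.CREATE, path, ANY_VERSION, data));
        return this;
    }

    OpListBuilderSketch update(String path, byte[] data, long version) {
        ops.add(new Op(OpType.UPDATE, path, version, data));
        return this;
    }

    OpListBuilderSketch delete(String path, long version) {
        ops.add(new Op(OpType.DELETE, path, version, null));
        return this;
    }

    // Returns the flat list that a list-based zk_multi_op call would receive.
    List<Op> build() {
        return Collections.unmodifiableList(new ArrayList<>(ops));
    }
}
```

The builder adds no semantics of its own: the chained calls produce exactly the list that the core call would take, which keeps the wire-level API small while giving Java callers the fluent form.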




-- 
Henry Robinson
Software Engineer
Cloudera
415-994-6679

Re: need for more conditional write support

Posted by Ted Dunning <te...@gmail.com>.
On Tue, Dec 21, 2010 at 1:30 PM, Henry Robinson <he...@cloudera.com> wrote:

> This is a more complicated requirement, and not something that can be done
> right now with Zookeeper even in a single operation on a single znode. The
> only conditional operations are updates.


Actually, creates and deletes are conditional also.  Creates are conditional
on the file
not existing previously and deletes can accept a version number.


> Designing an API to support three
> different kinds of operation would also be more complicated, and the
> implementation would be trickier.
>

This is definitely true.  Conceptually it is the same thing, but practically
building a usable API is
a bit tricky.  Using a builder style would work, though.  For example:

       Zookeeper zk = ...;


       zk.transaction()
                .create("/foobar")
                .update("/pig", data, version)
                .delete("/dog", otherVersion)
                .commit()

Another option would be to pass a list of operations which can be one of
Create, Update or delete.

Re: need for more conditional write support

Posted by Henry Robinson <he...@cloudera.com>.
This is a more complicated requirement, and not something that can be done
right now with Zookeeper even in a single operation on a single znode. The
only conditional operations are updates. Designing an API to support three
different kinds of operation would also be more complicated, and the
implementation would be trickier.

Not to say that it can't be done, or wouldn't be useful, but I suggest
getting the bulk test-and-set API done first: we're near consensus on how
the API should look and the implementation will not be too complex. We can
then revisit conditional create and delete both for the single operation and
the bulk operation cases.

Henry

On 21 December 2010 13:16, Jared Cantwell <ja...@gmail.com> wrote:

> Based on the discussion, it doesn't look like create/delete actions would
> be
> considered under this model.  How difficult would it be to extend the api
> to
> allow creation/deletion of nodes?  I think the hardest part would be to
> verify the 'correctness' of the update.  Are there other complications?
>
> ~Jared
>
> On Tue, Dec 21, 2010 at 1:57 AM, Benjamin Reed <br...@yahoo-inc.com>
> wrote:
>
> > keeping the aggregate size to the normal max i think helps things a lot.
> we
> > don't have to worry about a big update slowing everything down.
> >
> > to implement this we probably need to add a new request and a new
> > transaction. then you will get the atomic update property that you are
> > looking for and you will not need to worry about special queue
> management.
> >
> > ben
> >
> >
> > On 12/20/2010 10:08 PM, Ted Dunning wrote:
> >
> >> On Mon, Dec 20, 2010 at 9:24 PM, Benjamin Reed<br...@yahoo-inc.com>
> >>  wrote:
> >>
> >>  are you guys going to put a limit on the size of the updates? can
> someone
> >>> do an update over 50 znodes where data value is 500K, for example?
> >>>
> >>>  Yes.  My plan is to put a limit on the aggregate size of all of the
> >> updates
> >> that is equal to the limit that gets put on a single update normally.
> >>
> >>
> >>  if there is a failure during the update, is it okay for just a subset
> of
> >>> the znodes to be updated?
> >>>
> >>>  That would be an unpleasant alternative.
> >>
> >> My thought was to convert all of the updates to idempotent form and add
> >> them
> >> all to the queue or fail all the updates.
> >>
> >> My hope was that there would be some way to mark the batch in the queue
> so
> >> that they stay together when commits are pushed out to the cluster.  It
> >> might be necessary to flush the queue before inserting the batched
> >> updates.
> >>  Presumably something like this needs to be done now (if queue + current
> >> transaction is too large, flush queue first).
> >>
> >> Are there failure modes that would leave part of the queue committed and
> >> part not?
> >>
> >
> >
>



-- 
Henry Robinson
Software Engineer
Cloudera
415-994-6679

Re: need for more conditional write support

Posted by Jared Cantwell <ja...@gmail.com>.
Based on the discussion, it doesn't look like create/delete actions would be
considered under this model.  How difficult would it be to extend the api to
allow creation/deletion of nodes?  I think the hardest part would be to
verify the 'correctness' of the update.  Are there other complications?

~Jared

On Tue, Dec 21, 2010 at 1:57 AM, Benjamin Reed <br...@yahoo-inc.com> wrote:

> keeping the aggregate size to the normal max i think helps things a lot. we
> don't have to worry about a big update slowing everything down.
>
> to implement this we probably need to add a new request and a new
> transaction. then you will get the atomic update property that you are
> looking for and you will not need to worry about special queue management.
>
> ben
>
>
> On 12/20/2010 10:08 PM, Ted Dunning wrote:
>
>> On Mon, Dec 20, 2010 at 9:24 PM, Benjamin Reed<br...@yahoo-inc.com>
>>  wrote:
>>
>>  are you guys going to put a limit on the size of the updates? can someone
>>> do an update over 50 znodes where data value is 500K, for example?
>>>
>>>  Yes.  My plan is to put a limit on the aggregate size of all of the
>> updates
>> that is equal to the limit that gets put on a single update normally.
>>
>>
>>  if there is a failure during the update, is it okay for just a subset of
>>> the znodes to be updated?
>>>
>>>  That would be an unpleasant alternative.
>>
>> My thought was to convert all of the updates to idempotent form and add
>> them
>> all to the queue or fail all the updates.
>>
>> My hope was that there would be some way to mark the batch in the queue so
>> that they stay together when commits are pushed out to the cluster.  It
>> might be necessary to flush the queue before inserting the batched
>> updates.
>>  Presumably something like this needs to be done now (if queue + current
>> transaction is too large, flush queue first).
>>
>> Are there failure modes that would leave part of the queue committed and
>> part not?
>>
>
>

Re: need for more conditional write support

Posted by Benjamin Reed <br...@yahoo-inc.com>.
keeping the aggregate size to the normal max i think helps things a lot. 
we don't have to worry about a big update slowing everything down.

to implement this we probably need to add a new request and a new 
transaction. then you will get the atomic update property that you are 
looking for and you will not need to worry about special queue management.

ben

On 12/20/2010 10:08 PM, Ted Dunning wrote:
> On Mon, Dec 20, 2010 at 9:24 PM, Benjamin Reed<br...@yahoo-inc.com>  wrote:
>
>> are you guys going to put a limit on the size of the updates? can someone
>> do an update over 50 znodes where data value is 500K, for example?
>>
> Yes.  My plan is to put a limit on the aggregate size of all of the updates
> that is equal to the limit that gets put on a single update normally.
>
>
>> if there is a failure during the update, is it okay for just a subset of
>> the znodes to be updated?
>>
> That would be an unpleasant alternative.
>
> My thought was to convert all of the updates to idempotent form and add them
> all to the queue or fail all the updates.
>
> My hope was that there would be some way to mark the batch in the queue so
> that they stay together when commits are pushed out to the cluster.  It
> might be necessary to flush the queue before inserting the batched updates.
>   Presumably something like this needs to be done now (if queue + current
> transaction is too large, flush queue first).
>
> Are there failure modes that would leave part of the queue committed and
> part not?


Re: need for more conditional write support

Posted by Ted Dunning <te...@gmail.com>.
On Mon, Dec 20, 2010 at 9:24 PM, Benjamin Reed <br...@yahoo-inc.com> wrote:

> are you guys going to put a limit on the size of the updates? can someone
> do an update over 50 znodes where data value is 500K, for example?
>

Yes.  My plan is to put a limit on the aggregate size of all of the updates
that is equal to the limit that gets put on a single update normally.


> if there is a failure during the update, is it okay for just a subset of
> the znodes to be updated?
>

That would be an unpleasant alternative.

My thought was to convert all of the updates to idempotent form and add them
all to the queue or fail all the updates.

My hope was that there would be some way to mark the batch in the queue so
that they stay together when commits are pushed out to the cluster.  It
might be necessary to flush the queue before inserting the batched updates.
 Presumably something like this needs to be done now (if queue + current
transaction is too large, flush queue first).

Are there failure modes that would leave part of the queue committed and
part not?
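
The "convert to idempotent form, then enqueue all or fail all" plan can be sketched roughly as follows. This is an in-memory illustration only: IdempotentBatchSketch and its nested classes are hypothetical, MAX_AGGREGATE_BYTES is an assumed placeholder for whatever limit applies to a single update, and the sketch assumes each path appears at most once per batch.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Optional;

// Hypothetical sketch: class names and the size limit are illustrative assumptions.
class IdempotentBatchSketch {
    // Assumed stand-in for the limit normally applied to a single update.
    static final int MAX_AGGREGATE_BYTES = 1024 * 1024;

    static final class ConditionalUpdate {
        final String path;
        final int expectedVersion;
        final byte[] data;
        ConditionalUpdate(String path, int expectedVersion, byte[] data) {
            this.path = path;
            this.expectedVersion = expectedVersion;
            this.data = data;
        }
    }

    // Unconditional write that names the resulting version, so replaying it is harmless.
    static final class IdempotentWrite {
        final String path;
        final byte[] data;
        final int newVersion;
        IdempotentWrite(String path, byte[] data, int newVersion) {
            this.path = path;
            this.data = data;
            this.newVersion = newVersion;
        }
    }

    private final Map<String, Integer> versions = new HashMap<>();
    private final Map<String, byte[]> contents = new HashMap<>();

    void create(String path, byte[] data) {
        versions.put(path, 0);
        contents.put(path, data);
    }

    int versionOf(String path) {
        return versions.get(path);
    }

    // Qualifies the whole batch up front. Returns the writes to enqueue together,
    // or empty if any version is stale or the aggregate size exceeds the limit.
    Optional<List<IdempotentWrite>> qualify(List<ConditionalUpdate> batch) {
        long totalBytes = 0;
        List<IdempotentWrite> writes = new ArrayList<>();
        for (ConditionalUpdate u : batch) {
            totalBytes += u.data.length;
            Integer current = versions.get(u.path);
            if (totalBytes > MAX_AGGREGATE_BYTES || current == null || current != u.expectedVersion) {
                return Optional.empty();
            }
            writes.add(new IdempotentWrite(u.path, u.data, current + 1));
        }
        return Optional.of(writes);
    }

    // Applying the same write twice leaves the same data and version.
    void apply(IdempotentWrite w) {
        contents.put(w.path, w.data);
        versions.put(w.path, w.newVersion);
    }
}
```

The sketch does not address the open question of whether part of a queued batch could be committed without the rest; it only shows that the version checks and the size check can all happen before anything is enqueued.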

Re: need for more conditional write support

Posted by Benjamin Reed <br...@yahoo-inc.com>.
are you guys going to put a limit on the size of the updates? can 
someone do an update over 50 znodes where data value is 500K, for example?

if there is a failure during the update, is it okay for just a subset of 
the znodes to be updated?

ben

On 12/20/2010 06:56 PM, Qian Ye wrote:
> Hi all, have we reached any consensus on this issue? What's our next step
> about it?
> I'm looking forward to making use of this kind of feature.
>
> thanks~
>
> On Fri, Dec 17, 2010 at 3:25 AM, Ted Dunning<te...@gmail.com>  wrote:
>
>> One alternative is to simply specify -1 in the version list to avoid the
>> version check for that one item.  That would allow
>> the subset constraint to be retained as a valid semantic check for most
>> situations and would allow
>> a very explicit way to describe when you want to violate that constraint.
>>
>> On Thu, Dec 16, 2010 at 10:06 AM, Dave Wright<wr...@gmail.com>  wrote:
>>
>>> I'm not sure why (other than your syntax) you would require the second
>>> list (to update) to be a subset of the first (to test). There are
>>> plenty of situations where you may want to update one node based on
>>> the value of another (and test that the value hasn't changed before
>>> updating) but don't really care about the second node, and it would
>>> just be extra overhead to check its current value. In fact, I think
>>> that was the OP's situation.
>>>
>>> -Dave
>>>
>>> On Thu, Dec 16, 2010 at 1:01 PM, Ted Dunning<te...@gmail.com>
>>> wrote:
>>>> Yes.  This is isomorphic to my suggestion to allow null data.  We
>> should
>>>> toss around many options to figure out which is the most congenial
>> idiom.
>>>>   Yours is nice since it has two sets of parallel lists.
>>>>
>>>> In java with optional arguments it would be possible to use a builder
>>> style
>>>> with optional arguments:
>>>>
>>>>                zk.testVersions(node1, version1, node2, version2, ...)
>>>>                        .updateData(node1, data1, node3, data3, ...)
>>>>
>>>> I would tend to make it part of the contract that the nodes in the
>> second
>>>> part be a subset of the nodes in the first part.  The first method
>>> would
>>>> create an object packaging up the first set of args and the second
>> method
>>>> would do the work.  Of course, this is just syntactic sugar for the
>> more
>>>> list oriented version.
>>>>
>>>> On Thu, Dec 16, 2010 at 8:16 AM, Dave Wright<wr...@gmail.com>
>> wrote:
>>>>> My recommendation would actually be a combination of the two which
>>>>> offers the most flexibility:
>>>>>
>>>>> zoo_multi_test_and_set(List<string>  znodesToTest, List<int>  versions,
>>>>> List<string>  znodesToSet, List<byte[]>  data)
>>>>>
>>>>> ...this specifies a list of nodes&  versions to check, and if the
>>>>> versions match, a list of nodes to set and the associated data.
>>>>> This allows multiple scenarios, including setting nodes other than the
>>>>> ones you are version checking, setting more nodes than you version
>>>>> check, checking more nodes than you set, etc.
>>>>> I don't think the implementation would be any harder than either of
>> the
>>>>> others.
>>>>>
>>>>> -Dave
>>>>>
>>>>>
>>>>> On Wed, Dec 15, 2010 at 10:50 AM, Ted Dunning<te...@gmail.com>
>>>>> wrote:
>>>>>> Well, I would just call the first method set.
>>>>>>
>>>>>> And I think that the second method is no easier to implement and
>>> probably
>>>>> a
>>>>>> bit less useful.
>>>>>>
>>>>>> The idea that the second might be almost as useful as the first is
>>>>>> interesting however.  It probably
>>>>>> means that we should allow some of the data elements to be null or
>>>>> something
>>>>>> to allow for testing
>>>>>> versions but not setting data.
>>>>>>
>>>>>> On Tue, Dec 14, 2010 at 11:21 PM, Qian Ye<ye...@gmail.com>
>>> wrote:
>>>>>>> zoo_multi_test_and_set(List<int>  versions, List<string>  znodes,
>>>>>>> List<byte[]>  data)
>>>>>>>
>>>>>>> can solve the problem I mentioned before, and some relavant issues,
>>> like
>>>>>>> hard for programmers to use, as mentioned in mail-archive, should
>> be
>>>>> paid
>>>>>>> attention to. I think we can move small step first, that is,
>> provide
>>>>>>> interface like
>>>>>>>
>>>>>>> zoo_multi_test_and_set(List<int>  versions, List<string>  znodes,
>>> byte[]
>>>>>>> data, string znode)
>>>>>>>
>>>>>>>
>>>>>>> The API test versions of several different znodes before set one
>>> znode,
>>>>> and
>>>>>>> if the client want to set other znode, it can call this API
>>> repeatedly.
>>>>>>> Because we only set one node by this API, the result will be
>>> straight,
>>>>>>> success or failure. We need not take care of the half-success
>> result.
>>>>>>> How do ur guys think about this API?
>>>>>>>
>
>


Re: need for more conditional write support

Posted by Benjamin Reed <br...@yahoo-inc.com>.
It all happens in PrepRequestProcessor.

ben

On 12/20/2010 07:22 PM, Ted Dunning wrote:
> Ben,
>
> Can you point me to the place where the queue is inspected to convert
> updates into idempotent writes?
>
> On Mon, Dec 20, 2010 at 6:56 PM, Qian Ye<ye...@gmail.com>  wrote:
>
>> Hi all, have we reached any consensus on this issue? What's our next step
>> about it?
>> I'm looking forward to making use of this kind of feature.
>>
>> thanks~
>>
>> On Fri, Dec 17, 2010 at 3:25 AM, Ted Dunning<te...@gmail.com>
>> wrote:
>>
>>> One alternative is to simply specify -1 in the version list to avoid the
>>> version check for that one item.  That would allow
>>> the subset constraint to be retained as a valid semantic check for most
>>> situations and would allow
>>> a very explicit way to describe when you want to violate that constraint.
>>>
>>> On Thu, Dec 16, 2010 at 10:06 AM, Dave Wright<wr...@gmail.com>  wrote:
>>>
>>>> I'm not sure why (other than your syntax) you would require the second
>>>> list (to update) to be a subset of the first (to test). There are
>>>> plenty of situations where you may want to update one node based on
>>>> the value of another (and test that the value hasn't changed before
>>>> updating) but don't really care about the second node, and it would
>>>> just be extra overhead to check its current value. In fact, I think
>>>> that was the OP's situation.
>>>>
>>>> -Dave
>>>>
>>>> On Thu, Dec 16, 2010 at 1:01 PM, Ted Dunning<te...@gmail.com>
>>>> wrote:
>>>>> Yes.  This is isomorphic to my suggestion to allow null data.  We
>>> should
>>>>> toss around many options to figure out which is the most congenial
>>> idiom.
>>>>>   Yours is nice since it has two sets of parallel lists.
>>>>>
>>>>> In java with optional arguments it would be possible to use a builder
>>>> style
>>>>> with optional arguments:
>>>>>
>>>>>                zk.testVersions(node1, version1, node2, version2, ...)
>>>>>                        .updateData(node1, data1, node3, data3, ...)
>>>>>
>>>>> I would tend to make it part of the contract that the nodes in the
>>> second
>>>>> part be a subset of the nodes in the first part.  The first method
>>>> would
>>>>> create an object packaging up the first set of args and the second
>>> method
>>>>> would do the work.  Of course, this is just syntactic sugar for the
>>> more
>>>>> list oriented version.
>>>>>
>>>>> On Thu, Dec 16, 2010 at 8:16 AM, Dave Wright<wr...@gmail.com>
>>> wrote:
>>>>>> My recommendation would actually be a combination of the two which
>>>>>> offers the most flexibility:
>>>>>>
>>>>>> zoo_multi_test_and_set(List<string>  znodesToTest, List<int>
>> versions,
>>>>>> List<string>  znodesToSet, List<byte[]>  data)
>>>>>>
>>>>>> ...this specifies a list of nodes&  versions to check, and if the
>>>>>> versions match, a list of nodes to set and the associated data.
>>>>>> This allows multiple scenarios, including setting nodes other than
>> the
>>>>>> ones you are version checking, setting more nodes than you version
>>>>>> check, checking more nodes than you set, etc.
>>>>>> I don't think the implementation would be any harder than either of
>>> the
>>>>>> others.
>>>>>>
>>>>>> -Dave
>>>>>>
>>>>>>
>>>>>> On Wed, Dec 15, 2010 at 10:50 AM, Ted Dunning<
>> ted.dunning@gmail.com>
>>>>>> wrote:
>>>>>>> Well, I would just call the first method set.
>>>>>>>
>>>>>>> And I think that the second method is no easier to implement and
>>>> probably
>>>>>> a
>>>>>>> bit less useful.
>>>>>>>
>>>>>>> The idea that the second might be almost as useful as the first is
>>>>>>> interesting however.  It probably
>>>>>>> means that we should allow some of the data elements to be null or
>>>>>> something
>>>>>>> to allow for testing
>>>>>>> versions but not setting data.
>>>>>>>
>>>>>>> On Tue, Dec 14, 2010 at 11:21 PM, Qian Ye<ye...@gmail.com>
>>>> wrote:
>>>>>>>> zoo_multi_test_and_set(List<int>  versions, List<string>  znodes,
>>>>>>>> List<byte[]>  data)
>>>>>>>>
>>>>>>>> can solve the problem I mentioned before, and some relavant
>> issues,
>>>> like
>>>>>>>> hard for programmers to use, as mentioned in mail-archive, should
>>> be
>>>>>> paid
>>>>>>>> attention to. I think we can move small step first, that is,
>>> provide
>>>>>>>> interface like
>>>>>>>>
>>>>>>>> zoo_multi_test_and_set(List<int>  versions, List<string>  znodes,
>>>> byte[]
>>>>>>>> data, string znode)
>>>>>>>>
>>>>>>>>
>>>>>>>> The API test versions of several different znodes before set one
>>>> znode,
>>>>>> and
>>>>>>>> if the client want to set other znode, it can call this API
>>>> repeatedly.
>>>>>>>> Because we only set one node by this API, the result will be
>>>> straight,
>>>>>>>> success or failure. We need not take care of the half-success
>>> result.
>>>>>>>> How do ur guys think about this API?
>>>>>>>>
>>
>>
>> --
>> With Regards!
>>
>> Ye, Qian
>>


Re: need for more conditional write support

Posted by Ted Dunning <te...@gmail.com>.
Ben,

Can you point me to the place where the queue is inspected to convert
updates into idempotent writes?

On Mon, Dec 20, 2010 at 6:56 PM, Qian Ye <ye...@gmail.com> wrote:

> Hi all, have we reached any consensus on this issue? What's our next step
> about it?
> I'm looking forward to make use of this kind of feature.
>
> thanks~
>
> On Fri, Dec 17, 2010 at 3:25 AM, Ted Dunning <te...@gmail.com>
> wrote:
>
> > One alternative is to simply specify -1 in the version list to avoid the
> > version check for that one item.  That would allow
> > the subset constraint to be retained as a valid semantic check for most
> > situations and would allow
> > a very explicit way to describe when you want to violate that constraint.
> >
> > On Thu, Dec 16, 2010 at 10:06 AM, Dave Wright <wr...@gmail.com> wrote:
> >
> > > I'm not sure why (other than your syntax) you would require the second
> > > list (to update) to be a subset of the first (to test). There are
> > > plenty of situations where you may want to update one node based on
> > > the value of another (and test that the value hasn't changed before
> > > updating) but don't really care about the second node, and it would
> > > just be extra overhead to check it's current value. In fact, I think
> > > that was the OP's situation.
> > >
> > > -Dave
> > >
> > > On Thu, Dec 16, 2010 at 1:01 PM, Ted Dunning <te...@gmail.com>
> > > wrote:
> > > > Yes.  This is isomorphic to my suggestion to allow null data.  We
> > should
> > > > toss around many options to figure out which is the most congenial
> > idiom.
> > > >  Yours is nice since it has two sets of parallel lists.
> > > >
> > > > In java with optional arguments it would be possible to use a builder
> > > style
> > > > with optional arguments:
> > > >
> > > >               zk.testVersions(node1, version1, node2, version2, ...)
> > > >                       .updateData(node1, data1, node3, data3, ...)
> > > >
> > > > I would tend to make it part of the contract that the nodes in the
> > second
> > > > part be a subset of of the nodes in the first part.  The first method
> > > would
> > > > create an object packaging up the first set of args and the second
> > method
> > > > would do the work.  Of course, this is just syntactic sugar for the
> > more
> > > > list oriented version.
> > > >
> > > > On Thu, Dec 16, 2010 at 8:16 AM, Dave Wright <wr...@gmail.com>
> > wrote:
> > > >
> > > >> My recommendation would actually be a combination of the two which
> > > >> offers the most flexibility:
> > > >>
> > > >> zoo_multi_test_and_set(List<string> znodesToTest, List<int>
> versions,
> > > >> List<string> znodesToSet, List<byte[]> data)
> > > >>
> > > >> ...this specifies a list of nodes & versions to check, and if the
> > > >> versions match, a list of nodes to set and the associated data.
> > > >> This allows multiple scenarios, including setting nodes other than
> the
> > > >> ones you are version checking, setting more nodes than you version
> > > >> check, checking more nodes than you set, etc.
> > > >> I don't think the implementation would be any harder than either of
> > the
> > > >> others.
> > > >>
> > > >> -Dave
> > > >>
> > > >>
> > > >> On Wed, Dec 15, 2010 at 10:50 AM, Ted Dunning <
> ted.dunning@gmail.com>
> > > >> wrote:
> > > >> > Well, I would just call the first method set.
> > > >> >
> > > >> > And I think that the second method is no easier to implement and
> > > probably
> > > >> a
> > > >> > bit less useful.
> > > >> >
> > > >> > The idea that the second might be almost as useful as the first is
> > > >> > interesting however.  It probably
> > > >> > means that we should allow some of the data elements to be null or
> > > >> something
> > > >> > to allow for testing
> > > >> > versions but not setting data.
> > > >> >
> > > >> > On Tue, Dec 14, 2010 at 11:21 PM, Qian Ye <ye...@gmail.com>
> > > wrote:
> > > >> >
> > > >> >> zoo_multi_test_and_set(List<int> versions, List<string> znodes,
> > > >> >> List<byte[]> data)
> > > >> >>
> > > >> >> can solve the problem I mentioned before, and some relavant
> issues,
> > > like
> > > >> >> hard for programmers to use, as mentioned in mail-archive, should
> > be
> > > >> paid
> > > >> >> attention to. I think we can move small step first, that is,
> > provide
> > > >> >> interface like
> > > >> >>
> > > >> >> zoo_multi_test_and_set(List<int> versions, List<string> znodes,
> > > byte[]
> > > >> >> data, string znode)
> > > >> >>
> > > >> >>
> > > >> >> The API test versions of several different znodes before set one
> > > znode,
> > > >> and
> > > >> >> if the client want to set other znode, it can call this API
> > > repeatedly.
> > > >> >> Because we only set one node by this API, the result will be
> > > straight,
> > > >> >> success or failure. We need not take care of the half-success
> > result.
> > > >> >>
> > > >> >> How do ur guys think about this API?
> > > >> >>
> > > >> >
> > > >>
> > > >
> > >
> >
>
>
>
> --
> With Regards!
>
> Ye, Qian
>

Re: need for more conditional write support

Posted by Qian Ye <ye...@gmail.com>.
Hi all, have we reached any consensus on this issue? What is our next step?
I'm looking forward to making use of this kind of feature.

thanks~

On Fri, Dec 17, 2010 at 3:25 AM, Ted Dunning <te...@gmail.com> wrote:

> One alternative is to simply specify -1 in the version list to avoid the
> version check for that one item.  That would allow
> the subset constraint to be retained as a valid semantic check for most
> situations and would allow
> a very explicit way to describe when you want to violate that constraint.
>
> On Thu, Dec 16, 2010 at 10:06 AM, Dave Wright <wr...@gmail.com> wrote:
>
> > I'm not sure why (other than your syntax) you would require the second
> > list (to update) to be a subset of the first (to test). There are
> > plenty of situations where you may want to update one node based on
> > the value of another (and test that the value hasn't changed before
> > updating) but don't really care about the second node, and it would
> > just be extra overhead to check it's current value. In fact, I think
> > that was the OP's situation.
> >
> > -Dave
> >
> > On Thu, Dec 16, 2010 at 1:01 PM, Ted Dunning <te...@gmail.com>
> > wrote:
> > > Yes.  This is isomorphic to my suggestion to allow null data.  We
> should
> > > toss around many options to figure out which is the most congenial
> idiom.
> > >  Yours is nice since it has two sets of parallel lists.
> > >
> > > In java with optional arguments it would be possible to use a builder
> > style
> > > with optional arguments:
> > >
> > >               zk.testVersions(node1, version1, node2, version2, ...)
> > >                       .updateData(node1, data1, node3, data3, ...)
> > >
> > > I would tend to make it part of the contract that the nodes in the
> second
> > > part be a subset of of the nodes in the first part.  The first method
> > would
> > > create an object packaging up the first set of args and the second
> method
> > > would do the work.  Of course, this is just syntactic sugar for the
> more
> > > list oriented version.
> > >
> > > On Thu, Dec 16, 2010 at 8:16 AM, Dave Wright <wr...@gmail.com>
> wrote:
> > >
> > >> My recommendation would actually be a combination of the two which
> > >> offers the most flexibility:
> > >>
> > >> zoo_multi_test_and_set(List<string> znodesToTest, List<int> versions,
> > >> List<string> znodesToSet, List<byte[]> data)
> > >>
> > >> ...this specifies a list of nodes & versions to check, and if the
> > >> versions match, a list of nodes to set and the associated data.
> > >> This allows multiple scenarios, including setting nodes other than the
> > >> ones you are version checking, setting more nodes than you version
> > >> check, checking more nodes than you set, etc.
> > >> I don't think the implementation would be any harder than either of
> the
> > >> others.
> > >>
> > >> -Dave
> > >>
> > >>
> > >> On Wed, Dec 15, 2010 at 10:50 AM, Ted Dunning <te...@gmail.com>
> > >> wrote:
> > >> > Well, I would just call the first method set.
> > >> >
> > >> > And I think that the second method is no easier to implement and
> > probably
> > >> a
> > >> > bit less useful.
> > >> >
> > >> > The idea that the second might be almost as useful as the first is
> > >> > interesting however.  It probably
> > >> > means that we should allow some of the data elements to be null or
> > >> something
> > >> > to allow for testing
> > >> > versions but not setting data.
> > >> >
> > >> > On Tue, Dec 14, 2010 at 11:21 PM, Qian Ye <ye...@gmail.com>
> > wrote:
> > >> >
> > >> >> zoo_multi_test_and_set(List<int> versions, List<string> znodes,
> > >> >> List<byte[]> data)
> > >> >>
> > >> >> can solve the problem I mentioned before, and some relavant issues,
> > like
> > >> >> hard for programmers to use, as mentioned in mail-archive, should
> be
> > >> paid
> > >> >> attention to. I think we can move small step first, that is,
> provide
> > >> >> interface like
> > >> >>
> > >> >> zoo_multi_test_and_set(List<int> versions, List<string> znodes,
> > byte[]
> > >> >> data, string znode)
> > >> >>
> > >> >>
> > >> >> The API test versions of several different znodes before set one
> > znode,
> > >> and
> > >> >> if the client want to set other znode, it can call this API
> > repeatedly.
> > >> >> Because we only set one node by this API, the result will be
> > straight,
> > >> >> success or failure. We need not take care of the half-success
> result.
> > >> >>
> > >> >> How do ur guys think about this API?
> > >> >>
> > >> >
> > >>
> > >
> >
>



-- 
With Regards!

Ye, Qian

Re: need for more conditional write support

Posted by Ted Dunning <te...@gmail.com>.
One alternative is to simply specify -1 in the version list to avoid the
version check for that one item.  That would allow the subset constraint to
be retained as a valid semantic check for most situations and would allow a
very explicit way to describe when you want to violate that constraint.
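
To make the -1 idea concrete, here is a minimal sketch of the check. It is
only an illustration: the class VersionCheckSketch, the method versionsMatch
and the in-memory map of current versions are hypothetical names made up for
this example, not ZooKeeper code. It follows the convention setData already
uses, where a version of -1 matches any version.

```java
import java.util.List;
import java.util.Map;

// Hypothetical helper, not part of ZooKeeper: shows "-1 skips the check".
class VersionCheckSketch {
    // Assumed sentinel, chosen to match the existing setData convention.
    static final int ANY_VERSION = -1;

    // Returns true if every expected version matches the current one,
    // treating ANY_VERSION as "do not check this znode".
    static boolean versionsMatch(List<String> znodes,
                                 List<Integer> expected,
                                 Map<String, Integer> current) {
        if (znodes.size() != expected.size()) {
            throw new IllegalArgumentException(
                "znodes and versions must be parallel lists");
        }
        for (int i = 0; i < znodes.size(); i++) {
            int want = expected.get(i);
            if (want == ANY_VERSION) {
                continue; // explicitly unchecked entry
            }
            Integer have = current.get(znodes.get(i));
            if (have == null || have != want) {
                return false; // missing znode or stale version
            }
        }
        return true;
    }
}
```

With this shape, a node that is updated but deliberately not tested still
appears in the test list, just with -1 as its version.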

On Thu, Dec 16, 2010 at 10:06 AM, Dave Wright <wr...@gmail.com> wrote:

> I'm not sure why (other than your syntax) you would require the second
> list (to update) to be a subset of the first (to test). There are
> plenty of situations where you may want to update one node based on
> the value of another (and test that the value hasn't changed before
> updating) but don't really care about the second node, and it would
> just be extra overhead to check it's current value. In fact, I think
> that was the OP's situation.
>
> -Dave
>
> On Thu, Dec 16, 2010 at 1:01 PM, Ted Dunning <te...@gmail.com>
> wrote:
> > Yes.  This is isomorphic to my suggestion to allow null data.  We should
> > toss around many options to figure out which is the most congenial idiom.
> >  Yours is nice since it has two sets of parallel lists.
> >
> > In java with optional arguments it would be possible to use a builder
> style
> > with optional arguments:
> >
> >               zk.testVersions(node1, version1, node2, version2, ...)
> >                       .updateData(node1, data1, node3, data3, ...)
> >
> > I would tend to make it part of the contract that the nodes in the second
> > part be a subset of of the nodes in the first part.  The first method
> would
> > create an object packaging up the first set of args and the second method
> > would do the work.  Of course, this is just syntactic sugar for the more
> > list oriented version.
> >
> > On Thu, Dec 16, 2010 at 8:16 AM, Dave Wright <wr...@gmail.com> wrote:
> >
> >> My recommendation would actually be a combination of the two which
> >> offers the most flexibility:
> >>
> >> zoo_multi_test_and_set(List<string> znodesToTest, List<int> versions,
> >> List<string> znodesToSet, List<byte[]> data)
> >>
> >> ...this specifies a list of nodes & versions to check, and if the
> >> versions match, a list of nodes to set and the associated data.
> >> This allows multiple scenarios, including setting nodes other than the
> >> ones you are version checking, setting more nodes than you version
> >> check, checking more nodes than you set, etc.
> >> I don't think the implementation would be any harder than either of the
> >> others.
> >>
> >> -Dave
> >>
> >>
> >> On Wed, Dec 15, 2010 at 10:50 AM, Ted Dunning <te...@gmail.com>
> >> wrote:
> >> > Well, I would just call the first method set.
> >> >
> >> > And I think that the second method is no easier to implement and
> probably
> >> a
> >> > bit less useful.
> >> >
> >> > The idea that the second might be almost as useful as the first is
> >> > interesting however.  It probably
> >> > means that we should allow some of the data elements to be null or
> >> something
> >> > to allow for testing
> >> > versions but not setting data.
> >> >
> >> > On Tue, Dec 14, 2010 at 11:21 PM, Qian Ye <ye...@gmail.com>
> wrote:
> >> >
> >> >> zoo_multi_test_and_set(List<int> versions, List<string> znodes,
> >> >> List<byte[]> data)
> >> >>
> >> >> can solve the problem I mentioned before, and some relavant issues,
> like
> >> >> hard for programmers to use, as mentioned in mail-archive, should be
> >> paid
> >> >> attention to. I think we can move small step first, that is, provide
> >> >> interface like
> >> >>
> >> >> zoo_multi_test_and_set(List<int> versions, List<string> znodes,
> byte[]
> >> >> data, string znode)
> >> >>
> >> >>
> >> >> The API test versions of several different znodes before set one
> znode,
> >> and
> >> >> if the client want to set other znode, it can call this API
> repeatedly.
> >> >> Because we only set one node by this API, the result will be
> straight,
> >> >> success or failure. We need not take care of the half-success result.
> >> >>
> >> >> How do ur guys think about this API?
> >> >>
> >> >
> >>
> >
>

Re: need for more conditional write support

Posted by Ted Dunning <te...@gmail.com>.
In that case, you use the lower-level, unsugared version.

On Thu, Dec 16, 2010 at 10:39 AM, Jared Cantwell <ja...@gmail.com> wrote:

> I think that syntactic sugar can be very limiting.  What if you have X
> children you would like to update, but don't know X until runtime?
>

Re: need for more conditional write support

Posted by Ted Dunning <te...@gmail.com>.
I oscillate on this issue every 10 minutes.  I expect that non-subset
updates will be an error 80% of the time, but there is a very valid 20% of
the time where it would be lovely to have.

On Thu, Dec 16, 2010 at 10:39 AM, Jared Cantwell <ja...@gmail.com> wrote:

>  I like
> the idea of lists that don't have to be subsets of each other, giving more
> flexibility.  I also think it would be interesting to discuss what
> additional recipes could be developed with this api.
>

Re: need for more conditional write support

Posted by Ted Dunning <te...@gmail.com>.
My thought is that this should be handled by limiting the *total* size of
all updates in one call in the same way that *single* update sizes are
limited now.
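
As a rough sketch of that check, assuming the updates arrive as a list of
byte arrays: the class UpdateSizeSketch, the method withinLimit and the 1 MB
figure below are all hypothetical, picked only for illustration and not
taken from the ZooKeeper code.

```java
import java.util.List;

// Hypothetical helper, not part of ZooKeeper: caps the combined payload
// of one multi-update call instead of capping each update separately.
class UpdateSizeSketch {
    // Assumed limit for the example only.
    static final int MAX_TOTAL_BYTES = 1024 * 1024;

    // Returns true if the sum of all payload sizes fits within the limit.
    // A null entry means "no data for this znode" and counts as zero bytes.
    static boolean withinLimit(List<byte[]> data, int maxTotalBytes) {
        long total = 0; // long avoids int overflow when summing many arrays
        for (byte[] d : data) {
            if (d != null) {
                total += d.length;
            }
        }
        return total <= maxTotalBytes;
    }
}
```

A request that fails this check would be rejected as a whole before any
version test or write is attempted.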

On Thu, Dec 16, 2010 at 11:04 AM, Henry Robinson <he...@cloudera.com> wrote:

> This should be a cautionary note on performance, however: as there is no
> parallelism in the execution of updates (although there is plenty in the
> serialisation process) we should build a mechanism to constrain how much
> work this operation can perform, otherwise there's a danger of hurting
> throughput for all clients of a cluster.
>

Re: need for more conditional write support

Posted by Henry Robinson <he...@cloudera.com>.
I expect any of these proposals will be relatively simple to implement. My
understanding is that ZK serialises *all* accesses to the data tree, so
there's no need to worry about acquiring individual locks for each znode.
This should be a cautionary note on performance, however: as there is no
parallelism in the execution of updates (although there is plenty in the
serialisation process) we should build a mechanism to constrain how much
work this operation can perform, otherwise there's a danger of hurting
throughput for all clients of a cluster. ZK could do with namespaces, or a
cleverer locking mechanism like traditional databases use, to mitigate this
issue but that's a much larger undertaking.
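
To illustrate why serialised access makes this simple, here is a toy
in-memory model. It is not how ZooKeeper is implemented: ToyDataTree and all
of its methods are hypothetical, and a single synchronized method stands in
for the server's serialised request processing. It uses the four-list
signature proposed earlier in the thread.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Toy model only: one lock over the whole tree, so no per-znode locking.
class ToyDataTree {
    private final Map<String, byte[]> data = new HashMap<>();
    private final Map<String, Integer> versions = new HashMap<>();

    synchronized void create(String path, byte[] value) {
        data.put(path, value);
        versions.put(path, 0);
    }

    synchronized int version(String path) {
        return versions.get(path);
    }

    synchronized byte[] get(String path) {
        return data.get(path);
    }

    // All-or-nothing: every check runs before any write is applied.
    synchronized boolean multiTestAndSet(List<String> znodesToTest,
                                         List<Integer> expected,
                                         List<String> znodesToSet,
                                         List<byte[]> newData) {
        if (znodesToTest.size() != expected.size()
                || znodesToSet.size() != newData.size()) {
            throw new IllegalArgumentException("lists must be parallel");
        }
        // Phase 1: verify versions and that every target exists.
        for (int i = 0; i < znodesToTest.size(); i++) {
            Integer have = versions.get(znodesToTest.get(i));
            if (have == null || !have.equals(expected.get(i))) {
                return false;
            }
        }
        for (String path : znodesToSet) {
            if (!versions.containsKey(path)) {
                return false;
            }
        }
        // Phase 2: apply every write and bump each written znode's version.
        for (int i = 0; i < znodesToSet.size(); i++) {
            String path = znodesToSet.get(i);
            data.put(path, newData.get(i));
            versions.merge(path, 1, Integer::sum);
        }
        return true;
    }
}
```

Because nothing else can run between the two phases, there is no
half-success state to report. The cost is the one noted above: a large call
holds up every other client for as long as it runs.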

Figuring out a good API for the ensemble to implement can be slightly
decoupled from the API that a client application sees. Therefore I prefer
the list parameters API, which can be wrapped in a builder API in Java if
that makes sense (these kinds of API are less natural in C, for example).

cheers,
Henry

On 16 December 2010 10:39, Jared Cantwell <ja...@gmail.com> wrote:

> I think that syntactic sugar can be very limiting.  What if you have X
> children you would like to update, but don't know X until runtime?  I like
> the idea of lists that don't have to be subsets of each other, giving more
> flexibility.  I also think it would be interesting to discuss what
> additional recipes could be developed with this api.
>
> ~Jared
>
> On Thu, Dec 16, 2010 at 1:06 PM, Dave Wright <wr...@gmail.com> wrote:
>
> > I'm not sure why (other than your syntax) you would require the second
> > list (to update) to be a subset of the first (to test). There are
> > plenty of situations where you may want to update one node based on
> > the value of another (and test that the value hasn't changed before
> > updating) but don't really care about the second node, and it would
> > just be extra overhead to check it's current value. In fact, I think
> > that was the OP's situation.
> >
> > -Dave
> >
> > On Thu, Dec 16, 2010 at 1:01 PM, Ted Dunning <te...@gmail.com>
> > wrote:
> > > Yes.  This is isomorphic to my suggestion to allow null data.  We
> should
> > > toss around many options to figure out which is the most congenial
> idiom.
> > >  Yours is nice since it has two sets of parallel lists.
> > >
> > > In java with optional arguments it would be possible to use a builder
> > style
> > > with optional arguments:
> > >
> > >               zk.testVersions(node1, version1, node2, version2, ...)
> > >                       .updateData(node1, data1, node3, data3, ...)
> > >
> > > I would tend to make it part of the contract that the nodes in the
> second
> > > part be a subset of of the nodes in the first part.  The first method
> > would
> > > create an object packaging up the first set of args and the second
> method
> > > would do the work.  Of course, this is just syntactic sugar for the
> more
> > > list oriented version.
> > >
> > > On Thu, Dec 16, 2010 at 8:16 AM, Dave Wright <wr...@gmail.com>
> wrote:
> > >
> > >> My recommendation would actually be a combination of the two which
> > >> offers the most flexibility:
> > >>
> > >> zoo_multi_test_and_set(List<string> znodesToTest, List<int> versions,
> > >> List<string> znodesToSet, List<byte[]> data)
> > >>
> > >> ...this specifies a list of nodes & versions to check, and if the
> > >> versions match, a list of nodes to set and the associated data.
> > >> This allows multiple scenarios, including setting nodes other than the
> > >> ones you are version checking, setting more nodes than you version
> > >> check, checking more nodes than you set, etc.
> > >> I don't think the implementation would be any harder than either of
> the
> > >> others.
> > >>
> > >> -Dave
> > >>
> > >>
> > >> On Wed, Dec 15, 2010 at 10:50 AM, Ted Dunning <te...@gmail.com>
> > >> wrote:
> > >> > Well, I would just call the first method set.
> > >> >
> > >> > And I think that the second method is no easier to implement and
> > probably
> > >> a
> > >> > bit less useful.
> > >> >
> > >> > The idea that the second might be almost as useful as the first is
> > >> > interesting however.  It probably
> > >> > means that we should allow some of the data elements to be null or
> > >> something
> > >> > to allow for testing
> > >> > versions but not setting data.
> > >> >
> > >> > On Tue, Dec 14, 2010 at 11:21 PM, Qian Ye <ye...@gmail.com>
> > wrote:
> > >> >
> > >> >> zoo_multi_test_and_set(List<int> versions, List<string> znodes,
> > >> >> List<byte[]> data)
> > >> >>
> > >> >> can solve the problem I mentioned before, and some relavant issues,
> > like
> > >> >> hard for programmers to use, as mentioned in mail-archive, should
> be
> > >> paid
> > >> >> attention to. I think we can move small step first, that is,
> provide
> > >> >> interface like
> > >> >>
> > >> >> zoo_multi_test_and_set(List<int> versions, List<string> znodes,
> > byte[]
> > >> >> data, string znode)
> > >> >>
> > >> >>
> > >> >> The API test versions of several different znodes before set one
> > znode,
> > >> and
> > >> >> if the client want to set other znode, it can call this API
> > repeatedly.
> > >> >> Because we only set one node by this API, the result will be
> > straight,
> > >> >> success or failure. We need not take care of the half-success
> result.
> > >> >>
> > >> >> How do ur guys think about this API?
> > >> >>
> > >> >
> > >>
> > >
> >
>



-- 
Henry Robinson
Software Engineer
Cloudera
415-994-6679

Re: need for more conditional write support

Posted by Jared Cantwell <ja...@gmail.com>.
I think that syntactic sugar can be very limiting.  What if you have X
children you would like to update, but don't know X until runtime?  I like
the idea of lists that don't have to be subsets of each other, giving more
flexibility.  I also think it would be interesting to discuss what
additional recipes could be developed with this API.

~Jared

On Thu, Dec 16, 2010 at 1:06 PM, Dave Wright <wr...@gmail.com> wrote:

> I'm not sure why (other than your syntax) you would require the second
> list (to update) to be a subset of the first (to test). There are
> plenty of situations where you may want to update one node based on
> the value of another (and test that the value hasn't changed before
> updating) but don't really care about the second node, and it would
> just be extra overhead to check it's current value. In fact, I think
> that was the OP's situation.
>
> -Dave
>
> On Thu, Dec 16, 2010 at 1:01 PM, Ted Dunning <te...@gmail.com>
> wrote:
> > Yes.  This is isomorphic to my suggestion to allow null data.  We should
> > toss around many options to figure out which is the most congenial idiom.
> >  Yours is nice since it has two sets of parallel lists.
> >
> > In java with optional arguments it would be possible to use a builder
> style
> > with optional arguments:
> >
> >               zk.testVersions(node1, version1, node2, version2, ...)
> >                       .updateData(node1, data1, node3, data3, ...)
> >
> > I would tend to make it part of the contract that the nodes in the second
> > part be a subset of of the nodes in the first part.  The first method
> would
> > create an object packaging up the first set of args and the second method
> > would do the work.  Of course, this is just syntactic sugar for the more
> > list oriented version.
> >
> > On Thu, Dec 16, 2010 at 8:16 AM, Dave Wright <wr...@gmail.com> wrote:
> >
> >> My recommendation would actually be a combination of the two which
> >> offers the most flexibility:
> >>
> >> zoo_multi_test_and_set(List<string> znodesToTest, List<int> versions,
> >> List<string> znodesToSet, List<byte[]> data)
> >>
> >> ...this specifies a list of nodes & versions to check, and if the
> >> versions match, a list of nodes to set and the associated data.
> >> This allows multiple scenarios, including setting nodes other than the
> >> ones you are version checking, setting more nodes than you version
> >> check, checking more nodes than you set, etc.
> >> I don't think the implementation would be any harder than either of the
> >> others.
> >>
> >> -Dave
> >>
> >>
> >> On Wed, Dec 15, 2010 at 10:50 AM, Ted Dunning <te...@gmail.com>
> >> wrote:
> >> > Well, I would just call the first method set.
> >> >
> >> > And I think that the second method is no easier to implement and
> probably
> >> a
> >> > bit less useful.
> >> >
> >> > The idea that the second might be almost as useful as the first is
> >> > interesting however.  It probably
> >> > means that we should allow some of the data elements to be null or
> >> something
> >> > to allow for testing
> >> > versions but not setting data.
> >> >
> >> > On Tue, Dec 14, 2010 at 11:21 PM, Qian Ye <ye...@gmail.com>
> wrote:
> >> >
> >> >> zoo_multi_test_and_set(List<int> versions, List<string> znodes,
> >> >> List<byte[]> data)
> >> >>
> >> >> can solve the problem I mentioned before, and some relavant issues,
> like
> >> >> hard for programmers to use, as mentioned in mail-archive, should be
> >> paid
> >> >> attention to. I think we can move small step first, that is, provide
> >> >> interface like
> >> >>
> >> >> zoo_multi_test_and_set(List<int> versions, List<string> znodes,
> byte[]
> >> >> data, string znode)
> >> >>
> >> >>
> >> >> The API test versions of several different znodes before set one
> znode,
> >> and
> >> >> if the client want to set other znode, it can call this API
> repeatedly.
> >> >> Because we only set one node by this API, the result will be
> straight,
> >> >> success or failure. We need not take care of the half-success result.
> >> >>
> >> >> How do ur guys think about this API?
> >> >>
> >> >
> >>
> >
>

Re: need for more conditional write support

Posted by Dave Wright <wr...@gmail.com>.
I'm not sure why (other than your syntax) you would require the second
list (to update) to be a subset of the first (to test). There are
plenty of situations where you may want to update one node based on
the value of another (and test that the value hasn't changed before
updating) but don't really care about the second node, and it would
just be extra overhead to check its current value. In fact, I think
that was the OP's situation.

-Dave

On Thu, Dec 16, 2010 at 1:01 PM, Ted Dunning <te...@gmail.com> wrote:
> Yes.  This is isomorphic to my suggestion to allow null data.  We should
> toss around many options to figure out which is the most congenial idiom.
>  Yours is nice since it has two sets of parallel lists.
>
> In java with optional arguments it would be possible to use a builder style
> with optional arguments:
>
>               zk.testVersions(node1, version1, node2, version2, ...)
>                       .updateData(node1, data1, node3, data3, ...)
>
> I would tend to make it part of the contract that the nodes in the second
> part be a subset of of the nodes in the first part.  The first method would
> create an object packaging up the first set of args and the second method
> would do the work.  Of course, this is just syntactic sugar for the more
> list oriented version.
>
> On Thu, Dec 16, 2010 at 8:16 AM, Dave Wright <wr...@gmail.com> wrote:
>
>> My recommendation would actually be a combination of the two which
>> offers the most flexibility:
>>
>> zoo_multi_test_and_set(List<string> znodesToTest, List<int> versions,
>> List<string> znodesToSet, List<byte[]> data)
>>
>> ...this specifies a list of nodes & versions to check, and if the
>> versions match, a list of nodes to set and the associated data.
>> This allows multiple scenarios, including setting nodes other than the
>> ones you are version checking, setting more nodes than you version
>> check, checking more nodes than you set, etc.
>> I don't think the implementation would be any harder than either of the
>> others.
>>
>> -Dave
>>
>>
>> On Wed, Dec 15, 2010 at 10:50 AM, Ted Dunning <te...@gmail.com>
>> wrote:
>> > Well, I would just call the first method set.
>> >
>> > And I think that the second method is no easier to implement and probably
>> a
>> > bit less useful.
>> >
>> > The idea that the second might be almost as useful as the first is
>> > interesting however.  It probably
>> > means that we should allow some of the data elements to be null or
>> something
>> > to allow for testing
>> > versions but not setting data.
>> >
>> > On Tue, Dec 14, 2010 at 11:21 PM, Qian Ye <ye...@gmail.com> wrote:
>> >
>> >> zoo_multi_test_and_set(List<int> versions, List<string> znodes,
>> >> List<byte[]> data)
>> >>
>> >> can solve the problem I mentioned before, and some relavant issues, like
>> >> hard for programmers to use, as mentioned in mail-archive, should be
>> paid
>> >> attention to. I think we can move small step first, that is, provide
>> >> interface like
>> >>
>> >> zoo_multi_test_and_set(List<int> versions, List<string> znodes, byte[]
>> >> data, string znode)
>> >>
>> >>
>> >> The API test versions of several different znodes before set one znode,
>> and
>> >> if the client want to set other znode, it can call this API repeatedly.
>> >> Because we only set one node by this API, the result will be straight,
>> >> success or failure. We need not take care of the half-success result.
>> >>
>> >> How do ur guys think about this API?
>> >>
>> >
>>
>

Re: need for more conditional write support

Posted by Ted Dunning <te...@gmail.com>.
Yes.  This is isomorphic to my suggestion to allow null data.  We should
toss around many options to figure out which is the most congenial idiom.
Yours is nice since it has two sets of parallel lists.

In Java, with variable-length argument lists (varargs), it would be possible
to use a builder style:

               zk.testVersions(node1, version1, node2, version2, ...)
                       .updateData(node1, data1, node3, data3, ...)

I would tend to make it part of the contract that the nodes in the second
part be a subset of the nodes in the first part.  The first method would
create an object packaging up the first set of args and the second method
would do the work.  Of course, this is just syntactic sugar for the more
list-oriented version.
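
A minimal sketch of that sugar, assuming Java varargs with alternating
arguments. Everything here is hypothetical: MultiUpdateSketch and its fields
are made-up names, and updateData only collects the parallel lists, where a
real client would send them to the ensemble at that point. The subset
contract is deliberately not enforced here.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical builder, not part of the ZooKeeper client API.
class MultiUpdateSketch {
    // The four parallel lists of the list-oriented version.
    final List<String> znodesToTest = new ArrayList<>();
    final List<Integer> versions = new ArrayList<>();
    final List<String> znodesToSet = new ArrayList<>();
    final List<byte[]> data = new ArrayList<>();

    // First stage: alternating (String path, Integer version) arguments.
    static MultiUpdateSketch testVersions(Object... pathVersionPairs) {
        if (pathVersionPairs.length % 2 != 0) {
            throw new IllegalArgumentException("expected path/version pairs");
        }
        MultiUpdateSketch m = new MultiUpdateSketch();
        for (int i = 0; i < pathVersionPairs.length; i += 2) {
            m.znodesToTest.add((String) pathVersionPairs[i]);
            m.versions.add((Integer) pathVersionPairs[i + 1]);
        }
        return m;
    }

    // Second stage: alternating (String path, byte[] data) arguments.
    // In this sketch it only records the updates and returns the builder.
    MultiUpdateSketch updateData(Object... pathDataPairs) {
        if (pathDataPairs.length % 2 != 0) {
            throw new IllegalArgumentException("expected path/data pairs");
        }
        for (int i = 0; i < pathDataPairs.length; i += 2) {
            znodesToSet.add((String) pathDataPairs[i]);
            data.add((byte[]) pathDataPairs[i + 1]);
        }
        return this;
    }
}
```

The Object... signature gives up compile-time type checking of the pairs,
which is one argument for keeping the list-oriented call as the real API.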

On Thu, Dec 16, 2010 at 8:16 AM, Dave Wright <wr...@gmail.com> wrote:

> My recommendation would actually be a combination of the two which
> offers the most flexibility:
>
> zoo_multi_test_and_set(List<string> znodesToTest, List<int> versions,
> List<string> znodesToSet, List<byte[]> data)
>
> ...this specifies a list of nodes & versions to check, and if the
> versions match, a list of nodes to set and the associated data.
> This allows multiple scenarios, including setting nodes other than the
> ones you are version checking, setting more nodes than you version
> check, checking more nodes than you set, etc.
> I don't think the implementation would be any harder than either of the
> others.
>
> -Dave
>
>
> On Wed, Dec 15, 2010 at 10:50 AM, Ted Dunning <te...@gmail.com> wrote:
> > Well, I would just call the first method set.
> >
> > And I think that the second method is no easier to implement and probably
> > a bit less useful.
> >
> > The idea that the second might be almost as useful as the first is
> > interesting, however.  It probably means that we should allow some of the
> > data elements to be null or something, to allow for testing versions but
> > not setting data.
> >
> > On Tue, Dec 14, 2010 at 11:21 PM, Qian Ye <ye...@gmail.com> wrote:
> >
> >> zoo_multi_test_and_set(List<int> versions, List<string> znodes,
> >> List<byte[]> data)
> >>
> >> can solve the problem I mentioned before, and some relevant issues, like
> >> being hard for programmers to use, as mentioned in the mail archive,
> >> should be paid attention to. I think we can take a small step first, that
> >> is, provide an interface like
> >>
> >> zoo_multi_test_and_set(List<int> versions, List<string> znodes, byte[]
> >> data, string znode)
> >>
> >>
> >> The API tests the versions of several different znodes before setting one
> >> znode, and if the client wants to set another znode, it can call this API
> >> repeatedly. Because we only set one node with this API, the result will
> >> be straightforward: success or failure. We need not take care of a
> >> half-success result.
> >>
> >> What do you guys think about this API?
> >>
> >
>

Re: need for more conditional write support

Posted by Dave Wright <wr...@gmail.com>.
My recommendation would actually be a combination of the two which
offers the most flexibility:

zoo_multi_test_and_set(List<string> znodesToTest, List<int> versions,
List<string> znodesToSet, List<byte[]> data)

...this specifies a list of nodes & versions to check, and if the
versions match, a list of nodes to set and the associated data.
This allows multiple scenarios, including setting nodes other than the
ones you are version checking, setting more nodes than you version
check, checking more nodes than you set, etc.
I don't think the implementation would be any harder than either of the others.
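
To make the four-list proposal concrete, here is a small Java sketch that
models it on an in-memory map. It is not a real ZooKeeper API: the class
MultiTestAndSetSketch and its create, version and get helpers are
hypothetical, and the sketch assumes that a missing znode counts as a failed
check.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** In-memory model of the proposed call; not a real ZooKeeper API. */
final class MultiTestAndSetSketch {
    private final Map<String, byte[]> data = new HashMap<>();
    private final Map<String, Integer> versions = new HashMap<>();

    synchronized void create(String path, byte[] value) {
        data.put(path, value);
        versions.put(path, 0);
    }

    synchronized int version(String path) { return versions.get(path); }

    synchronized byte[] get(String path) { return data.get(path); }

    /** Applies every write if and only if every tested znode is at its expected version. */
    synchronized boolean multiTestAndSet(List<String> znodesToTest,
                                         List<Integer> expectedVersions,
                                         List<String> znodesToSet,
                                         List<byte[]> newData) {
        if (znodesToTest.size() != expectedVersions.size()
                || znodesToSet.size() != newData.size()) {
            throw new IllegalArgumentException("parallel lists must have equal lengths");
        }
        // Phase 1: check all versions; nothing has been modified yet.
        for (int i = 0; i < znodesToTest.size(); i++) {
            Integer actual = versions.get(znodesToTest.get(i));
            if (actual == null || !actual.equals(expectedVersions.get(i))) {
                return false;
            }
        }
        // Assumption: writing to a znode that does not exist also fails the call.
        for (String path : znodesToSet) {
            if (!versions.containsKey(path)) {
                return false;
            }
        }
        // Phase 2: apply all writes and bump the version of each written znode.
        for (int i = 0; i < znodesToSet.size(); i++) {
            String path = znodesToSet.get(i);
            data.put(path, newData.get(i));
            versions.put(path, versions.get(path) + 1);
        }
        return true;
    }
}
```

Because the tested list and the written list are independent, a caller can
check /b and write only /a, which is one of the scenarios listed in the
message.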

-Dave


On Wed, Dec 15, 2010 at 10:50 AM, Ted Dunning <te...@gmail.com> wrote:
> Well, I would just call the first method set.
>
> And I think that the second method is no easier to implement and probably a
> bit less useful.
>
> The idea that the second might be almost as useful as the first is
> interesting, however.  It probably means that we should allow some of the
> data elements to be null or something, to allow for testing versions but
> not setting data.
>
> On Tue, Dec 14, 2010 at 11:21 PM, Qian Ye <ye...@gmail.com> wrote:
>
>> zoo_multi_test_and_set(List<int> versions, List<string> znodes,
>> List<byte[]> data)
>>
>> can solve the problem I mentioned before, and some relevant issues, like
>> being hard for programmers to use, as mentioned in the mail archive, should
>> be paid attention to. I think we can take a small step first, that is,
>> provide an interface like
>>
>> zoo_multi_test_and_set(List<int> versions, List<string> znodes, byte[]
>> data, string znode)
>>
>>
>> The API tests the versions of several different znodes before setting one
>> znode, and if the client wants to set another znode, it can call this API
>> repeatedly. Because we only set one node with this API, the result will be
>> straightforward: success or failure. We need not take care of a
>> half-success result.
>>
>> What do you guys think about this API?
>>
>

Re: need for more conditional write support

Posted by Ted Dunning <te...@gmail.com>.
Well, I would just call the first method set.

And I think that the second method is no easier to implement and probably a
bit less useful.

The idea that the second might be almost as useful as the first is
interesting, however.  It probably means that we should allow some of the
data elements to be null or something, to allow for testing versions but
not setting data.
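
One way to read the null-data idea is sketched below in Java, again on an
in-memory map. The class NullDataSketch and its helpers are hypothetical; the
sketch assumes three parallel lists as in the first proposed signature and
treats a null data element as "check this version but do not write".

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** In-memory model of "null data means test only"; not a real ZooKeeper API. */
final class NullDataSketch {
    private final Map<String, byte[]> data = new HashMap<>();
    private final Map<String, Integer> versions = new HashMap<>();

    synchronized void create(String path, byte[] value) {
        data.put(path, value);
        versions.put(path, 0);
    }

    synchronized int version(String path) { return versions.get(path); }

    synchronized byte[] get(String path) { return data.get(path); }

    /** Tests every znode's version; writes only the entries whose data is non-null. */
    synchronized boolean set(List<String> znodes,
                             List<Integer> expectedVersions,
                             List<byte[]> newData) {
        if (znodes.size() != expectedVersions.size() || znodes.size() != newData.size()) {
            throw new IllegalArgumentException("parallel lists must have equal lengths");
        }
        // Every listed znode is version-checked, whether or not it is written.
        for (int i = 0; i < znodes.size(); i++) {
            Integer actual = versions.get(znodes.get(i));
            if (actual == null || !actual.equals(expectedVersions.get(i))) {
                return false;
            }
        }
        for (int i = 0; i < znodes.size(); i++) {
            if (newData.get(i) == null) {
                continue; // null data: this znode was only tested
            }
            String path = znodes.get(i);
            data.put(path, newData.get(i));
            versions.put(path, versions.get(path) + 1);
        }
        return true;
    }
}
```

A znode that is only tested keeps its version, so a later call can test it
again with the same expected version.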

On Tue, Dec 14, 2010 at 11:21 PM, Qian Ye <ye...@gmail.com> wrote:

> zoo_multi_test_and_set(List<int> versions, List<string> znodes,
> List<byte[]> data)
>
> can solve the problem I mentioned before, and some relevant issues, like
> being hard for programmers to use, as mentioned in the mail archive, should
> be paid attention to. I think we can take a small step first, that is,
> provide an interface like
>
> zoo_multi_test_and_set(List<int> versions, List<string> znodes, byte[]
> data, string znode)
>
>
> The API tests the versions of several different znodes before setting one
> znode, and if the client wants to set another znode, it can call this API
> repeatedly. Because we only set one node with this API, the result will be
> straightforward: success or failure. We need not take care of a
> half-success result.
>
> What do you guys think about this API?
>

Re: need for more conditional write support

Posted by Dave Wright <wr...@gmail.com>.
If this work were to be done, I would much prefer the ability to set
multiple values at the same time, as this enables a whole host of
possibilities, including certain transactional updating capabilities
that aren't possible now (it is far too easy to get into an inconsistent
state if you fail partway through updating multiple nodes).


-Dave Wright

On Wed, Dec 15, 2010 at 2:21 AM, Qian Ye <ye...@gmail.com> wrote:
> I have read the mails on
> http://www.mail-archive.com/zookeeper-dev@hadoop.apache.org/msg08315.html,
> and here is some further thinking about this issue.
>
> The interface like
>
> zoo_multi_test_and_set(List<int> versions, List<string> znodes,
> List<byte[]> data)
>
> can solve the problem I mentioned before, and some relevant issues, like
> being hard for programmers to use, as mentioned in the mail archive, should
> be paid attention to. I think we can take a small step first, that is,
> provide an interface like
>
> zoo_multi_test_and_set(List<int> versions, List<string> znodes, byte[]
> data, string znode)
>
>
> The API tests the versions of several different znodes before setting one
> znode, and if the client wants to set another znode, it can call this API
> repeatedly. Because we only set one node with this API, the result will be
> straightforward: success or failure. We need not take care of a
> half-success result.
>
> What do you guys think about this API?
>
> Thanks~
>
> On Fri, Dec 10, 2010 at 11:57 PM, Ted Dunning <te...@gmail.com> wrote:
>
>> Yes.  I can imagine that being able to write several variables, specifying
>> the version of each, would be useful.  But not yet supported.
>>
>> On Fri, Dec 10, 2010 at 12:01 AM, Qian Ye <ye...@gmail.com> wrote:
>>
>> > What's more, I think this kind of conditional write support is simpler
>> > than multiple transactions. Multiple transactions can be built with this
>> > kind of support. The link is broken?
>> >
>>
>
>
>
> --
> With Regards!
>
> Ye, Qian
>

Re: need for more conditional write support

Posted by Qian Ye <ye...@gmail.com>.
I have read the mails at
http://www.mail-archive.com/zookeeper-dev@hadoop.apache.org/msg08315.html,
and here are some further thoughts on this issue.

An interface like

zoo_multi_test_and_set(List<int> versions, List<string> znodes,
List<byte[]> data)

can solve the problem I mentioned before, and some relevant issues, like
being hard for programmers to use, as mentioned in the mail archive, should
be paid attention to. I think we can take a small step first, that is,
provide an interface like

zoo_multi_test_and_set(List<int> versions, List<string> znodes, byte[]
data, string znode)


The API tests the versions of several different znodes before setting one
znode, and if the client wants to set another znode, it can call this API
repeatedly. Because we only set one node with this API, the result will be
straightforward: success or failure. We need not take care of a half-success
result.
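
The single-target variant can be modelled in a few lines of Java. This is
only a sketch on an in-memory map, not a real ZooKeeper API: the class
SingleTargetSketch and its helpers are hypothetical, and the Java method name
multiTestAndSet merely mirrors the proposed zoo_multi_test_and_set.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** In-memory model of the single-target variant; not a real ZooKeeper API. */
final class SingleTargetSketch {
    private final Map<String, byte[]> data = new HashMap<>();
    private final Map<String, Integer> versions = new HashMap<>();

    synchronized void create(String path, byte[] value) {
        data.put(path, value);
        versions.put(path, 0);
    }

    synchronized int version(String path) { return versions.get(path); }

    synchronized byte[] get(String path) { return data.get(path); }

    /** Tests all listed versions, then writes exactly one znode or nothing. */
    synchronized boolean multiTestAndSet(List<Integer> expectedVersions,
                                         List<String> znodes,
                                         byte[] newData,
                                         String znode) {
        if (expectedVersions.size() != znodes.size()) {
            throw new IllegalArgumentException("parallel lists must have equal lengths");
        }
        // Assumption: a missing target znode fails the call.
        if (!versions.containsKey(znode)) {
            return false;
        }
        for (int i = 0; i < znodes.size(); i++) {
            Integer actual = versions.get(znodes.get(i));
            if (actual == null || !actual.equals(expectedVersions.get(i))) {
                return false;
            }
        }
        // Only one znode is written, so the outcome is all or nothing.
        data.put(znode, newData);
        versions.put(znode, versions.get(znode) + 1);
        return true;
    }
}
```

Each call is atomic on its own in this model, but two calls in a row are two
separate operations, so another client could still write between them.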

What do you guys think about this API?

Thanks~

On Fri, Dec 10, 2010 at 11:57 PM, Ted Dunning <te...@gmail.com> wrote:

> Yes.  I can imagine that being able to write several variables, specifying
> the version of each, would be useful.  But not yet supported.
>
> On Fri, Dec 10, 2010 at 12:01 AM, Qian Ye <ye...@gmail.com> wrote:
>
> > What's more, I think this kind of conditional write support is simpler
> > than multiple transactions. Multiple transactions can be built with this
> > kind of support. The link is broken?
> >
>



-- 
With Regards!

Ye, Qian

Re: need for more conditional write support

Posted by Ted Dunning <te...@gmail.com>.
Yes.  I can imagine that being able to write several variables, specifying
the version of each, would be useful.  But not yet supported.

On Fri, Dec 10, 2010 at 12:01 AM, Qian Ye <ye...@gmail.com> wrote:

> What's more, I think this kind of conditional write support is simpler than
> multiple transactions. Multiple transactions can be built with this kind of
> support. The link is broken?
>

Re: need for more conditional write support

Posted by Qian Ye <ye...@gmail.com>.
Hi Ted:

The solution you mentioned works in some situations, but not in mine. In the
third step, after you check the condition on B and C, the values of B and C
might still be modified before you update the value of A. The key point is
that with the current ZK primitives, you cannot lock node A when you are
updating node B.
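
The interleaving described here can be replayed deterministically. The Java
sketch below uses a hypothetical in-memory store (the class
CheckThenSetRaceSketch, its helpers and the string values are all
illustrative) and simulates a second client by a direct write between the
check on B and C and the versioned update of A.

```java
import java.util.HashMap;
import java.util.Map;

/** Single-threaded replay of the race; all names here are hypothetical. */
final class CheckThenSetRaceSketch {
    private final Map<String, String> data = new HashMap<>();
    private final Map<String, Integer> versions = new HashMap<>();

    void create(String path, String value) {
        data.put(path, value);
        versions.put(path, 0);
    }

    String get(String path) { return data.get(path); }

    int version(String path) { return versions.get(path); }

    /** Versioned write: returns false on a version mismatch, otherwise writes. */
    boolean setData(String path, String value, int expectedVersion) {
        if (versions.get(path) != expectedVersion) {
            return false;
        }
        data.put(path, value);
        versions.put(path, expectedVersion + 1);
        return true;
    }

    /** Returns true if A was written even though B no longer met the condition. */
    static boolean demonstrateRace() {
        CheckThenSetRaceSketch zk = new CheckThenSetRaceSketch();
        zk.create("/A", "old");
        zk.create("/B", "B");
        zk.create("/C", "C");

        // Step 1: read the current version of A.
        int vA = zk.version("/A");
        // Step 2: test the values of B and C.
        boolean conditionHeld = zk.get("/B").equals("B") && zk.get("/C").equals("C");

        // Another client modifies B after the check but before the update of A.
        zk.setData("/B", "changed", zk.version("/B"));

        // Step 3: the versioned write still succeeds, because only A is guarded.
        boolean written = conditionHeld && zk.setData("/A", "new", vA);
        return written && !zk.get("/B").equals("B");
    }
}
```

In this replay the version check on A passes because nobody touched A, so
the write goes through although B had already changed.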

A possible solution based on the current ZK primitives for this scenario is to
create an extra node for each data node to act as a lock, so that the update
of A can be protected by this kind of lock. However, this implementation would
bring in much complexity, for example, how to prevent deadlock in some
abnormal situations.

What's more, I think this kind of conditional write support is simpler than
multiple transactions. Multiple transactions can be built with this kind of
support. The link is broken?
http://www.mail-archive.com/zookeeper-dev@hadoop.apache.org/msg08315.html


On Fri, Dec 10, 2010 at 7:33 AM, Ted Dunning <te...@gmail.com> wrote:

> Qian,
>
> Depending on your situation, you can implement something like this now with
> the ZK primitives.
>
> In particular,
>
>    - get the current version v_a of A
>    - test the values of B and C
>    - if the condition on B and C is met, update A with required version v_a
>
> You may want to retry the whole thing if you get an exception on the update
> of A.
>
> This does a safe test and set operation, but does not allow for the
> potential of atomically updating multiple znodes in one operation.  A
> special case solution to that is to put all objects that may need to be
> updated together in the same znode content.  That is clearly not a general
> solution, but it is often possible.
>
> On Thu, Dec 9, 2010 at 4:19 AM, Qian Ye <ye...@gmail.com> wrote:
>
> > Hi all:
> >
> > I'm working on a distributed system these days, and need more conditional
> > write support on Zookeeper. Now the zookeeper only support modifing,
> delete
> > or set, node data with a version number represent the current version of
> > the
> > node. I need modification on the condition of other nodes. For e.g. I
> want
> > to set the node data of /node to A, if the node data of /node1 is B and
> the
> > node data of /node2 is C. Should we support this kind of interface?
> >
> > thanks
> > --
> > With Regards!
> >
> > Ye, Qian
> >
>



-- 
With Regards!

Ye, Qian

Re: need for more conditional write support

Posted by Ted Dunning <te...@gmail.com>.
Qian,

Depending on your situation, you can implement something like this now with
the ZK primitives.

In particular,

    - get the current version v_a of A
    - test the values of B and C
    - if the condition on B and C is met, update A with required version v_a

You may want to retry the whole thing if you get an exception on the update
of A.

This does a safe test and set operation, but does not allow for the
potential of atomically updating multiple znodes in one operation.  A
special case solution to that is to put all objects that may need to be
updated together in the same znode content.  That is clearly not a general
solution, but it is often possible.
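
The three steps and the retry can be sketched in Java as follows. The store
is an in-memory stand-in, and every name in it (OptimisticUpdateSketch,
BadVersionException, updateAIfBAndC, the fixed paths /A, /B and /C) is
hypothetical. Note that the version check in step 3 only guards A itself.

```java
import java.util.HashMap;
import java.util.Map;

/** In-memory sketch of check-then-versioned-set with retry; names are hypothetical. */
final class OptimisticUpdateSketch {
    /** Stand-in for a version-mismatch error reported by the server. */
    static final class BadVersionException extends Exception {
        private static final long serialVersionUID = 1L;
    }

    private final Map<String, String> data = new HashMap<>();
    private final Map<String, Integer> versions = new HashMap<>();

    synchronized void create(String path, String value) {
        data.put(path, value);
        versions.put(path, 0);
    }

    synchronized String get(String path) { return data.get(path); }

    synchronized int version(String path) { return versions.get(path); }

    /** Writes only if the znode is still at expectedVersion. */
    synchronized void setData(String path, String value, int expectedVersion)
            throws BadVersionException {
        if (versions.get(path) != expectedVersion) {
            throw new BadVersionException();
        }
        data.put(path, value);
        versions.put(path, expectedVersion + 1);
    }

    /** Sets /A to newA if /B equals wantB and /C equals wantC; retries on conflicts. */
    boolean updateAIfBAndC(String newA, String wantB, String wantC, int maxAttempts) {
        for (int attempt = 0; attempt < maxAttempts; attempt++) {
            int vA = version("/A");                       // step 1: version of A
            if (!get("/B").equals(wantB) || !get("/C").equals(wantC)) {
                return false;                             // step 2: condition not met
            }
            try {
                setData("/A", newA, vA);                  // step 3: versioned update
                return true;
            } catch (BadVersionException e) {
                // A changed since step 1: retry the whole sequence.
            }
        }
        return false;
    }
}
```

The retry limit is a design choice of this sketch; the message only says
that one may want to retry the whole thing after an exception.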

On Thu, Dec 9, 2010 at 4:19 AM, Qian Ye <ye...@gmail.com> wrote:

> Hi all:
>
> I'm working on a distributed system these days, and need more conditional
> write support in ZooKeeper. Currently ZooKeeper only supports modifying
> (deleting or setting) node data conditioned on a version number that
> represents the current version of the node. I need modifications
> conditioned on other nodes. For example, I want to set the node data of
> /node to A if the node data of /node1 is B and the node data of /node2 is C.
> Should we support this kind of interface?
>
> thanks
> --
> With Regards!
>
> Ye, Qian
>

Re: need for more conditional write support

Posted by Mahadev Konar <ma...@yahoo-inc.com>.
Hi Qian,
   There have been discussions on multiple transactions in ZooKeeper:

 http://www.mail-archive.com/zookeeper-dev@hadoop.apache.org/msg08315.html

I think this might help in your case. There has been some discussion of this
lately, but not much progress. You can start a discussion on the list to see
what folks feel about multiple transactions. I think that if we are to support
something like what you mention, it should be via multiple transactions,
wherein a success would mean a success for all the transactions that are
part of a multiple transaction.
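
The all-or-nothing semantics described here can be illustrated with a small
Java sketch. It is an in-memory model under stated assumptions, not an
existing ZooKeeper interface: the class AllOrNothingSketch, the SetOp type
and the multi method are hypothetical, and the only operation modelled is a
versioned set.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** In-memory model of an all-or-nothing group of versioned sets; names are hypothetical. */
final class AllOrNothingSketch {
    /** One step of the group: set path to value if it is at expectedVersion. */
    static final class SetOp {
        final String path;
        final String value;
        final int expectedVersion;

        SetOp(String path, String value, int expectedVersion) {
            this.path = path;
            this.value = value;
            this.expectedVersion = expectedVersion;
        }
    }

    private Map<String, String> data = new HashMap<>();
    private Map<String, Integer> versions = new HashMap<>();

    synchronized void create(String path, String value) {
        data.put(path, value);
        versions.put(path, 0);
    }

    synchronized String get(String path) { return data.get(path); }

    synchronized int version(String path) { return versions.get(path); }

    /** Runs the operations in order on scratch copies; commits only if all succeed. */
    synchronized boolean multi(List<SetOp> ops) {
        Map<String, String> scratchData = new HashMap<>(data);
        Map<String, Integer> scratchVersions = new HashMap<>(versions);
        for (SetOp op : ops) {
            Integer actual = scratchVersions.get(op.path);
            if (actual == null || actual != op.expectedVersion) {
                return false; // the scratch copies are dropped; nothing was committed
            }
            scratchData.put(op.path, op.value);
            scratchVersions.put(op.path, actual + 1);
        }
        // Every operation succeeded: make the scratch state the live state.
        data = scratchData;
        versions = scratchVersions;
        return true;
    }
}
```

If the second of two operations carries a stale version, the first one is
not applied either, which is the "success for all" behaviour in this model.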

Thanks
mahadev

On 12/9/10 4:19 AM, "Qian Ye" <ye...@gmail.com> wrote:

> Hi all:
> 
> I'm working on a distributed system these days, and need more conditional
> write support in ZooKeeper. Currently ZooKeeper only supports modifying
> (deleting or setting) node data conditioned on a version number that
> represents the current version of the node. I need modifications
> conditioned on other nodes. For example, I want to set the node data of
> /node to A if the node data of /node1 is B and the node data of /node2 is C.
> Should we support this kind of interface?
> 
> thanks
> --
> With Regards!
> 
> Ye, Qian
> 

