Posted to user@zookeeper.apache.org by Robin <rc...@163.com> on 2014/11/15 03:48:47 UTC

A suggestion about the design for znode version in ZooKeeper

Hi zookeepers,

While digging into ZooKeeper's internals, I noticed the following flaw in the znode version: a znode's version is reset when the znode is deleted and re-created. This is a trap for operations that make updates conditional on the znode version.

Here is an example: a client reads the data of a znode (e.g., /test) together with its version (e.g., 1), changes the data, and writes it back on the condition that the version has not changed (is still 1). If another client deletes and re-creates the znode while the first client is preparing its update, the version can match again, and the write succeeds even though the znode now contains different data from what the first client read.
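
To make the interleaving concrete, here is a minimal Python sketch. ToyStore, BadVersion and the method names are made up for illustration; this is an in-memory model, not the ZooKeeper client API. The one behaviour it borrows is that a re-created node starts again at version 0, so the sketch adds one update after the re-creation to bring the version back to 1.

```python
class BadVersion(Exception):
    """Raised when a conditional write names the wrong version."""


class ToyStore:
    """Hypothetical in-memory stand-in for a znode tree (not the ZooKeeper API)."""

    def __init__(self):
        self.nodes = {}  # path -> [data, version]

    def create(self, path, data):
        self.nodes[path] = [data, 0]  # a new node always starts at version 0

    def delete(self, path):
        del self.nodes[path]

    def get(self, path):
        data, version = self.nodes[path]
        return data, version

    def set(self, path, data, expected_version):
        node = self.nodes[path]
        if node[1] != expected_version:
            raise BadVersion(path)
        node[0] = data
        node[1] += 1


store = ToyStore()
store.create("/test", "a")
store.set("/test", "b", 0)             # version is now 1

data, version = store.get("/test")     # client 1 reads ("b", 1)

# Client 2 deletes, re-creates and updates the node; the version is 1 again.
store.delete("/test")
store.create("/test", "x")
store.set("/test", "y", 0)

# Client 1's conditional write is accepted although it never saw "y".
store.set("/test", data + "-updated", version)
final_data, final_version = store.get("/test")
```

In this model the last write goes through and silently replaces "y", which is the trap described above.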

The problem, as I see it, is that the znode version is designed as nothing more than a monotonically increasing integer. If the version also included the znode's birth date (timestamp) or the zxid of its creation, with only the integer part increasing on each update while the birth-date or zxid part stays unchanged, we could avoid the problem.

Of course, the new design has a cost: the version field needs to be larger.
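
A rough sketch of the proposed check, again as a toy in-memory model in Python rather than real ZooKeeper code. The (created, counter) pair, the ToyStore class and the itertools counter standing in for the zxid are all assumptions made for illustration.

```python
import itertools


class BadVersion(Exception):
    """Raised when a conditional write names the wrong version."""


class ToyStore:
    """Hypothetical in-memory model; not the ZooKeeper API."""

    def __init__(self):
        self.nodes = {}                  # path -> [data, created, counter]
        self._zxid = itertools.count(1)  # stand-in for the transaction id

    def create(self, path, data):
        # The creation id stays fixed for the lifetime of this incarnation.
        self.nodes[path] = [data, next(self._zxid), 0]

    def delete(self, path):
        next(self._zxid)
        del self.nodes[path]

    def get(self, path):
        data, created, counter = self.nodes[path]
        return data, (created, counter)

    def set(self, path, data, expected):
        node = self.nodes[path]
        if (node[1], node[2]) != expected:
            raise BadVersion(path)
        next(self._zxid)
        node[0] = data
        node[2] += 1                     # only the counter part changes


store = ToyStore()
store.create("/test", "a")
store.set("/test", "b", store.get("/test")[1])

data, version = store.get("/test")       # client 1 reads "b" and its version

# Client 2 deletes, re-creates and updates the node.
store.delete("/test")
store.create("/test", "x")
store.set("/test", "y", store.get("/test")[1])

try:
    store.set("/test", data + "-updated", version)
    rejected = False
except BadVersion:
    rejected = True                      # the creation part no longer matches
```

The counter part is 1 in both incarnations, but the creation part differs, so the stale write is refused.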

Thanks,
- Robin 

Re: A suggestion about the design for znode version in ZooKeeper

Posted by "Jürgen Wagner (DVT)" <ju...@devoteam.com>.
It depends. Zookeeper is a rather primitive instrument. Libraries like
Curator add more complex constructs on top and make handling them easy.
You may create your own Zookeeper abstraction library. Clients should
not have to deal with the raw Zookeeper interface anyway.

The point is: you are making implicit assumptions about what Zookeeper's
operation semantics should be, and these are currently not compatible
with the optimistic model of Zookeeper. Your assumptions need to be made
explicit and mapped to a transaction model that works for you and that
uses only the primitives Zookeeper provides, without assuming anything
beyond what's supported. It's not hard, but it requires a bit of an
abstraction layer. THAT will actually reduce complexity on the client side.

Best regards,
--Jürgen

On 16.11.2014 03:28, Robin wrote:
> Hi Vamsi and Jurgen,
>
> Thanks for your explanations; I believe your suggestions will work. However, both solutions add complexity to the client code. If this could be solved on the ZooKeeper side with little effort, that would be great.
>
> Thanks,
> - Cheng Rao


Re: Re: A suggestion about the design for znode version in ZooKeeper

Posted by Robin <rc...@163.com>.
Hi Vamsi and Jurgen,

Thanks for your explanations; I believe your suggestions will work. However, both solutions add complexity to the client code. If this could be solved on the ZooKeeper side with little effort, that would be great.

Thanks,
- Cheng Rao
At 2014-11-15 23:38:12, "Vamsi Devaki" <de...@gmail.com> wrote:
>Hi Robin,
>
>One way to handle this situation is to use the multi / transaction API. You
>can check the version of the parent and operate on child nodes atomically.
>
>A quick explanation can be found at -
>http://tdunning.blogspot.com/2011/06/tour-of-multi-update-for-zookeeper.html
>
>Regards,
>Vamsi

Re: A suggestion about the design for znode version in ZooKeeper

Posted by Vamsi Devaki <de...@gmail.com>.
Hi Robin,

One way to handle this situation is to use the multi / transaction API. You
can check the version of the parent and operate on child nodes atomically.

A quick explanation can be found at -
http://tdunning.blogspot.com/2011/06/tour-of-multi-update-for-zookeeper.html
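
As a rough illustration of that idea, here is a toy all-or-nothing multi in Python. It is an in-memory sketch, not the ZooKeeper multi API: ToyStore, the tuple-shaped operations and the paths are hypothetical. It also assumes a convention the thread does not spell out, namely that whoever deletes and re-creates the child bumps the parent's version in the same transaction.

```python
import copy


class ToyStore:
    """Hypothetical in-memory model of an all-or-nothing multi."""

    def __init__(self, nodes):
        self.nodes = nodes  # path -> [data, version]

    def multi(self, ops):
        """Apply every op or none of them; return True on success."""
        staged = copy.deepcopy(self.nodes)
        for op in ops:
            kind, path = op[0], op[1]
            if kind == "check":            # ("check", path, version)
                if path not in staged or staged[path][1] != op[2]:
                    return False
            elif kind == "set":            # ("set", path, data, version)
                if path not in staged or staged[path][1] != op[3]:
                    return False
                staged[path][0] = op[2]
                staged[path][1] += 1
            elif kind == "delete":         # ("delete", path)
                if path not in staged:
                    return False
                del staged[path]
            elif kind == "create":         # ("create", path, data)
                if path in staged:
                    return False
                staged[path] = [op[2], 0]
            else:
                raise ValueError(kind)
        self.nodes = staged
        return True


store = ToyStore({"/app": ["", 0], "/app/test": ["b", 1]})

# Client 1 reads the parent version along with the child.
parent_version = store.nodes["/app"][1]
data, version = store.nodes["/app/test"]

# Client 2 re-creates the child and bumps the parent in one transaction,
# then updates the child so that its version is 1 again.
store.multi([("delete", "/app/test"),
             ("create", "/app/test", "x"),
             ("set", "/app", "", 0)])
store.multi([("set", "/app/test", "y", 0)])

# Client 1's transaction fails on the parent check, so "y" is kept.
ok = store.multi([("check", "/app", parent_version),
                  ("set", "/app/test", data + "-updated", version)])
```

Without the check on /app, the set alone would pass in this model, because the child version is 1 in both incarnations.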

Regards,
Vamsi


On Sat, Nov 15, 2014 at 2:00 AM, "Jürgen Wagner (DVT)" <
juergen.wagner@devoteam.com> wrote:

>  Zookeeper uses an optimistic approach in this case. The "problem" will
> only occur if you simply use the optimistic mode in your application as
> well.
>
> So, you have to implement a pessimistic version, i.e., create a lock and
> then perform the update or guarantee otherwise that the required operations
> will be atomic. In that case, you can guarantee that nobody will delete the
> node while you're busy with the update.
>
> Cheers,
> --Jürgen

Re: A suggestion about the design for znode version in ZooKeeper

Posted by "Jürgen Wagner (DVT)" <ju...@devoteam.com>.
Zookeeper uses an optimistic approach in this case. The "problem" will
only occur if you simply use the optimistic mode in your application as
well.

So, you have to implement a pessimistic version, i.e., create a lock and
then perform the update or guarantee otherwise that the required
operations will be atomic. In that case, you can guarantee that nobody
will delete the node while you're busy with the update.
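
A minimal sketch of the pessimistic variant, in Python with threads standing in for clients. The threading.Lock is only a placeholder for whatever distributed lock recipe you would build on Zookeeper, and the dictionary is a toy model of the znode tree; both are assumptions for illustration.

```python
import threading

lock = threading.Lock()        # placeholder for a distributed lock (assumption)
nodes = {"/test": [0, 0]}      # path -> [data, version]
stale = []                     # filled only if a re-creation slips in mid-update


def recreator(rounds):
    for _ in range(rounds):
        with lock:             # delete and re-create as one critical section
            del nodes["/test"]
            nodes["/test"] = [0, 0]


def updater(rounds):
    for _ in range(rounds):
        with lock:             # nobody can delete the node until we are done
            node = nodes["/test"]
            data, version = node
            if nodes["/test"] is not node:
                stale.append(version)
            node[0], node[1] = data + 1, version + 1


threads = [threading.Thread(target=recreator, args=(500,)),
           threading.Thread(target=updater, args=(500,))]
for t in threads:
    t.start()
for t in threads:
    t.join()
```

Because both sides take the same lock, the updater always finds the node and always writes to the incarnation it just read.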

Cheers,
--Jürgen


On 15.11.2014 10:25, Ivan Kelly wrote:
> Another option would be to start the znode version at the version of the
> parent znode, which will be different between each deletion and
> creation of child nodes. One problem with this though (apart from
> the version being limited to 2^31 values) is that the API doesn't have
> any way to return the initial znode version on creation. Fixing this in
> a backward-compatible, non-ugly way would be hard, I think.
>
> -Ivan


-- 

Mit freundlichen Grüßen/Kind regards/Cordialement vôtre/Atentamente/С
уважением
*i.A. Jürgen Wagner*
Head of Competence Center "Intelligence"
& Senior Cloud Consultant

Devoteam GmbH, Industriestr. 3, 70565 Stuttgart, Germany
Phone: +49 6151 868-8725, Fax: +49 711 13353-53, Mobile: +49 171 864 1543
E-Mail: juergen.wagner@devoteam.com, URL: www.devoteam.de

------------------------------------------------------------------------
Managing Board: Jürgen Hatzipantelis (CEO)
Address of Record: 64331 Weiterstadt, Germany; Commercial Register:
Amtsgericht Darmstadt HRB 6450; Tax Number: DE 172 993 071



Re: A suggestion about the design for znode version in ZooKeeper

Posted by Ivan Kelly <iv...@ivankelly.net>.
Another option would be to start the znode version at the version of the
parent znode, which will be different between each deletion and
creation of child nodes. One problem with this though (apart from
the version being limited to 2^31 values) is that the API doesn't have
any way to return the initial znode version on creation. Fixing this in
a backward-compatible, non-ugly way would be hard, I think.
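
A toy sketch of what I mean, in Python. This is an in-memory model, not ZooKeeper code: ToyStore and its names are made up, there is a single implicit parent, and I am assuming the parent's counter is one that is bumped on every child creation and deletion.

```python
class BadVersion(Exception):
    """Raised when a conditional write names the wrong version."""


class ToyStore:
    """Hypothetical model with one implicit parent for all paths."""

    def __init__(self):
        self.child_version = 0   # parent counter, bumped on child create/delete
        self.nodes = {}          # path -> [data, version]

    def create(self, path, data):
        self.child_version += 1
        # The new node starts from the parent's counter instead of 0.
        self.nodes[path] = [data, self.child_version]

    def delete(self, path):
        del self.nodes[path]
        self.child_version += 1

    def get(self, path):
        data, version = self.nodes[path]
        return data, version

    def set(self, path, data, expected_version):
        node = self.nodes[path]
        if node[1] != expected_version:
            raise BadVersion(path)
        node[0] = data
        node[1] += 1


store = ToyStore()
store.create("/test", "a")               # starts at version 1
store.set("/test", "b", 1)               # version 2

data, version = store.get("/test")       # client 1 reads ("b", 2)

store.delete("/test")
store.create("/test", "x")               # starts at version 3
store.set("/test", "y", 3)               # version 4

try:
    store.set("/test", data + "-updated", version)
    rejected = False
except BadVersion:
    rejected = True
```

In this model the scheme catches the simple interleaving, but it only narrows the window: a node that has been updated many times can reach a version that a later incarnation also reaches.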

-Ivan
