Posted to user@zookeeper.apache.org by Semih Salihoglu <se...@stanford.edu> on 2011/03/06 02:06:17 UTC

Question about the Barrier Java example on the ZooKeeper documentation

Hi All,

I am new to this group and to ZooKeeper. I was reading the Barrier tutorial
in the ZooKeeper documentation
(http://hadoop.apache.org/zookeeper/docs/current/zookeeperTutorial.html). A
barrier primitive is exactly how I want to use ZooKeeper. I have a question
about this example. It's not really a ZooKeeper question; I think it's more a
question about the Barrier primitive. Here it is: in the enter
method of the Barrier implementation below

boolean enter() throws KeeperException, InterruptedException {
    zk.create(root + "/" + name, new byte[0], Ids.OPEN_ACL_UNSAFE,
            CreateMode.EPHEMERAL_SEQUENTIAL);
    while (true) {
        synchronized (mutex) {
            List<String> list = zk.getChildren(root, true);

            if (list.size() < size) {
                mutex.wait();
            } else {
                return true;
            }
        }
    }
}

could there be a race condition? Let's say there are two
machines/nodes, node1 and node2, that use this code to synchronize
over ZK, and that the following steps take place:


   1. node1 calls zk.create, then reads the number of children, sees
that it is 1, and starts waiting.
   2. node2 calls zk.create but is very slow and has not yet called
zk.getChildren.
   3. node1 is notified that the children of the znode changed, sees
that the count is now 2, and returns from enter(). It does its work and
then leaves the barrier, deleting its node.
   4. node2 calls zk.getChildren and, because node1 has already left,
sees that the number of children is 1. Since node1 will never enter the
barrier again, node2 will keep waiting forever.
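
To make the interleaving concrete, the four steps can be replayed against an
in-memory stand-in for the children of the barrier znode. This is only a
sketch: it does not use the ZooKeeper client, and BarrierRaceReplay, canPass,
and replay are made-up names for illustration.

```java
import java.util.ArrayList;
import java.util.List;

/** In-memory stand-in for the barrier znode's children (hypothetical, not the ZooKeeper API). */
final class BarrierRaceReplay {

    /** Mirrors the check in enter(): a process passes once it sees at least `size` children. */
    static boolean canPass(List<String> children, int size) {
        return children.size() >= size;
    }

    /** Replays steps 1-4 in order and returns {node1Passed, node2Passed}. */
    static boolean[] replay() {
        final int size = 2;
        List<String> children = new ArrayList<>();

        children.add("node1");                          // step 1: node1 creates its znode
        boolean node1Passed = canPass(children, size);  // it sees 1 child, so it waits

        children.add("node2");                          // step 2: node2 creates its znode, then stalls

        node1Passed = canPass(children, size);          // step 3: watch fires, node1 sees 2 children
        children.remove("node1");                       // node1 does its work and deletes its znode

        boolean node2Passed = canPass(children, size);  // step 4: node2 finally reads and sees 1 child
        return new boolean[] {node1Passed, node2Passed};
    }

    public static void main(String[] args) {
        boolean[] result = replay();
        System.out.println("node1 passed: " + result[0]);
        System.out.println("node2 passed: " + result[1]);
    }
}
```

Running it reports that node1 passed and node2 did not, which is the stuck
state described in step 4.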

Could this scenario happen? If not, what is preventing it? I haven't
copied the code that enters the barrier, does the work, and leaves the
barrier, but in the page linked above it is the barrierTest(String args[])
method.

Thank you very much in advance,

semih

Re: Question about the Barrier Java example on the ZooKeeper documentation

Posted by Mahadev Konar <ma...@apache.org>.
I just added you to the contributors list and assigned the jira to you.

thanks
mahadev

On Wed, Mar 9, 2011 at 1:55 AM, Semih Salihoglu <se...@stanford.edu> wrote:

> I created a bug but I don't see a way to assign it to myself (or anyone
> actually). Here's the link:
> https://issues.apache.org/jira/browse/ZOOKEEPER-1011.
>
> semih

Re: Question about the Barrier Java example on the ZooKeeper documentation

Posted by Semih Salihoglu <se...@stanford.edu>.
I created a bug but I don't see a way to assign it to myself (or anyone
actually). Here's the link:
https://issues.apache.org/jira/browse/ZOOKEEPER-1011.

semih

On Wed, Mar 9, 2011 at 1:30 AM, Flavio Junqueira <fp...@yahoo-inc.com> wrote:

> Hi Semih, Jira is the system we use to report and discuss zookeeper issues:
>
> https://issues.apache.org/jira/browse/ZOOKEEPER
>
> Once you have an account, you can create a new issue, describe it, and
> propose a fix to the problem at hand.
>
> -Flavio

Re: Question about the Barrier Java example on the ZooKeeper documentation

Posted by Flavio Junqueira <fp...@yahoo-inc.com>.
Hi Semih, Jira is the system we use to report and discuss ZooKeeper
issues:

https://issues.apache.org/jira/browse/ZOOKEEPER

Once you have an account, you can create a new issue, describe it, and
propose a fix to the problem at hand.

-Flavio

On Mar 8, 2011, at 10:13 PM, Semih Salihoglu wrote:

> Sure, I'll get to it this weekend probably.
>
> I don't know what Jira is, so some information on how to do this
> would be very helpful.
>
> Thank you,
>
> semih

flavio
junqueira

research scientist

fpj@yahoo-inc.com
direct +34 93-183-8828

avinguda diagonal 177, 8th floor, barcelona, 08018, es
phone (408) 349 3300    fax (408) 349 3301




Re: Question about the Barrier Java example on the ZooKeeper documentation

Posted by Semih Salihoglu <se...@stanford.edu>.
Sure, I'll get to it this weekend probably.

I don't know what Jira is, so some information on how to do this would be
very helpful.

Thank you,

semih

On Tue, Mar 8, 2011 at 8:31 AM, Patrick Hunt <ph...@apache.org> wrote:

> On Tue, Mar 8, 2011 at 5:59 AM, Flavio Junqueira <fp...@yahoo-inc.com> wrote:
>
>> I believe the goal of the examples was never to be complete solutions to
>> barriers or queues, but just to give a quick bootstrap to beginners. It is
>> true, though, that the documentation page does not say so, and can
>> be misleading.
>>
>> I see two possible action points out of this discussion:
>> 1- State clearly in the beginning that the example discussed is not
>> correct under the assumption that a process may finish the computation
>> before another has started, and the example is there for illustration
>> purposes;
>> 2- Have another example following the current one that discusses the
>> problem and shows how to fix it. This is an interesting option that
>> illustrates how one could reason about a solution when developing with
>> zookeeper.
>>
>>
> This (2) sounds much better to me. Semih, would you like to give that a
> try? (updating the docs I mean)
>
> Patrick
>
>
>> If you are interested in helping us fix it, Semih, then you could perhaps
>> create a jira and assign yourself to fix it. I can help you out.
>>
>> -Flavio

Re: Question about the Barrier Java example on the ZooKeeper documentation

Posted by Patrick Hunt <ph...@apache.org>.
On Tue, Mar 8, 2011 at 5:59 AM, Flavio Junqueira <fp...@yahoo-inc.com> wrote:

> I believe the goal of the examples was never to be complete solutions to
> barriers or queues, but just to give a quick bootstrap to beginners. It is
> true, though, that the documentation page does not say so, and can
> be misleading.
>
> I see two possible action points out of this discussion:
> 1- State clearly in the beginning that the example discussed is not correct
> under the assumption that a process may finish the computation before
> another has started, and the example is there for illustration purposes;
> 2- Have another example following the current one that discusses the
> problem and shows how to fix it. This is an interesting option that
> illustrates how one could reason about a solution when developing with
> zookeeper.
>
>
This (2) sounds much better to me. Semih, would you like to give that a try?
(updating the docs I mean)

Patrick


> If you are interested in helping us fix it, Semih, then you could perhaps
> create a jira and assign yourself to fix it. I can help you out.
>
> -Flavio

Re: Question about the Barrier Java example on the ZooKeeper documentation

Posted by Flavio Junqueira <fp...@yahoo-inc.com>.
I believe the goal of the examples was never to be complete
solutions to barriers or queues, but just to give a quick bootstrap to
beginners. It is true, though, that the documentation page does not
say so, and can be misleading.

I see two possible action points out of this discussion:

1- State clearly at the beginning that the example discussed is not
correct under the assumption that a process may finish the computation
before another has started, and that the example is there for
illustration purposes;
2- Add another example after the current one that discusses the
problem and shows how to fix it. This is an interesting option because
it illustrates how one could reason about a solution when developing
with ZooKeeper.
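
To make the failure case concrete, the interleaving from the original
question can be replayed step by step. This is only a sketch: it uses an
in-memory set as a stand-in for the children of the barrier znode, not a
real ZooKeeper client, and the names BarrierRaceReplay, enterCheck and
node2GetsStuck are made up for illustration.

```java
import java.util.Set;
import java.util.TreeSet;

// Hypothetical in-memory model of the tutorial barrier; no ZooKeeper calls are made.
class BarrierRaceReplay {
    // Stand-in for the children of the barrier root znode.
    private final Set<String> children = new TreeSet<>();
    private final int size;

    BarrierRaceReplay(int size) {
        this.size = size;
    }

    // Models zk.create(root + "/" + name, ...) in enter().
    void create(String name) {
        children.add(name);
    }

    // Models a process leaving the barrier and deleting its node.
    void delete(String name) {
        children.remove(name);
    }

    // Models the test in enter(): true means "return true",
    // false means the caller would block in mutex.wait().
    boolean enterCheck() {
        return children.size() >= size;
    }

    // Replays steps 1-4 of the question in a fixed order.
    // Returns true if node1 gets through and node2 is left waiting.
    static boolean node2GetsStuck() {
        BarrierRaceReplay barrier = new BarrierRaceReplay(2);

        barrier.create("node1");
        boolean node1Passed = barrier.enterCheck(); // step 1: one child, node1 waits

        barrier.create("node2");                    // step 2: node2 creates, reads nothing yet

        node1Passed = barrier.enterCheck();         // step 3: node1 is notified, sees 2
        barrier.delete("node1");                    // step 3: node1 works, then leaves

        boolean node2Passed = barrier.enterCheck(); // step 4: node2 finally reads, sees 1

        return node1Passed && !node2Passed;
    }

    public static void main(String[] args) {
        // Prints "node2 stuck: true"
        System.out.println("node2 stuck: " + node2GetsStuck());
    }
}
```

In this model node2 never sees the child count reach 2 again, which is
the situation a corrected example would have to rule out.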

If you are interested in helping us fix it, Semih, then you could
perhaps create a JIRA issue and assign it to yourself. I can help you out.

-Flavio

On Mar 7, 2011, at 11:23 AM, Semih Salihoglu wrote:

> Hi Mahadev,
>
> Sorry for the late response. I agree, actually in this other documentation
> http://hadoop.apache.org/zookeeper/docs/r3.0.0/recipes.html, where there is
> only the pseudo-code, I think this situation is avoided. Here there is
> another znode /ready that all nodes have a watch on. And after each node
> writes their own ephemeral child, they don't wait. They read how many have
> been written and the last one writes the /ready znode and everyone wakes up.
> The only race condition in this one is that there can be two nodes trying to
> write /ready and only one of them will succeed but this is ok.
>
> Thank you again,
>
> semih

Re: Question about the Barrier Java example on the ZooKeeper documentation

Posted by Semih Salihoglu <se...@stanford.edu>.
Hi Mahadev,

Sorry for the late response. I agree. Actually, in this other documentation,
http://hadoop.apache.org/zookeeper/docs/r3.0.0/recipes.html, where there is
only the pseudo-code, I think this situation is avoided. Here there is
another znode, /ready, that all nodes have a watch on. After each node
writes its own ephemeral child, it doesn't wait. It reads how many children
have been written, and the last one writes the /ready znode and everyone
wakes up. The only race condition in this one is that two nodes may try to
write /ready and only one of them will succeed, but this is ok.
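
Here is a rough sketch of that scheme, to show why a node that leaves
early can no longer strand a slower one. It is an in-memory model, not
ZooKeeper code: a CountDownLatch stands in for the /ready znode and the
watch on it, a HashSet stands in for the ephemeral children, and the
names ReadyFlagBarrier, enter, leave and bothPass are my own assumptions.

```java
import java.util.HashSet;
import java.util.Set;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;

// Hypothetical in-memory model of a barrier with a separate "ready" flag.
class ReadyFlagBarrier {
    // Stand-in for the ephemeral children of the barrier znode.
    private final Set<String> children = new HashSet<>();
    // Stand-in for the /ready znode plus the watch every node sets on it.
    private final CountDownLatch ready = new CountDownLatch(1);
    private final int size;

    ReadyFlagBarrier(int size) {
        this.size = size;
    }

    void enter(String name) throws InterruptedException {
        synchronized (children) {
            children.add(name);              // write own child
            if (children.size() >= size) {   // read how many have been written
                // More than one node may get here; extra countDown() calls
                // are no-ops, like a second create of /ready failing harmlessly.
                ready.countDown();
            }
        }
        // Wait on the flag, not on the child count. Once set it stays set,
        // so later deletions of children cannot block anyone.
        ready.await();
    }

    void leave(String name) {
        synchronized (children) {
            children.remove(name);
        }
    }

    // Runs two nodes that enter and leave immediately; returns true if
    // both get through within the timeout.
    static boolean bothPass() {
        ReadyFlagBarrier barrier = new ReadyFlagBarrier(2);
        CountDownLatch done = new CountDownLatch(2);
        for (String name : new String[] {"node1", "node2"}) {
            new Thread(() -> {
                try {
                    barrier.enter(name);
                    barrier.leave(name);
                    done.countDown();
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                }
            }).start();
        }
        try {
            return done.await(5, TimeUnit.SECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return false;
        }
    }

    public static void main(String[] args) {
        System.out.println("both passed: " + bothPass());
    }
}
```

A real implementation would also have to decide who removes /ready and
when, which this model does not cover.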

Thank you again,

semih

On Sat, Mar 5, 2011 at 6:41 PM, Mahadev Konar <ma...@apache.org> wrote:

> Semih,
>  You pointed it out right. It is possible to enter into a situation
> like that. The recipe does have a bug. It can be fixed with the last
> client creating a special znode and every node in the list watching
> for that (so it'll be an indication for entering the barrier). no?
>
> thanks
> mahadev

Re: Question about the Barrier Java example on the ZooKeeper documentation

Posted by Mahadev Konar <ma...@apache.org>.
Semih,
  You pointed it out right. It is possible to enter into a situation
like that. The recipe does have a bug. It can be fixed with the last
client creating a special znode and every node in the list watching
for that (so it'll be an indication for entering the barrier). no?

thanks
mahadev

On Sat, Mar 5, 2011 at 5:06 PM, Semih Salihoglu <se...@stanford.edu> wrote:
> Hi All,
>
> I am new to this group and to ZooKeeper. I was reading the Barrier tutorial
> in one of the ZooKeeper documentations.
> http://hadoop.apache.org/zookeeper/docs/current/zookeeperTutorial.html . A
> barrier primitive is exactly how I want to use ZooKeeper. I have a question
> about this example. It's not really a ZooKeeper question, it's more a
> question about the Barrier primitive I think. Here it is: In the enter
> method of this Barrier implementation below
>
> boolean enter() throws KeeperException, InterruptedException{
>            zk.create(root + "/" + name, new byte[0], Ids.OPEN_ACL_UNSAFE,
>                    CreateMode.EPHEMERAL_SEQUENTIAL);
>            while (true) {
>                synchronized (mutex) {
>                    List<String> list = zk.getChildren(root, true);
>
>                    if (list.size() < size) {
>                        mutex.wait();
>                    } else {
>                        return true;
>                    }
>                }
>            }
>        }
>
> could there be a race condition? Let's say there are two
> machines/nodes: node1 and node2 that will use this code to synchronize
> over ZK. Let's say the following steps take place:
>
>
>   1. node1 calls the zk.create method and then reads the number of
> children, and sees that it's 1 and starts waiting.
>   2. node2 calls the zk.create method (doesn't call the
> zk.getChildren method yet, let's say it's very slow)
>   3. node1 is notified that the number of children on the znode
> changed, it checks that the size is 2 so it leaves the barrier, it
> does its work and then leaves the barrier, deleting its node.
>   4. node2 calls zk.getChildren and because node1 has already left,
> it sees that the number of children is equal to 1. Since node1 will
> never enter the barrier again, it will keep waiting.
>
> Could this scenario happen? If not, what is preventing this? I haven't
> copied the code piece that enters barrier-does work-leaves barrier.
> But in the link I pasted above, it's the barrierTest(String args[])
> method.
>
> Thank you very much in advance,
>
> semih
>