Posted to user@zookeeper.apache.org by Kevin Harms <ha...@alcf.anl.gov> on 2012/06/15 18:25:00 UTC

Locking Recipe

  I set up a single ZooKeeper instance using the binaries distributed with Ubuntu 12.04. I downloaded the 3.3.5 source and compiled the C-based locking recipe. I built this into a program of mine and ran into a problem, so I have some questions.

  If I wanted to create 1000 locks, do I set up the locks as follows?
  /lock/0
  /lock/1
  ...
  /lock/999

  Is this correct?
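
  To make the layout above concrete, here is a minimal sketch of how the per-lock paths could be generated, assuming each lock gets its own parent znode under /lock. The helper name lock_path is hypothetical and is not part of the ZooKeeper C client or the recipe; whether this layout is the recommended one is exactly the open question.

```c
#include <stdio.h>

/* Hypothetical helper (not part of the ZooKeeper API): writes the parent
 * path for lock number `index` under `base` into `buf`.
 * Returns 0 on success, -1 if index is negative or the path does not fit. */
static int lock_path(char *buf, size_t len, const char *base, int index)
{
    if (index < 0)
        return -1;
    int n = snprintf(buf, len, "%s/%d", base, index);
    /* snprintf reports the length it wanted; anything >= len was truncated. */
    return (n < 0 || (size_t)n >= len) ? -1 : 0;
}
```

  Each generated path would then be the path given when the corresponding lock is initialized.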

  I was running an example with two clients competing for one lock, both on the same machine as the ZooKeeper instance. I found that zkr_lock_lock() would often fail to acquire the lock, so I put the call in a loop with 1000 retries. That makes it work most of the time, but sometimes there is still a failure at zoo_lock.c:301:

                // cannot watch my predecessor i am giving up
                // we need to be able to watch the predecessor
                // since if we do not become a leader the others
                // will keep waiting
[301]           if (ret != ZOK) {
                    free_String_vector(vector);


  I added a printf to see what ret was, and it was ZNONODE. Looking at the code above this spot, get_children is called, the results are sorted, and zoo_wexists is called later. Is it reasonable that the state could change between these two calls? I added a statement so that if the result is ZNONODE, it does a goto back to just above the get_children call and runs the algorithm again.
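
  The retry idea can be sketched outside of zoo_lock.c as follows. This is only an illustration under stated assumptions: the callbacks, the SIM_* codes and the sim_* functions are hypothetical stand-ins for the list-children step, the exists-watch step and the ZooKeeper return codes, and the attempt cap is an addition that the diff below does not have.

```c
#include <stdio.h>

/* Stand-in result codes; the real client defines ZOK and ZNONODE itself. */
enum { SIM_OK = 0, SIM_NONODE = -101 };

/* Hypothetical callbacks standing in for "list children and pick my
 * predecessor" and "set an exists-watch on that predecessor". */
typedef int (*pick_predecessor_fn)(void *ctx, char *out, size_t len);
typedef int (*watch_fn)(void *ctx, const char *node);

/* Re-run the list-then-watch sequence whenever the predecessor vanished
 * between the two calls. Any other error is returned unchanged. */
static int watch_predecessor(void *ctx, pick_predecessor_fn pick,
                             watch_fn watch, int max_attempts)
{
    char pred[64];
    int ret = SIM_NONODE;
    for (int i = 0; i < max_attempts; i++) {
        ret = pick(ctx, pred, sizeof pred);
        if (ret != SIM_OK)
            return ret;
        ret = watch(ctx, pred);
        if (ret != SIM_NONODE)
            return ret;         /* success, or an error we do not retry */
    }
    return ret;                 /* still SIM_NONODE after all attempts */
}

/* Tiny in-memory simulation: the first predecessor disappears before the
 * watch is set, the second one is still there. */
struct sim { int picks; int watches; };

static int sim_pick(void *ctx, char *out, size_t len)
{
    struct sim *s = ctx;
    snprintf(out, len, "x-%010d", s->picks++);
    return SIM_OK;
}

static int sim_watch(void *ctx, const char *node)
{
    struct sim *s = ctx;
    (void)node;
    return (s->watches++ == 0) ? SIM_NONODE : SIM_OK;
}
```

  With the simulated callbacks, a call such as watch_predecessor(&state, sim_pick, sim_watch, 4) lists the children twice and succeeds on the second watch, while a cap of 1 returns the no-node code. Bounding the attempts avoids spinning forever if nodes keep disappearing.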

  That change seems to make the code work all the time now, but I'm not sure the change is correct. I've included the diff below. So is it expected that zkr_lock_lock will fail periodically, since it only tries to acquire the lock 4 times?

thanks for any help,
kevin

--- zoo_lock.c.orig	2012-06-15 00:37:53.880508812 -0500
+++ zoo_lock.c	2012-06-15 00:41:41.304518262 -0500
@@ -273,6 +273,7 @@ static int zkr_lock_operation(zkr_lock_m
             mutex->id = getName(retbuf);
         }
         
+tryagain:
         if (mutex->id != NULL) {
             ret = ZCONNECTIONLOSS;
             ret = retry_getchildren(zh, path, vector, ts, retry);
@@ -299,7 +300,9 @@ static int zkr_lock_operation(zkr_lock_m
                 // will keep waiting
                 if (ret != ZOK) {
                     free_String_vector(vector);
+                    if (ret == ZNONODE) goto tryagain;
                     LOG_WARN(("unable to watch my predecessor"));
+                    printf("zret = %d\n", ret);
                     ret = zkr_lock_unlock(mutex);
                     while (ret == 0) {
                         //we have to give up our leadership


Re: Locking Recipe

Posted by Patrick Hunt <ph...@apache.org>.
Mahadev, any insight on this?
