Posted to user@curator.apache.org by mi...@speakeasyapp.net on 2015/04/06 23:39:40 UTC

Re: Reaper/ChildReaper usage

I’ve still been too slogged to test this yet, but a second question has arisen.


Imagine this scenario:


0. Single static CuratorFramework instance.

1. The ZooKeeper ensemble explodes.

2. ConnectionListener detects this and sets an AtomicBoolean to true.

3. The AtomicBoolean is reset to false once ConnectionListener gets a connection again.

4. In the meantime, attempts to get/release locks check the AtomicBoolean and throw an exception.

5. The LockFactory also detects this and, instead of handing out a distributed lock implementation, hands out a JVM-local lock. It switches back once the connection is reestablished, and many warnings are logged.



The idea was that we would fall back to jvm specific locking during this period (and then switch back). [Yes, a few dubious aspects]
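
A minimal sketch of the fallback idea described above, using only the JDK so it stands alone. The class name FallbackLockFactory, its methods, and the Supplier standing in for the code that wraps a Curator lock recipe are all hypothetical names chosen for illustration; none of them are Curator APIs. It assumes the connection listener calls setDisconnected(true) when the connection drops and setDisconnected(false) when it comes back.

```java
import java.util.concurrent.atomic.AtomicBoolean;
import java.util.concurrent.locks.Lock;
import java.util.concurrent.locks.ReentrantLock;
import java.util.function.Supplier;

/** Hypothetical factory: distributed lock while connected, JVM-local lock otherwise. */
final class FallbackLockFactory {
    // true while the ZooKeeper connection is considered lost
    private final AtomicBoolean disconnected = new AtomicBoolean(false);

    // Assumption: stand-in for whatever adapts a distributed lock recipe to java.util.concurrent.locks.Lock
    private final Supplier<Lock> distributedLocks;

    // Single JVM-wide lock used only during an outage; it protects this process only
    private final Lock localLock = new ReentrantLock();

    FallbackLockFactory(Supplier<Lock> distributedLocks) {
        this.distributedLocks = distributedLocks;
    }

    /** Intended to be called from the connection listener: true on loss, false on reconnect. */
    void setDisconnected(boolean value) {
        disconnected.set(value);
    }

    boolean isDisconnected() {
        return disconnected.get();
    }

    /** Checks the flag at hand-out time; a lock that was already handed out is not swapped. */
    Lock newLock() {
        if (disconnected.get()) {
            return localLock;
        }
        return distributedLocks.get();
    }
}
```

Note that the flag is only consulted when a lock is handed out. A caller that is already blocked inside a distributed acquire when the connection drops is not helped by this check, which matches what was seen in staging below.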


Anyway, this is what happened in staging:


1. The ensemble exploded.

2. ConnectionListener detected this and set the AtomicBoolean to true.

3. The logs show that lock acquisition attempts continued FOREVER (well, at least 30-40 minutes).

4. Restarting the ZooKeeper ensemble led to ZooKeeper complaining about more than the maximum of 60 client connections.

5. Only restarting the box running the distributed locking finally recovered things.



Is there no way to have a lock.acquire() finally give up on connection loss? Even the timed lock.acquire doesn’t exit.









From: Jordan Zimmerman
Sent: Tuesday, March 31, 2015 11:26 AM
To: mike@speakeasyapp.net, user@curator.apache.org





OK - if you don’t mind, please build from source and see if it fixes your issue.




-JZ







On March 31, 2015 at 1:26:02 PM, mike@speakeasyapp.net (mike@speakeasyapp.net) wrote:




 
FYI, it looks from GitHub as if this was not merged until after the 2.7.1 release.






From: Jordan Zimmerman
Sent: Tuesday, March 31, 2015 10:32 AM
To: mike@speakeasyapp.net, user@curator.apache.org





It looks like CURATOR-173 was possibly released in Curator 2.7.1. Scott Blum needs to respond on this.




-Jordan







On March 31, 2015 at 12:31:54 PM, mike@speakeasyapp.net (mike@speakeasyapp.net) wrote:




Confirmed, btw, that InterProcessMutex is reaped properly. However, its locks cannot be reentrant across threads in the same process, so I am wondering whether I should pull together my own patch from the ticket.






From: David Kesler
Sent: Tuesday, March 31, 2015 10:17 AM
To: user@curator.apache.org






How are you constructing your ChildReaper? It sounds like you’re constructing a ChildReaper using the path of the lock itself, and the path the ChildReaper is watching is getting deleted (though I’m not sure why). If you’re planning on having a number of locks of the form /lock/calendar/uuid1, /lock/calendar/uuid2, etc., you should be creating a single ChildReaper at startup that uses /lock/calendar as its path. This will ensure that the children of /lock/calendar (that is, your uuid locks) get reaped. You don’t need to add /lock/calendar/uuid to your ChildReaper directly.

 

As a side note, if you’re using InterProcessSemaphoreMutex, there’s currently an issue with ChildReaper in 2.7 (https://issues.apache.org/jira/browse/CURATOR-173) which should hopefully be fixed in the next release. If you can, you may want to consider InterProcessMutex instead.

 



From: mike@speakeasyapp.net [mailto:mike@speakeasyapp.net]
Sent: Tuesday, March 31, 2015 12:57 PM
To: user@curator.apache.org
Subject: Reaper/ChildReaper usage

 



Hi, I’m using the InterProcessSemaphoreMutex for a distributed locking recipe.


 


A typical path for a lock might be


 


/lock/calendar/<uuid>


 


I’d assume these paths need to be cleaned up eventually, so I’ve tried using ChildReaper and Reaper to do so after I unlock the lock.


 


ChildReaper kind of works. If I add /lock/calendar/uuid, it happily removes the children. The log shows it removing the leases and locks, and zkClient shows the node itself is gone. However, it then suddenly begins complaining, in a seemingly endless loop, that the path is gone. This happens with both Mode.REAP_UNTIL_DELETE and Mode.REAP_UNTIL_GONE.


 


Reaper does nothing, probably because /lock/calendar/uuid has children.


 


Am I missing something? Do I not need to clean up these locks? What do I need to worry about concurrency-wise?