Posted to dev@curator.apache.org by Jordan Zimmerman <jo...@jordanzimmerman.com> on 2021/07/20 20:11:19 UTC

Re: [External Sender] Double Locking Issue

> In our case, it wouldn’t throw any exception because it had gone past “creating lock nodes” and was blocked on wait(), which would only be interrupted when the Curator watcher was notified of the previous sequence node’s delete event.

So, you're using the version of acquire() without a timeout? In any event, this is a problem. When you receive SUSPENDED you really should interrupt any threads that are waiting on Curator. The Curator docs imply this even though it might not be obvious. This is likely the source of your problems. A simple solution is to use the version of acquire that has a timeout and repeatedly call it until success (though your problem may still occur in this case). Maybe we could improve the lock recipe (or something similar) so that locks inside of acquire are interrupted on a network partition. 
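
To make that concrete, here is a minimal sketch of the pattern (the class name, the retry bound, and the listener wiring are illustrative assumptions, not prescribed Curator usage):

    import java.util.concurrent.TimeUnit;
    import org.apache.curator.framework.CuratorFramework;
    import org.apache.curator.framework.recipes.locks.InterProcessMutex;
    import org.apache.curator.framework.state.ConnectionState;
    import org.apache.curator.framework.state.ConnectionStateListener;

    public class GuardedAcquire
    {
        // Acquire with a timeout in a bounded retry loop, and interrupt the acquiring
        // thread if the connection is SUSPENDED/LOST so it does not sit in wait()
        // across a network partition. Returns true only if the lock was acquired.
        public static boolean acquire(CuratorFramework client, InterProcessMutex mutex) throws Exception
        {
            final Thread acquiringThread = Thread.currentThread();
            ConnectionStateListener listener = new ConnectionStateListener()
            {
                @Override
                public void stateChanged(CuratorFramework c, ConnectionState newState)
                {
                    if ( (newState == ConnectionState.SUSPENDED) || (newState == ConnectionState.LOST) )
                    {
                        acquiringThread.interrupt();    // wakes the thread blocked inside acquire()
                    }
                }
            };
            client.getConnectionStateListenable().addListener(listener);
            try
            {
                for ( int i = 0; i < 10; ++i )          // retry bound is arbitrary for the example
                {
                    if ( mutex.acquire(5, TimeUnit.SECONDS) )
                    {
                        return true;                    // caller must release() when done
                    }
                }
                return false;
            }
            finally
            {
                client.getConnectionStateListenable().removeListener(listener);
            }
        }
    }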

[snip of your remaining thoughtful analysis]

I think there's generally a hole in Curator's lock recipe. The lock code does not watch the lock node it has created. So, another process or (as you found) a race with the server might cause the lock node to disappear underneath the lock instance after it thinks it has the lock. 

One thing we can do is to check the session ID before waiting in LockInternals.internalLockLoop(). If the session ID has changed after wait() exits, it suggests the lock node should no longer be trusted and it should be deleted and re-created. That's one thought anyway. It would be nice to generate a test/simulation for this case so that it can be properly dealt with.
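
At the application level, a rough approximation of that check could look like the sketch below (the real fix would live inside internalLockLoop(); the class and method names here are only illustrative):

    import org.apache.curator.framework.CuratorFramework;
    import org.apache.curator.framework.recipes.locks.InterProcessMutex;

    public class SessionCheckedAcquire
    {
        // Sample the ZooKeeper session ID before and after acquire(). If it changed while
        // we were blocked, the ephemeral lock node was created under the old session and
        // may be deleted at any moment, so release and try again rather than trusting it.
        public static void acquire(CuratorFramework client, InterProcessMutex mutex) throws Exception
        {
            while ( true )
            {
                long sessionBefore = client.getZookeeperClient().getZooKeeper().getSessionId();
                mutex.acquire();
                long sessionAfter = client.getZookeeperClient().getZooKeeper().getSessionId();
                if ( sessionAfter == sessionBefore )
                {
                    return;             // same session throughout - lock node can be trusted
                }
                mutex.release();        // session changed mid-wait - discard and re-acquire
            }
        }
    }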

-Jordan

> On Jul 20, 2021, at 3:53 PM, Viswanathan Rajagopal <vi...@workday.com> wrote:
> 
> Thanks Jordan for coming back on this.
>  
> Please find my inline comments. I have also provided below some additional version info on what we are using,
> Curator version – 2.5.0
> Zookeeper version – 3.5.3
>  
> >  I don't see how this is the case. If the client has received a network partition it shouldn't consider any locks as being currently held (see the Tech Notes I mentioned in my previous email). If there is a partition during a call to acquire(), acquire() would throw an exception (once the retry policy has expired). BTW, what retry policy are you using?
>  
> In our case, it wouldn’t throw any exception because it had gone past “creating lock nodes” and was blocked on wait(), which would only be interrupted when the Curator watcher was notified of the previous sequence node’s delete event.
>  
> > This isn't correct. Curator watches the previous node but lock acquisition is always based on calling ZK's getChildren, sorting them, and seeing if the lock's node is the first node in the sequence. If Ephemeral nodes aren't cleaned up it wouldn't be a problem. Any phantom ephemeral nodes would sort first and prevent Curator from believing it holds the lock.
>  
> Totally agreed on your point about lock acquisition. The problem I was trying to explain is that the network partition occurs while the client is waiting for the lock (in acquire()) but after the acquire code has already written out the lock node successfully, so the client code will be blocked in acquire() and will not get a connection exception. Once the previous lock node is deleted, but not the lock node for this client, this client will leave acquire() thinking it has the lock, but at that point the ZK server has actually expired the session and is in the process of deleting our client's lock node.
>  
> I have categorized the timeline events below to help with understanding,
> [ CONNECTED ] :
> Client A & Client B call acquire(), creating Nodes N1 & N2 respectively
> Client A acquire() -> holds the lock as its Node N1 is first node in sequence
> Client B acquire() -> created Node N2, watches for previous sequence node (blocked on wait())
>  
> [ SUSPENDED CONNECTION OCCURS ] :
> Network partition happens
> Curator fires SUSPENDED
> Client A, on receiving SUSPENDED event, attempts to release the lock
> Client B, on receiving SUSPENDED event, does nothing as it was not holding any lock (as blocked on acquire() -> wait() )  
>  
> [ SESSION EXPIRED ] :
> Client A attempts to release the lock (with retries)
> Client B does nothing as it was not holding any lock (and is still blocked on acquire() -> wait() )
> Server prepares to clean up the previous client sessions and delete the lock nodes created as part of them
>  
> [ RECONNECTED ] :
> Client A and Client B reconnect to another server with new session ids (Note: this happens before the server managed to clean up / delete the ephemeral nodes of the previous client sessions)
> Client A releases the lock successfully (i.e. it deletes its lock node N1) and attempts to acquire the lock again by creating lock node N3, watching the previous sequence node (N2)
> Client B, who was blocked on acquire() -> wait(), is notified of the previous sequence node (N1) deletion -> getChildren, sort them, and see if the lock's node is the first node in the sequence. So, Client B still sees its lock node N2 (which I call the about-to-be-deleted-by-the-server node) and thereby acquires the lock
>  
> [ AFTER FEW SECONDS ] :
> Server managed to delete the ephemeral node N2 as part of previous client session cleanup
> Client A who was blocked on acquire() -> wait() would be notified with previous sequence node (N2) deletion -> getChildren, sorting them, and seeing if the lock's node is the first node in the sequence and thereby acquires the lock
> Client B – its local lock thread data goes stale (as its lock path N2 has now been deleted by the server)
>  
> >  SUSPENDED in Curator means only that the connection has been lost, not that the session has ended.
> LOST is the state that means the session has ended
> Be aware of how GCs can affect Curator. See the Tech Note here: https://cwiki.apache.org/confluence/display/CURATOR/TN10
> Also read this Tech Note on session handling: https://cwiki.apache.org/confluence/display/CURATOR/TN14
>  
> Very good information in the tech notes. Thanks for that. Agreed, it’s always recommended to clear the locks on seeing a SUSPENDED event, and we are already following your recommendation: our application clears the lock when it sees a SUSPENDED event.
>  
> To summarize my thoughts,
> An ephemeral node created by the previous session is still seen by a client that reconnected with a new session id, until the server cleans up that previous-session ephemeral node. This can happen if the client manages to reconnect to the server with a new session id before the server cleans up the previous session. How does it affect the Curator lock recipe? Explanation: the race condition explained above would make acquire() hold the lock (as its own lock node is still seen), eventually leading to inconsistent state (i.e. stale Curator local lock state) when that lock node is cleaned up by the server as part of the previous session’s cleanup activities.
> I am NOT saying this is a Curator bug, but I am looking for suggestions / recommendations on handling this zookeeper race condition - either a feature added to Curator to cover this case, or recommendations that clients should follow.
>  
> Many Thanks,
> Viswa.
>  
> From: Sean Gibbons <sean.gibbons@workday.com <ma...@workday.com>>
> Date: Tuesday, 20 July 2021 at 12:19
> To: Viswanathan Rajagopal <viswanathan.rajagopa@workday.com <ma...@workday.com>>
> Subject: FW: [External Sender] Re: Double Locking Issue
> 
> 
> 
> On 20/07/2021, 10:12, "Jordan Zimmerman" <jordan@jordanzimmerman.com <ma...@jordanzimmerman.com>> wrote:
> 
>     A few more things...
> 
>     > Based on our initial analysis and few test runs, we saw that Curator acquire() method acquires the lock based on “about to be deleted lock node of previous session”. Explanation : Ephemeral node created by previous session was  still seen by client that reconnected with new session id until server cleans that up. If this happens, Curator acquire() would hold the lock.
> 
>     This isn't correct. Curator watches the previous node but lock acquisition is always based on calling ZK's getChildren, sorting them, and seeing if the lock's node is the first node in the sequence. If Ephemeral nodes aren't cleaned up it wouldn't be a problem. Any phantom ephemeral nodes would sort first and prevent Curator from believing it holds the lock.
> 
>     >  *   On the above mentioned race condition, if client manage to reconnect to server with new session id before server cleans up the ephemeral nodes of client’s previous session,  Curator lock acquire() who is trying to acquire the lock will hold the lock as it still sees the lock node in zookeeper directory. Eventually server would be cleaning up the ephemeral nodes leaving the Curator local lock thread data stale giving the illusion that it still hold the lock while its ephemeral node is gone
> 
>     I don't see how this is the case. If the client has received a network partition it shouldn't consider any locks as being currently held (see the Tech Notes I mentioned in my previous email). If there is a partition during a call to acquire(), acquire() would throw an exception (once the retry policy has expired). BTW, what retry policy are you using?
> 
>     So, to reiterate, I don't see how phantom/undeleted ephemeral nodes would cause a problem. The only problem they could cause is that a given Curator client takes longer to acquire a lock as it waits for those ephemerals to finally get deleted.
> 
>     -JZ
> 
>     > On Jul 19, 2021, at 6:45 PM, Viswanathan Rajagopal <vi...@workday.com.INVALID> wrote:
>     > 
>     > Hi Team,
>     > 
>     > Good day.
>     > 
>     > Recently came across “Double Locking Issue (i.e. two clients acquiring lock)” using Curator code ( InterProcessMutex lock APIs ) in our application
>     > 
>     > Our use case:
>     > 
>     >  *   Two clients attempt to acquire the zookeeper lock using Curator InterProcessMutex, and whoever owns it releases it once it sees a connection disconnect ( on receiving Connection.SUSPENDED / Connection.LOST Curator Connection Events from the Connection listener)
>     > 
>     > Issue we noticed:
>     > 
>     >  *   After the session expired & the clients reconnected with new sessions, both clients seem to have acquired the lock. The interesting thing we found is that one of the clients still holds the lock while its lock node (ephemeral) is gone
>     > 
>     > Things we found:
>     > 
>     >  *   Based on our initial analysis and few test runs, we saw that Curator acquire() method acquires the lock based on “about to be deleted lock node of previous session”. Explanation : Ephemeral node created by previous session was  still seen by client that reconnected with new session id until server cleans that up. If this happens, Curator acquire() would hold the lock.
>     > 
>     > 
>     > 
>     >  *   Clearly we could see the race condition (in zookeeper code) between 1). Client reconnecting to server with new session id and 2). server deleting the ephemeral nodes of client’s previous session. We were able to reproduce this issue using the following approach,
>     >     *   Artificially break the socket connection between client and server for 30s
>     >     *   Artificially pause the relevant server code for a minute and then unpause it
>     > 
>     > 
>     >  *   On the above mentioned race condition, if client manage to reconnect to server with new session id before server cleans up the ephemeral nodes of client’s previous session,  Curator lock acquire() who is trying to acquire the lock will hold the lock as it still sees the lock node in zookeeper directory. Eventually server would be cleaning up the ephemeral nodes leaving the Curator local lock thread data stale giving the illusion that it still hold the lock while its ephemeral node is gone
>     > 
>     > 
>     >  *   Timeline events described below for better understanding,
>     >     *   At t1, Client A and Client B establishes zookeeper session with session id A1 and B1 respectively
>     >     *   At t2, Client A creates the lock node N1 & acquires the lock
>     >     *   At t3, Client B creates the lock node N2 & blocked on acquire() to acquire the lock
>     >     *   At t4, the session timed out for both clients & the server is about to clean up the old sessions; Client A tries to release the lock
>     >     *   At t5, Client A and Client B reconnects to server with new session id A2 and B2 respectively before server deletes the ephemeral node N1 & N2 of previous client session. Client A releases the lock, deleting N1 and trying to acquire it again by creating N3 node and Client B who is blocked on acquire() acquires the lock based on N2 (about to be deleted node created by previous session)
>     >     *   At  t6, server cleans up the ephemeral node N2 created by Client B’s previous session. Client A acquires the lock with node N3 as its previous sequence N2 gets deleted, whereas Client B who incorrectly acquired the lock at t5 timeline still holds the lock
>     > 
>     > 
>     > Note:
>     > 
>     >  *   We are not certain whether this race condition that we noticed in the zookeeper code is intentional design.
>     > 
>     > Questions:
>     > 
>     >  *   Given this race condition seen in zookeeper code, we would like to hear your recommendations / suggestions to avoid this issue while using Curator lock code?
>     > 
>     > 
>     > 
>     >  *   We also see that InterProcessMutex has the makeRevocable() API that enables the application to revoke the lock, but it handles the Node change event only, not the Node deleted event. I understand that handling the Node change event alone makes sense, as it lets the application user revoke the lock externally from the application side. But would it also be okay to have one for the Node delete event, so that the application can register a listener for it? I would like to hear your thoughts.
>     > 
>     > 
>     > Many Thanks,
>     > Viswa
> 


Re: [External Sender] Double Locking Issue

Posted by Jordan Zimmerman <jo...@jordanzimmerman.com>.
I'm now wondering if this is a ZooKeeper bug? In your original timeline you say that Client B sees its original ephemeral node that should have been deleted. ZooKeeper is taking time to delete that ZNode? A simple fix would be to try to write something to that ZNode. I would hope that ZooKeeper would reject the write on the now-expired ZNode. So, the lock recipe would be enhanced so that it tries to do a setData on the node before it accepts that it has the lock. Just a thought.
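
As a sketch of that enhancement (inside the recipe the node path is already known - LockInternals' ourPath; here lockNodePath stands in for it as an assumed parameter):

    import org.apache.curator.framework.CuratorFramework;
    import org.apache.zookeeper.KeeperException;

    public class LockNodeProbe
    {
        // Probe the lock node with a read-then-write before trusting the lock. A
        // NoNodeException means the node has already been removed (e.g. as part of the
        // old session's cleanup) and the lock should not be considered held.
        public static boolean nodeStillPresent(CuratorFramework client, String lockNodePath)
        {
            try
            {
                byte[] data = client.getData().forPath(lockNodePath);
                client.setData().forPath(lockNodePath, data);   // no-op write as a liveness check
                return true;
            }
            catch ( KeeperException.NoNodeException e )
            {
                return false;   // node already deleted by the server
            }
            catch ( Exception e )
            {
                return false;   // connection trouble etc. - safer not to trust the lock
            }
        }
    }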

> On Jul 30, 2021, at 9:54 AM, Jordan Zimmerman <jo...@jordanzimmerman.com> wrote:
> 
> Even more... actually, this should already work, right? It's been a long time since I looked at this code so I wrote a quick test. wait() already exits on connection loss. I apologize, but what I said below is not correct. The LockInternals watcher gets called on connection loss and already wakes the waiting thread via notifyAll(). So, again, sorry for the noise.
> 
> I'll go over the scenario again to see if I have more comments.
> 
> -Jordan
> 
>> On Jul 30, 2021, at 9:18 AM, Jordan Zimmerman <jordan@jordanzimmerman.com <ma...@jordanzimmerman.com>> wrote:
>> 
>> Actually... I'm looking at LockInternals and it already has a watcher set on the previous node before it calls wait(). That watcher will get called when there's a connection problem. It would be pretty easy to add something in that watcher to interrupt the waiting thread. This could be a pretty simple change actually.
>> 
>> Watcher set just before waiting: https://github.com/apache/curator/blob/master/curator-recipes/src/main/java/org/apache/curator/framework/recipes/locks/LockInternals.java#L301
>> The Watcher: https://github.com/apache/curator/blob/master/curator-recipes/src/main/java/org/apache/curator/framework/recipes/locks/LockInternals.java#L64
>> 
>> So, off the top of my head, an AtomicReference<Thread> field could be added to LockInternals. It gets set to the current thread just before waiting. If the Watcher is called with a connection event and the AtomicReference isn't empty, interrupt the thread. This will cause wait() to throw InterruptedException and then the lock node will be deleted.
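>> 
>> A self-contained sketch of that shape (not the actual LockInternals change - class, field, and method names are illustrative): record the waiting thread before wait(), and let the watcher interrupt it on a connection event:
>> 
>> import java.util.concurrent.atomic.AtomicReference;
>> import org.apache.zookeeper.WatchedEvent;
>> import org.apache.zookeeper.Watcher;
>> import org.apache.zookeeper.Watcher.Event.KeeperState;
>> 
>> public class InterruptingWatcher
>> {
>>     private final AtomicReference<Thread> waitingThread = new AtomicReference<Thread>();
>> 
>>     public final Watcher watcher = new Watcher()
>>     {
>>         @Override
>>         public void process(WatchedEvent event)
>>         {
>>             if ( event.getState() != KeeperState.SyncConnected )
>>             {
>>                 Thread thread = waitingThread.get();    // connection event - interrupt the waiter
>>                 if ( thread != null )
>>                 {
>>                     thread.interrupt();                 // makes wait() throw InterruptedException
>>                 }
>>             }
>>             synchronized(InterruptingWatcher.this)
>>             {
>>                 InterruptingWatcher.this.notifyAll();   // normal notification path
>>             }
>>         }
>>     };
>> 
>>     // mirrors how LockInternals blocks until its watcher fires
>>     public synchronized void awaitPreviousNode() throws InterruptedException
>>     {
>>         waitingThread.set(Thread.currentThread());
>>         try
>>         {
>>             wait();
>>         }
>>         finally
>>         {
>>             waitingThread.set(null);
>>         }
>>     }
>> }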
>> 
>> -Jordan
>> 
>>> On Jul 30, 2021, at 9:11 AM, Jordan Zimmerman <jordan@jordanzimmerman.com <ma...@jordanzimmerman.com>> wrote:
>>> 
>>> Making lock code watch its lock node it has created
>>> 
>>> I don't think this is useful. Once the lock method returns the lock code can't notify the caller/user that the lock node has disappeared. We'd need some kind of additional class "lock watcher" or something that client code would have to periodically call to ensure that it still has the lock. tbh - this has been discussed a lot over the years on the ZooKeeper/Curator channels. Just because you have a ZNode doesn't mean it's 100% valid. You should still practice optimistic locks, etc. But, maybe we can reduce the edge cases with this lock watcher or some other mechanism.
>>> 
>>> Session ID changes after wait exits it suggests the lock node should no longer be trusted and it should be deleted and re-created
>>> 
>>> This is a simple change. But, I don't know if it would help much. It would be better to interrupt waiting lockers when the connection is lost. If an internal mechanism forced any locks blocked in wait() to throw an exception then those lock threads will already delete their ZNodes and try again. This would be the best solution maybe? So, before the lock code goes to wait() it sets a ConnectionStateListener (or something similar) that will interrupt the thread when there are connection problems.
>>> 
>>> Leader Latch code is well protected
>>> 
>>> Yes - the leader recipes are better at this. Maybe there's an opportunity to merge/change code. We could consider deprecating the lock recipes in favor of these?
>>> 
>>> -Jordan
>>> 
>>>> On Jul 29, 2021, at 2:35 AM, Viswanathan Rajagopal <viswanathan.rajagopa@workday.com <ma...@workday.com>> wrote:
>>>> 
>>>> Hi Jordan,
>>>>  
>>>> Thanks for the suggestions. Much helpful.
>>>>  
>>>> Quick summary of my below comments,
>>>> Added test case simulator to reproduce issue (though artificially)
>>>> Added a proposed code just to get your review and suggestions to see whether that would work
>>>> Please find my detailed comments inline,
>>>>  
>>>> > I think there's generally a hole in Curator's lock recipe. The lock code does not watch the lock node it has created. So, another process or (as you found) a race with the server might cause the lock node to disappear underneath the lock instance after it thinks it has the lock. One thing we can do is to check the session ID before waiting in LockInternals.internalLockLoop(). If the session ID changes after wait exits it suggests the lock node should no longer be trusted and it should be deleted and re-created. That's 1 though anyway.
>>>>  
>>>> Yes, you are right. Currently the lock code doesn’t watch the lock node it has created. That’s where we would like to hear your feedback on the proposal mentioned in our first mail thread:
>>>> “The Curator lock code has the makeRevocable() API that enables the application to revoke the lock at any time by triggering a NODE_CHANGE event through the Revoker.attemptToRevoke() utility.
>>>> Proposal: Would it be nice to extend the makeRevocable() API to handle the Node delete event, which would allow the application to register a watcher for the Node delete event, so that it can react by revoking the lock?”
>>>> We tried a code snippet (https://github.com/ViswaNXplore/curator/blob/curator-2.5.0/patch/double_locking_test_simulator/curator-recipes/src/main/java/org/apache/curator/framework/recipes/locks/LockInternals.java#L60) to see whether the approach “Making lock code watch its lock node it has created” works, and would like to get your thoughts on it - do you see any other impacts there? We may need your advice.
>>>>  
>>>> I agree with you that the session id approach would protect us from seeing this issue. The reason I thought your first approach, “making lock code watch the node it has created”, could be more protective is that it would guard against any inconsistencies between the original lock node and the local lock state.
>>>>   
>>>> >  It would be nice to generate a test/simulation for this case so that it can be properly dealt with.
>>>>  
>>>> Created a test case simulator: https://github.com/ViswaNXplore/curator/blob/curator-2.5.0/patch/double_locking_test_simulator/curator-recipes/src/test/java/org/apache/curator/framework/recipes/CuratorRecipeTest.java
>>>>  
>>>> Approach #1 (artificial way of reproducing the issue)
>>>> Run the test case with SIMULATE_ISSUE_BY_EXPLICIT_DELETE set to true. This deletes the lock node after a pause, just to simulate the state where the zkClient has reconnected and still happens to see the ephemeral node just before the server deletes it (since its session has expired), with the node deleted afterwards by the server.
>>>> Approach #2 (a little manual intervention needed to reproduce the issue)
>>>> Run the test case in debug mode (SIMULATE_ISSUE_BY_EXPLICIT_DELETE set to false)
>>>> Artificially delay / pause the deletion of the ephemeral lock nodes that is part of the session cleanup process in the server code (ZookeeperServer.close() method)
>>>> After a pause (say 5s) to let one of the instances acquire the lock, artificially break the socket connection between client and server for 30s (by keeping a breakpoint in the ClientCnxnSocketNIO.doIO() method). After 30s, we would see the session-closing logs in the server code
>>>> After 1min, remove the breakpoint in ClientCnxnSocketNIO.doIO() and resume both Thread 2 and Thread 3
>>>> After that, resume the server thread (thread name “SessionTracker”)
>>>> Proposals discussed so far are below (the 3rd added now for your review):
>>>> Making the lock code watch the lock node it has created
>>>> If the session ID has changed after wait() exits, the lock node should no longer be trusted and should be deleted and re-created
>>>> Leader Latch code is well protected against this zookeeper race condition, because it internally handles the connection events (which it uses to interrupt the latch-acquire state and reset its latch every time the connection is re-established), meaning it will explicitly release the latch and recreate a new one if there is a connection disconnect (maybe this is the approach the lock recipe could use to protect itself? - see the sketch after this list)
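>>>> As a sketch of proposal 3 (illustrative only - the path and structure are placeholders; whether LeaderLatch is a suitable substitute depends on the application, e.g. it is held per latch instance rather than per thread):
>>>> 
>>>> import org.apache.curator.framework.CuratorFramework;
>>>> import org.apache.curator.framework.recipes.leader.LeaderLatch;
>>>> 
>>>> public class LatchGuardedWork
>>>> {
>>>>     // Guard a critical section with LeaderLatch, which internally resets itself on
>>>>     // reconnect, instead of InterProcessMutex.
>>>>     public static void runGuarded(CuratorFramework client, Runnable work) throws Exception
>>>>     {
>>>>         LeaderLatch latch = new LeaderLatch(client, "/example/latch-lock");
>>>>         latch.start();
>>>>         try
>>>>         {
>>>>             latch.await();                  // blocks until this instance is leader
>>>>             if ( latch.hasLeadership() )    // re-check: leadership can be lost on reconnect
>>>>             {
>>>>                 work.run();
>>>>             }
>>>>         }
>>>>         finally
>>>>         {
>>>>             latch.close();                  // gives up leadership / removes the latch node
>>>>         }
>>>>     }
>>>> }
>>>> 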
>>>> Many Thanks,
>>>> Viswa.
>>>>  
>>>> From: Jordan Zimmerman <jordan@jordanzimmerman.com <ma...@jordanzimmerman.com>>
>>>> Date: Tuesday, 20 July 2021 at 21:11
>>>> To: Viswanathan Rajagopal <viswanathan.rajagopa@workday.com <ma...@workday.com>>
>>>> Cc: Sean Gibbons <sean.gibbons@workday.com <ma...@workday.com>>, dev@curator.apache.org <ma...@curator.apache.org> <dev@curator.apache.org <ma...@curator.apache.org>>, cammckenzie@apache.org <ma...@apache.org> <cammckenzie@apache.org <ma...@apache.org>>, user@curator.apache.org <ma...@curator.apache.org> <user@curator.apache.org <ma...@curator.apache.org>>, Donal Arundel <donal.arundel@workday.com <ma...@workday.com>>, Francisco Goncalves <francisco.goncalves@workday.com <ma...@workday.com>>, Zak Szadowski <zak.szadowski@workday.com <ma...@workday.com>>, Dandie Beloy <dandie.beloy@workday.com <ma...@workday.com>>, Marcelo Cenerino <marcelo.cenerino@workday.com <ma...@workday.com>>
>>>> Subject: Re: [External Sender] Double Locking Issue
>>>> 
>>>> In our case, it wouldn’t throw any exception because it had gone past “creating lock nodes” and was blocked on wait(), which would only then be interrupted when curator watcher notified on previous sequence node delete event.
>>>>  
>>>> So, you're using the version of acquire() without a timeout? In any event, this is a problem. When you receive SUSPENDED you really should interrupt any threads that are waiting on Curator. The Curator docs imply this even though it might not be obvious. This is likely the source of your problems. A simple solution is to use the version of acquire that has a timeout and repeatedly call it until success (though your problem may still occur in this case). Maybe we could improve the lock recipe (or something similar) so that locks inside of acquire are interrupted on a network partition. 
>>>>  
>>>> [snip of your remaining thoughtful analysis]
>>>>  
>>>> I think there's generally a hole in Curator's lock recipe. The lock code does not watch the lock node it has created. So, another process or (as you found) a race with the server might cause the lock node to disappear underneath the lock instance after it thinks it has the lock. 
>>>>  
>>>> One thing we can do is to check the session ID before waiting in LockInternals.internalLockLoop(). If the session ID changes after wait exits it suggests the lock node should no longer be trusted and it should be deleted and re-created. That's 1 though anyway. It would be nice to generate a test/simulation for this case so that it can be properly dealt with.
>>>>  
>>>> -Jordan
>>>> 
>>>> 
>>>> On Jul 20, 2021, at 3:53 PM, Viswanathan Rajagopal <viswanathan.rajagopa@workday.com <ma...@workday.com>> wrote:
>>>>  
>>>> Thanks Jordan for coming back on this.
>>>>  
>>>> Please find my inline comments. Also provided below few additional version info that we are using,
>>>> Curator version – 2.5.0
>>>> Zookeeper version – 3.5.3
>>>>  
>>>> >  I don't see how this is the case. If the client has received a network partition it shouldn't not consider any locks as being currently held (see the Tech Notes I mentioned in my previous email). If there is a partition during a call to acquire(), acquire() would throw an exception (once the retry policy has expired). BTW, what retry policy are you using?
>>>>  
>>>> In our case, it wouldn’t throw any exception because it had gone past “creating lock nodes” and was blocked on wait(), which would only then be interrupted when curator watcher notified on previous sequence node delete event.
>>>>  
>>>> > This isn't correct. Curator watches the previous node but lock acquisition is always based on calling ZK's getChildren, sorting them, and seeing if the lock's node is the first node in the sequence. If Ephemeral nodes aren't cleaned up it wouldn't be a problem. Any phantom ephemeral nodes would sort first and prevent Curator from believing it holds the lock.
>>>>  
>>>> Totally agreed on your point on lock acquisition. The problem that I was trying to explain is that the network partition occurs while the client is waiting for the lock (in the acquire()) but after the acquire code block has already written out the lock node successfully, so the client code will be blocking in the acquire and not get a connection exception. Once the previous lock node is deleted but not the lock node for this client then this client will leave the acquire thinking it has the lock but actually at this point the ZK server has expired the session and is in the process of deleting our clients lock node.
>>>>  
>>>> Trying to categorize the timeline events below to see if that would help us for better understanding,
>>>> [ CONNECTED ] :
>>>> Client A & Client B calls acquire creating Node N1 & N2 respectively
>>>> Client A acquire() -> holds the lock as its Node N1 is first node in sequence
>>>> Client B acquire() -> created Node N2, watches for previous sequence node (blocked on wait())
>>>>  
>>>> [ SUSPENDED CONNECTION OCCURS ] :
>>>> Network partition happens
>>>> Curator fires SUSPENDED
>>>> Client A, on receiving SUSPENDED event, attempts to release the lock
>>>> Client B, on receiving SUSPENDED event, does nothing as it was not holding any lock (as blocked on acquire() -> wait() )  
>>>>  
>>>> [ SESSION EXPIRED ] :
>>>> Client A attempts to release the lock (with retries)
>>>> Client B does nothing as it was not holding any lock (and is still blocked on acquire() -> wait() )
>>>> Server prepared to cleanup previous client session and deleting lock nodes created as part of previous client session
>>>>  
>>>> [ RECONNECTED ] :
>>>> Client A and Client B reconnected to another server with new session id (Note : This happens before server managed to cleanup / delete ephemeral nodes of previous client session)
>>>> Client A released the lock successfully (means it would delete its lock node N1) and attempts to acquire lock by creating lock node N3 and watches for previous sequence node (N2)
>>>> Client B who was blocked on acquire() -> wait() would be notified with previous sequence node (N1) deletion -> getChildren, sorting them, and seeing if the lock's node is the first node in the sequence. So, Client B sees its lock node N2 still (which I call it as about to be deleted node by server) and thereby acquires the lock
>>>>  
>>>> [ AFTER FEW SECONDS ] :
>>>> Server managed to delete the ephemeral node N2 as part of previous client session cleanup
>>>> Client A who was blocked on acquire() -> wait() would be notified with previous sequence node (N2) deletion -> getChildren, sorting them, and seeing if the lock's node is the first node in the sequence and thereby acquires the lock
>>>> Client B – its local lock thread data went stale (as its lock path N2 not has been deleted by Server)
>>>>  
>>>> >  SUSPENDED in Curator means only that the connect has been lost, not that the session has ended.
>>>> LOST is the state that means the session has ended
>>>> Be aware of how GCs can affect Curator. See the Tech Note here: https://cwiki.apache.org/confluence/display/CURATOR/TN10
>>>> Also read this Tech Note on session handling: https://cwiki.apache.org/confluence/display/CURATOR/TN14
>>>>  
>>>> Very good information on the tech notes. Thanks for that. Agreed, it’s always recommended to clear the locks when it sees SUSPENDED event. And we are already following your recommendation. Our application would clear the lock when it sees SUSPENDED event.
>>>>  
>>>> To summarize my thoughts,
>>>> Ephemeral node created by previous session was  still seen by client that reconnected with new session id until server cleans that up the previous session ephemeral node. This could happen if client manage to reconnect with server with new session id before server cleans up the previous session. How it affects the curator lock recipe ? Explanation : The above explained race condition would make acquire() to hold the lock ( as its own lock node still seen ), eventually leading to inconsistent state (i.e. curator local lock state stale) when that lock node is being cleaned up by the server as part of previous session cleanup activities.
>>>> I am NOT seeing a Curator bug here, but looking out for suggestions / recommendations in handling this zookeeper race condition. Either can a feature be added to Curator to cover this case / any recommendations that clients should follow.
>>>>  
>>>> Many Thanks,
>>>> Viswa.
>>>>  
>>>> From: Sean Gibbons <sean.gibbons@workday.com <ma...@workday.com>>
>>>> Date: Tuesday, 20 July 2021 at 12:19
>>>> To: Viswanathan Rajagopal <viswanathan.rajagopa@workday.com <ma...@workday.com>>
>>>> Subject: FW: [External Sender] Re: Double Locking Issue
>>>> 
>>>> 
>>>> 
>>>> On 20/07/2021, 10:12, "Jordan Zimmerman" <jordan@jordanzimmerman.com <ma...@jordanzimmerman.com>> wrote:
>>>> 
>>>>     A few more things...
>>>> 
>>>>     > Based on our initial analysis and few test runs, we saw that Curator acquire() method acquires the lock based on “about to be deleted lock node of previous session”. Explanation : Ephemeral node created by previous session was  still seen by client that reconnected with new session id until server cleans that up. If this happens, Curator acquire() would hold the lock.
>>>> 
>>>>     This isn't correct. Curator watches the previous node but lock acquisition is always based on calling ZK's getChildren, sorting them, and seeing if the lock's node is the first node in the sequence. If Ephemeral nodes aren't cleaned up it wouldn't be a problem. Any phantom ephemeral nodes would sort first and prevent Curator from believing it holds the lock.
>>>> 
>>>>     >  *   On the above mentioned race condition, if client manage to reconnect to server with new session id before server cleans up the ephemeral nodes of client’s previous session,  Curator lock acquire() who is trying to acquire the lock will hold the lock as it still sees the lock node in zookeeper directory. Eventually server would be cleaning up the ephemeral nodes leaving the Curator local lock thread data stale giving the illusion that it still hold the lock while its ephemeral node is gone
>>>> 
>>>>     I don't see how this is the case. If the client has received a network partition it shouldn't not consider any locks as being currently held (see the Tech Notes I mentioned in my previous email). If there is a partition during a call to acquire(), acquire() would throw an exception (once the retry policy has expired). BTW, what retry policy are you using?
>>>> 
>>>>     So, to reiterate, I don't see how phantom/undeleted ephemeral nodes would cause a problem. The only problem it could case is that a given Curator client takes longer to acquire a lock as it waits for those ephemerals to finally get deleted.
>>>> 
>>>>     -JZ
>>>> 
>>>>     > On Jul 19, 2021, at 6:45 PM, Viswanathan Rajagopal <viswanathan.rajagopa@workday.com.INVALID <ma...@workday.com.INVALID>> wrote:
>>>>     > 
>>>>     > Hi Team,
>>>>     > 
>>>>     > Good day.
>>>>     > 
>>>>     > Recently came across “Double Locking Issue (i.e. two clients acquiring lock)” using Curator code ( InterProcessMutex lock APIs ) in our application
>>>>     > 
>>>>     > Our use case:
>>>>     > 
>>>>     >  *   Two clients attempts to acquire the zookeeper lock using Curator InterProcessMutex and whoever owns it would release it once sees the connection disconnect ( on receiving Connection.SUSPENDED / Connection.LOST Curator Connection Events from Connection listener)
>>>>     > 
>>>>     > Issue we noticed:
>>>>     > 
>>>>     >  *   After session expired & reconnected with new session, both client seems to have acquired the lock. Interesting thing that we found is that one of the clients still holds the lock while its lock node (ephemeral) was gone
>>>>     > 
>>>>     > Things we found:
>>>>     > 
>>>>     >  *   Based on our initial analysis and few test runs, we saw that Curator acquire() method acquires the lock based on “about to be deleted lock node of previous session”. Explanation : Ephemeral node created by previous session was  still seen by client that reconnected with new session id until server cleans that up. If this happens, Curator acquire() would hold the lock.
>>>>     > 
>>>>     > 
>>>>     > 
>>>>     >  *   Clearly we could see the race condition (in zookeeper code) between 1). Client reconnecting to server with new session id and 2). server deleting the ephemeral nodes of client’s previous session. We were able to reproduce this issue using the following approach,
>>>>     >     *   Artificially break the socket connection between client and server for 30s
>>>>     >     *   Artificially pausing the set of server codes for a min and unpause
>>>>     > 
>>>>     > 
>>>>     >  *   On the above mentioned race condition, if client manage to reconnect to server with new session id before server cleans up the ephemeral nodes of client’s previous session,  Curator lock acquire() who is trying to acquire the lock will hold the lock as it still sees the lock node in zookeeper directory. Eventually server would be cleaning up the ephemeral nodes leaving the Curator local lock thread data stale giving the illusion that it still hold the lock while its ephemeral node is gone
>>>>     > 
>>>>     > 
>>>>     >  *   Timeline events described below for better understanding,
>>>>     >     *   At t1, Client A and Client B establishes zookeeper session with session id A1 and B1 respectively
>>>>     >     *   At t2, Client A creates the lock node N1 & acquires the lock
>>>>     >     *   At t3, Client B creates the lock node N2 & blocked on acquire() to acquire the lock
>>>>     >     *   At t4, session timed out for both clients & server is about to clean up the old session • Client A trying to release the lock
>>>>     >     *   At t5, Client A and Client B reconnects to server with new session id A2 and B2 respectively before server deletes the ephemeral node N1 & N2 of previous client session. Client A releases the lock, deleting N1 and trying to acquire it again by creating N3 node and Client B who is blocked on acquire() acquires the lock based on N2 (about to be deleted node created by previous session)
>>>>     >     *   At  t6, server cleans up the ephemeral node N2 created by Client B’s previous session. Client A acquires the lock with node N3 as its previous sequence N2 gets deleted, whereas Client B who incorrectly acquired the lock at t5 timeline still holds the lock
>>>>     > 
>>>>     > 
>>>>     > Note:
>>>>     > 
>>>>     >  *   We are not certain that this race condition that we noticed in zookeeper code is intentional design.
>>>>     > 
>>>>     > Questions:
>>>>     > 
>>>>     >  *   Given this race condition seen in zookeeper code, we would like to hear your recommendations / suggestions to avoid this issue while using Curator lock code?
>>>>     > 
>>>>     > 
>>>>     > 
>>>>     >  *   We also see that Interprocess Mutex has the makeRevocable() API that enables application to revoke lock, but that handles Node change event only but not for Node deleted event. I understand that this makes sense to handle Node change event alone, as to enable application user to revoke lock externally from application side. But would it be also to okay to have one for Node delete event, so as application can register the listener for Node delete event. I would like to hear your thoughts.
>>>>     > 
>>>>     > 
>>>>     > Many Thanks,
>>>>     > Viswa
>>>> 
>>> 
>> 
> 


Re: [External Sender] Double Locking Issue

Posted by Jordan Zimmerman <jo...@jordanzimmerman.com>.
I'm now wondering if this is a ZooKeeper bug? In your original timeline you say that Client B sees its original ephemeral node that should have been deleted. ZooKeeper is taking time to delete that ZNode? A simple fix would be to try to write something to that ZNode. I would hope that ZooKeeper would reject the write on the now expired ZNode. So, the lock recipe would be enhanced so that it tries to do a setData on the node before it accepts that is has the lock. Just a thought.

> On Jul 30, 2021, at 9:54 AM, Jordan Zimmerman <jo...@jordanzimmerman.com> wrote:
> 
> Even more... actually, this should already work right? It's been a long time since I looked at this code so I wrote a quick test. wait() will already exits on Connection loss. I apologize, but what I said below is not correct. The LockInternals watcher gets called on connection loss and interrupts the thread already via notifyAll(). So, again, sorry for the noise.
> 
> I'll go over the scenario again to see if I have more comments.
> 
> -Jordan
> 
>> On Jul 30, 2021, at 9:18 AM, Jordan Zimmerman <jordan@jordanzimmerman.com <ma...@jordanzimmerman.com>> wrote:
>> 
>> Actually... I'm looking at LockInternals and it already has a watcher set on the previous node before it calls wait(). That watcher will get called when there's a connection problem. It would be pretty easy to add something in that watcher to interrupt the waiting thread. This could be a pretty simple change actually.
>> 
>> Watcher set just before waiting: https://github.com/apache/curator/blob/master/curator-recipes/src/main/java/org/apache/curator/framework/recipes/locks/LockInternals.java#L301 <https://github.com/apache/curator/blob/master/curator-recipes/src/main/java/org/apache/curator/framework/recipes/locks/LockInternals.java#L301>
>> The Watcher: https://github.com/apache/curator/blob/master/curator-recipes/src/main/java/org/apache/curator/framework/recipes/locks/LockInternals.java#L64 <https://github.com/apache/curator/blob/master/curator-recipes/src/main/java/org/apache/curator/framework/recipes/locks/LockInternals.java#L64>
>> 
>> So, off the top of my head, an AtomicReference<Thread> field could be added to LockInternals. It gets set to the current thread just before waiting. If the Watcher is called with a connection event and the AtomicReference isn't empty, interrupt the thread. This will cause wait() to throw InterruptedException and then the lock node will be deleted.
>> 
>> -Jordan
>> 
>>> On Jul 30, 2021, at 9:11 AM, Jordan Zimmerman <jordan@jordanzimmerman.com <ma...@jordanzimmerman.com>> wrote:
>>> 
>>> Making lock code watch its lock node it has created
>>> 
>>> I don't think this is useful. Once the lock method returns the lock code can't notify the caller/user that the lock node has disappeared. We'd need some kind of additional class "lock watcher" or something that client code would have to periodically call to ensure that it still has the lock. tbh - this has been discussed a lot over the years on the ZooKeeper/Curator channels. Just because you have a ZNode doesn't mean it's 100% valid. You should still practice optimistic locks, etc. But, maybe we can reduce the edge cases with this lock watcher or some other mechanism.
>>> 
>>> Session ID changes after wait exits it suggests the lock node should no longer be trusted and it should be deleted and re-created
>>> 
>>> This is a simple change. But, I don't know if it would help much. It would be better to interrupt waiting lockers when the connection is lost. If an internal mechanism forced any locks blocked in wait() to throw an exception then those lock threads will already delete their ZNodes and try again. This would be the best solution maybe? So, before the lock code goes to wait() it sets a ConnectionStateListener (or something similar) that will interrupt the thread when there are connection problems.
>>> 
>>> Leader Latch code is well protected
>>> 
>>> Yes - the leader recipes are better at this. Maybe there's an opportunity to merge/change code. We could consider deprecating the lock recipes in favor of these?
>>> 
>>> -Jordan
>>> 
>>>> On Jul 29, 2021, at 2:35 AM, Viswanathan Rajagopal <viswanathan.rajagopa@workday.com <ma...@workday.com>> wrote:
>>>> 
>>>> Hi Jordan,
>>>>  
>>>> Thanks for the suggestions. Much helpful.
>>>>  
>>>> Quick summary of my below comments,
>>>> Added test case simulator to reproduce issue (though artificially)
>>>> Added a proposed code just to get your review and suggestions to see whether that would work
>>>> Please find my detailed comments inline,
>>>>  
>>>> > I think there's generally a hole in Curator's lock recipe. The lock code does not watch the lock node it has created. So, another process or (as you found) a race with the server might cause the lock node to disappear underneath the lock instance after it thinks it has the lock. One thing we can do is to check the session ID before waiting in LockInternals.internalLockLoop(). If the session ID changes after wait exits it suggests the lock node should no longer be trusted and it should be deleted and re-created. That's 1 though anyway.
>>>>  
>>>> Yes, you are right. Currently lock code doesn’t watch the lock node it has created. That’s where we would like to hear your feedback on our proposal mentioned in our first mail thread 
>>>> “Curator lock code has makeRevocable() API that enables application to revoke lock anytime from application by triggering NODE_CHANGE event through Revoker.attemptToRevoke() utility. 
>>>> Proposal: Would it be nice to extend makeRevocable() API to handle Node delete event, which would allow application to register the watcher for Node delete event, thereby application can react to Node delete event by revoking the lock ?”. Tried the Code snippet (https://github.com/ViswaNXplore/curator/blob/curator-2.5.0/patch/double_locking_test_simulator/curator-recipes/src/main/java/org/apache/curator/framework/recipes/locks/LockInternals.java#L60 <https://github.com/ViswaNXplore/curator/blob/curator-2.5.0/patch/double_locking_test_simulator/curator-recipes/src/main/java/org/apache/curator/framework/recipes/locks/LockInternals.java#L60>
>>>> ) to see the approach “Making lock code watch its lock node it has created” works and would like to get your thoughts on this and do you see any other impacts there? May need your advice.
>>>>  
>>>> I agree with you on that the session id approach, that would protect us from seeing this issue. The reason why I thought your first approach “making lock code watch it has created” could be more protective is this could protect us from any inconsistencies between original lock node and local lock state.
>>>>   
>>>> >  It would be nice to generate a test/simulation for this case so that it can be properly dealt with.
>>>>  
>>>> Created a test case simulator https://github.com/ViswaNXplore/curator/blob/curator-2.5.0/patch/double_locking_test_simulator/curator-recipes/src/test/java/org/apache/curator/framework/recipes/CuratorRecipeTest.java <https://github.com/ViswaNXplore/curator/blob/curator-2.5.0/patch/double_locking_test_simulator/curator-recipes/src/test/java/org/apache/curator/framework/recipes/CuratorRecipeTest.java>
>>>>  
>>>> Approach #1 (Artificial way of reproducing the issue)
>>>> Run the test case with (SIMULATE_ISSUE_BY_EXPLICIT_DELETE set to true). This would delete the lock node after a pause just to simulate the state where the zkClient had reconnected and it still happens to see the ephermal node just before the server deletes it since its session has expired, but the node is deleted afterwards by the server.
>>>> Approach #2 (A little manual interruption needed to reproduce the issue)
>>>> Run the test case in Debug mode (SIMULATE_ISSUE_BY_EXPLICIT_DELETE set to false)
>>>> Artificially delaying / pausing the ephemeral lock nodes deletion as part of session cleanup process in server code (ZookeeperServer.close() method)
>>>> After a pause (say 5s) to make one of the instance to acquire the lock, Artificially break the socket connection between client and server for 30s (by keeping breakpoint in ClientCnxnSocketNIO.doIO() method). After 30s, we would see session closing logs logged in server code
>>>> After 1min, remove breakpoint in ClientCnxnSocketNIO.doIO() and resume both Thread 2 and Thread 3
>>>> After that, resume server thread (thread name would be “SessionTracker”
>>>> Below Proposals discussed so far (3rd added now for your review)
>>>> Making lock code watch its lock node it has created
>>>> Session ID changes after wait exits it suggests the lock node should no longer be trusted and it should be deleted and re-created
>>>> Leader Latch code is well protected to cover this zookeeper race condition, because Leader Latch code internally handle the connection events (which they use to interrupt latch_acquire_state to reset its latch every time connection is reconnected), means it will explicitly release the latch and recreate new if there is a connection disconnect (may be this can be the approach that lock recipe could use to protect ?) 
>>>> Many Thanks,
>>>> Viswa.
>>>>  
>>>> From: Jordan Zimmerman <jordan@jordanzimmerman.com <ma...@jordanzimmerman.com>>
>>>> Date: Tuesday, 20 July 2021 at 21:11
>>>> To: Viswanathan Rajagopal <viswanathan.rajagopa@workday.com <ma...@workday.com>>
>>>> Cc: Sean Gibbons <sean.gibbons@workday.com <ma...@workday.com>>, dev@curator.apache.org <ma...@curator.apache.org> <dev@curator.apache.org <ma...@curator.apache.org>>, cammckenzie@apache.org <ma...@apache.org> <cammckenzie@apache.org <ma...@apache.org>>, user@curator.apache.org <ma...@curator.apache.org> <user@curator.apache.org <ma...@curator.apache.org>>, Donal Arundel <donal.arundel@workday.com <ma...@workday.com>>, Francisco Goncalves <francisco.goncalves@workday.com <ma...@workday.com>>, Zak Szadowski <zak.szadowski@workday.com <ma...@workday.com>>, Dandie Beloy <dandie.beloy@workday.com <ma...@workday.com>>, Marcelo Cenerino <marcelo.cenerino@workday.com <ma...@workday.com>>
>>>> Subject: Re: [External Sender] Double Locking Issue
>>>> 
>>>> In our case, it wouldn’t throw any exception because it had gone past “creating lock nodes” and was blocked on wait(), which would only then be interrupted when curator watcher notified on previous sequence node delete event.
>>>>  
>>>> So, you're using the version of acquire() without a timeout? In any event, this is a problem. When you receive SUSPENDED you really should interrupt any threads that are waiting on Curator. The Curator docs imply this even though it might not be obvious. This is likely the source of your problems. A simple solution is to use the version of acquire that has a timeout and repeatedly call it until success (though your problem may still occur in this case). Maybe we could improve the lock recipe (or something similar) so that locks inside of acquire are interrupted on a network partition. 
>>>>  
>>>> [snip of your remaining thoughtful analysis]
>>>>  
>>>> I think there's generally a hole in Curator's lock recipe. The lock code does not watch the lock node it has created. So, another process or (as you found) a race with the server might cause the lock node to disappear underneath the lock instance after it thinks it has the lock. 
>>>>  
>>>> One thing we can do is to check the session ID before waiting in LockInternals.internalLockLoop(). If the session ID changes after wait exits it suggests the lock node should no longer be trusted and it should be deleted and re-created. That's 1 though anyway. It would be nice to generate a test/simulation for this case so that it can be properly dealt with.
>>>>  
>>>> -Jordan
>>>> 
>>>> 
>>>> On Jul 20, 2021, at 3:53 PM, Viswanathan Rajagopal <viswanathan.rajagopa@workday.com <ma...@workday.com>> wrote:
>>>>  
>>>> Thanks Jordan for coming back on this.
>>>>  
>>>> Please find my inline comments. Also provided below few additional version info that we are using,
>>>> Curator version – 2.5.0
>>>> Zookeeper version – 3.5.3
>>>>  
>>>> >  I don't see how this is the case. If the client has received a network partition it shouldn't not consider any locks as being currently held (see the Tech Notes I mentioned in my previous email). If there is a partition during a call to acquire(), acquire() would throw an exception (once the retry policy has expired). BTW, what retry policy are you using?
>>>>  
>>>> In our case, it wouldn’t throw any exception because it had gone past “creating lock nodes” and was blocked on wait(), which would only then be interrupted when curator watcher notified on previous sequence node delete event.
>>>>  
>>>> > This isn't correct. Curator watches the previous node but lock acquisition is always based on calling ZK's getChildren, sorting them, and seeing if the lock's node is the first node in the sequence. If Ephemeral nodes aren't cleaned up it wouldn't be a problem. Any phantom ephemeral nodes would sort first and prevent Curator from believing it holds the lock.
>>>>  
>>>> Totally agreed on your point on lock acquisition. The problem that I was trying to explain is that the network partition occurs while the client is waiting for the lock (in the acquire()) but after the acquire code block has already written out the lock node successfully, so the client code will be blocking in the acquire and not get a connection exception. Once the previous lock node is deleted but not the lock node for this client then this client will leave the acquire thinking it has the lock but actually at this point the ZK server has expired the session and is in the process of deleting our clients lock node.
>>>>  
>>>> Trying to categorize the timeline events below to see if that would help us for better understanding,
>>>> [ CONNECTED ] :
>>>> Client A & Client B calls acquire creating Node N1 & N2 respectively
>>>> Client A acquire() -> holds the lock as its Node N1 is first node in sequence
>>>> Client B acquire() -> created Node N2, watches for previous sequence node (blocked on wait())
>>>>  
>>>> [ SUSPENDED CONNECTION OCCURS ] :
>>>> Network partition happens
>>>> Curator fires SUSPENDED
>>>> Client A, on receiving SUSPENDED event, attempts to release the lock
>>>> Client B, on receiving SUSPENDED event, does nothing as it was not holding any lock (as blocked on acquire() -> wait() )  
>>>>  
>>>> [ SESSION EXPIRED ] :
>>>> Client A attempts to release the lock (with retries)
>>>> Client B does nothing as it was not holding any lock (and is still blocked on acquire() -> wait() )
>>>> Server prepared to cleanup previous client session and deleting lock nodes created as part of previous client session
>>>>  
>>>> [ RECONNECTED ] :
>>>> Client A and Client B reconnected to another server with new session id (Note : This happens before server managed to cleanup / delete ephemeral nodes of previous client session)
>>>> Client A released the lock successfully (means it would delete its lock node N1) and attempts to acquire lock by creating lock node N3 and watches for previous sequence node (N2)
>>>> Client B who was blocked on acquire() -> wait() would be notified with previous sequence node (N1) deletion -> getChildren, sorting them, and seeing if the lock's node is the first node in the sequence. So, Client B sees its lock node N2 still (which I call it as about to be deleted node by server) and thereby acquires the lock
>>>>  
>>>> [ AFTER FEW SECONDS ] :
>>>> Server managed to delete the ephemeral node N2 as part of previous client session cleanup
>>>> Client A who was blocked on acquire() -> wait() would be notified with previous sequence node (N2) deletion -> getChildren, sorting them, and seeing if the lock's node is the first node in the sequence and thereby acquires the lock
>>>> Client B – its local lock thread data went stale (as its lock path N2 not has been deleted by Server)
>>>>  
>>>> >  SUSPENDED in Curator means only that the connection has been lost, not that the session has ended.
>>>> LOST is the state that means the session has ended.
>>>> Be aware of how GCs can affect Curator. See the Tech Note here: https://cwiki.apache.org/confluence/display/CURATOR/TN10
>>>> Also read this Tech Note on session handling: https://cwiki.apache.org/confluence/display/CURATOR/TN14
>>>>  
>>>> Very good information on the tech notes. Thanks for that. Agreed, it’s always recommended to clear the locks when it sees SUSPENDED event. And we are already following your recommendation. Our application would clear the lock when it sees SUSPENDED event.
>>>>  
>>>> To summarize my thoughts,
>>>> The ephemeral node created by the previous session is still seen by a client that reconnects with a new session id, until the server cleans up that previous-session ephemeral node. This can happen if the client manages to reconnect to the server with a new session id before the server cleans up the previous session. How does it affect the Curator lock recipe? Explanation: the above race condition makes acquire() hold the lock (as its own lock node is still seen), eventually leading to an inconsistent state (i.e. stale Curator local lock state) when that lock node is cleaned up by the server as part of the previous session's cleanup.
>>>> I am NOT seeing a Curator bug here, but am looking for suggestions / recommendations on handling this zookeeper race condition - either a feature added to Curator to cover this case, or recommendations that clients should follow.
>>>>  
>>>> Many Thanks,
>>>> Viswa.
>>>>  
>>>> From: Sean Gibbons <sean.gibbons@workday.com <ma...@workday.com>>
>>>> Date: Tuesday, 20 July 2021 at 12:19
>>>> To: Viswanathan Rajagopal <viswanathan.rajagopa@workday.com <ma...@workday.com>>
>>>> Subject: FW: [External Sender] Re: Double Locking Issue
>>>> 
>>>> 
>>>> 
>>>> On 20/07/2021, 10:12, "Jordan Zimmerman" <jordan@jordanzimmerman.com <ma...@jordanzimmerman.com>> wrote:
>>>> 
>>>>     A few more things...
>>>> 
>>>>     > Based on our initial analysis and few test runs, we saw that Curator acquire() method acquires the lock based on “about to be deleted lock node of previous session”. Explanation : Ephemeral node created by previous session was  still seen by client that reconnected with new session id until server cleans that up. If this happens, Curator acquire() would hold the lock.
>>>> 
>>>>     This isn't correct. Curator watches the previous node but lock acquisition is always based on calling ZK's getChildren, sorting them, and seeing if the lock's node is the first node in the sequence. If Ephemeral nodes aren't cleaned up it wouldn't be a problem. Any phantom ephemeral nodes would sort first and prevent Curator from believing it holds the lock.
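>>>>     In code terms, that check is roughly the following (a simplified sketch, not the actual LockInternals code; basePath and ourNodeName are placeholders, and the real recipe sorts by the embedded sequence number rather than plain string order):
>>>> 
>>>>     import java.util.Collections;
>>>>     import java.util.List;
>>>>     import org.apache.zookeeper.ZooKeeper;
>>>> 
>>>>     class LockOrderingSketch
>>>>     {
>>>>         // basePath = the lock's parent path, ourNodeName = the sequential child this client created
>>>>         static boolean isFirstInSequence(ZooKeeper zk, String basePath, String ourNodeName) throws Exception
>>>>         {
>>>>             List<String> children = zk.getChildren(basePath, false);
>>>>             Collections.sort(children);                    // simplified; the recipe sorts by the sequence suffix
>>>>             return children.get(0).equals(ourNodeName);    // first in sequence => this client owns the lock
>>>>         }
>>>>     }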
>>>> 
>>>>     >  *   On the above mentioned race condition, if client manage to reconnect to server with new session id before server cleans up the ephemeral nodes of client’s previous session,  Curator lock acquire() who is trying to acquire the lock will hold the lock as it still sees the lock node in zookeeper directory. Eventually server would be cleaning up the ephemeral nodes leaving the Curator local lock thread data stale giving the illusion that it still hold the lock while its ephemeral node is gone
>>>> 
>>>>     I don't see how this is the case. If the client has received a network partition it shouldn't not consider any locks as being currently held (see the Tech Notes I mentioned in my previous email). If there is a partition during a call to acquire(), acquire() would throw an exception (once the retry policy has expired). BTW, what retry policy are you using?
>>>> 
>>>>     So, to reiterate, I don't see how phantom/undeleted ephemeral nodes would cause a problem. The only problem it could cause is that a given Curator client takes longer to acquire a lock as it waits for those ephemerals to finally get deleted.
>>>> 
>>>>     -JZ
>>>> 
>>>>     > On Jul 19, 2021, at 6:45 PM, Viswanathan Rajagopal <viswanathan.rajagopa@workday.com.INVALID <ma...@workday.com.INVALID>> wrote:
>>>>     > 
>>>>     > Hi Team,
>>>>     > 
>>>>     > Good day.
>>>>     > 
>>>>     > Recently came across “Double Locking Issue (i.e. two clients acquiring lock)” using Curator code ( InterProcessMutex lock APIs ) in our application
>>>>     > 
>>>>     > Our use case:
>>>>     > 
>>>>     >  *   Two clients attempts to acquire the zookeeper lock using Curator InterProcessMutex and whoever owns it would release it once sees the connection disconnect ( on receiving Connection.SUSPENDED / Connection.LOST Curator Connection Events from Connection listener)
>>>>     > 
>>>>     > Issue we noticed:
>>>>     > 
>>>>     >  *   After session expired & reconnected with new session, both client seems to have acquired the lock. Interesting thing that we found is that one of the clients still holds the lock while its lock node (ephemeral) was gone
>>>>     > 
>>>>     > Things we found:
>>>>     > 
>>>>     >  *   Based on our initial analysis and few test runs, we saw that Curator acquire() method acquires the lock based on “about to be deleted lock node of previous session”. Explanation : Ephemeral node created by previous session was  still seen by client that reconnected with new session id until server cleans that up. If this happens, Curator acquire() would hold the lock.
>>>>     > 
>>>>     > 
>>>>     > 
>>>>     >  *   Clearly we could see the race condition (in zookeeper code) between 1). Client reconnecting to server with new session id and 2). server deleting the ephemeral nodes of client’s previous session. We were able to reproduce this issue using the following approach,
>>>>     >     *   Artificially break the socket connection between client and server for 30s
>>>>     >     *   Artificially pausing the set of server codes for a min and unpause
>>>>     > 
>>>>     > 
>>>>     >  *   On the above mentioned race condition, if client manage to reconnect to server with new session id before server cleans up the ephemeral nodes of client’s previous session,  Curator lock acquire() who is trying to acquire the lock will hold the lock as it still sees the lock node in zookeeper directory. Eventually server would be cleaning up the ephemeral nodes leaving the Curator local lock thread data stale giving the illusion that it still hold the lock while its ephemeral node is gone
>>>>     > 
>>>>     > 
>>>>     >  *   Timeline events described below for better understanding,
>>>>     >     *   At t1, Client A and Client B establishes zookeeper session with session id A1 and B1 respectively
>>>>     >     *   At t2, Client A creates the lock node N1 & acquires the lock
>>>>     >     *   At t3, Client B creates the lock node N2 & blocked on acquire() to acquire the lock
>>>>     >     *   At t4, session timed out for both clients & server is about to clean up the old session • Client A trying to release the lock
>>>>     >     *   At t5, Client A and Client B reconnects to server with new session id A2 and B2 respectively before server deletes the ephemeral node N1 & N2 of previous client session. Client A releases the lock, deleting N1 and trying to acquire it again by creating N3 node and Client B who is blocked on acquire() acquires the lock based on N2 (about to be deleted node created by previous session)
>>>>     >     *   At  t6, server cleans up the ephemeral node N2 created by Client B’s previous session. Client A acquires the lock with node N3 as its previous sequence N2 gets deleted, whereas Client B who incorrectly acquired the lock at t5 timeline still holds the lock
>>>>     > 
>>>>     > 
>>>>     > Note:
>>>>     > 
>>>>     >  *   We are not certain that this race condition that we noticed in zookeeper code is intentional design.
>>>>     > 
>>>>     > Questions:
>>>>     > 
>>>>     >  *   Given this race condition seen in zookeeper code, we would like to hear your recommendations / suggestions to avoid this issue while using Curator lock code?
>>>>     > 
>>>>     > 
>>>>     > 
>>>>     >  *   We also see that Interprocess Mutex has the makeRevocable() API that enables application to revoke lock, but that handles Node change event only but not for Node deleted event. I understand that this makes sense to handle Node change event alone, as to enable application user to revoke lock externally from application side. But would it be also to okay to have one for Node delete event, so as application can register the listener for Node delete event. I would like to hear your thoughts.
>>>>     > 
>>>>     > 
>>>>     > Many Thanks,
>>>>     > Viswa
>>>> 
>>> 
>> 
> 


Re: [External Sender] Double Locking Issue

Posted by Jordan Zimmerman <jo...@jordanzimmerman.com>.
Even more... actually, this should already work, right? It's been a long time since I looked at this code, so I wrote a quick test. wait() already exits on connection loss. I apologize, but what I said below is not correct. The LockInternals watcher gets called on connection loss and already wakes the waiting thread via notifyAll(). So, again, sorry for the noise.

I'll go over the scenario again to see if I have more comments.

-Jordan

> On Jul 30, 2021, at 9:18 AM, Jordan Zimmerman <jo...@jordanzimmerman.com> wrote:
> 
> Actually... I'm looking at LockInternals and it already has a watcher set on the previous node before it calls wait(). That watcher will get called when there's a connection problem. It would be pretty easy to add something in that watcher to interrupt the waiting thread. This could be a pretty simple change actually.
> 
> Watcher set just before waiting: https://github.com/apache/curator/blob/master/curator-recipes/src/main/java/org/apache/curator/framework/recipes/locks/LockInternals.java#L301
> The Watcher: https://github.com/apache/curator/blob/master/curator-recipes/src/main/java/org/apache/curator/framework/recipes/locks/LockInternals.java#L64
> 
> So, off the top of my head, an AtomicReference<Thread> field could be added to LockInternals. It gets set to the current thread just before waiting. If the Watcher is called with a connection event and the AtomicReference isn't empty, interrupt the thread. This will cause wait() to throw InterruptedException and then the lock node will be deleted.
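> A rough, untested sketch of that idea (the class and member names here are placeholders, not the actual LockInternals change):
> 
>     import java.util.concurrent.atomic.AtomicReference;
>     import org.apache.zookeeper.WatchedEvent;
>     import org.apache.zookeeper.Watcher;
> 
>     class InterruptingWaitSketch
>     {
>         private final AtomicReference<Thread> waitingThread = new AtomicReference<Thread>();
> 
>         // stand-in for the watcher LockInternals already sets on the previous sequence node
>         final Watcher watcher = new Watcher()
>         {
>             @Override
>             public void process(WatchedEvent event)
>             {
>                 if ( event.getType() == Event.EventType.None )    // EventType.None = connection state event
>                 {
>                     Thread thread = waitingThread.get();
>                     if ( thread != null )
>                     {
>                         thread.interrupt();    // makes wait() below throw InterruptedException
>                     }
>                 }
>                 synchronized(InterruptingWaitSketch.this)
>                 {
>                     InterruptingWaitSketch.this.notifyAll();    // existing behavior: wake the waiting lock thread
>                 }
>             }
>         };
> 
>         synchronized void waitForPreviousNode() throws InterruptedException
>         {
>             waitingThread.set(Thread.currentThread());
>             try
>             {
>                 wait();    // an interrupt here lets the caller delete its lock node and retry
>             }
>             finally
>             {
>                 waitingThread.set(null);
>             }
>         }
>     }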
> 
> -Jordan
> 
>> On Jul 30, 2021, at 9:11 AM, Jordan Zimmerman <jordan@jordanzimmerman.com <ma...@jordanzimmerman.com>> wrote:
>> 
>> Making lock code watch its lock node it has created
>> 
>> I don't think this is useful. Once the lock method returns the lock code can't notify the caller/user that the lock node has disappeared. We'd need some kind of additional class "lock watcher" or something that client code would have to periodically call to ensure that it still has the lock. tbh - this has been discussed a lot over the years on the ZooKeeper/Curator channels. Just because you have a ZNode doesn't mean it's 100% valid. You should still practice optimistic locks, etc. But, maybe we can reduce the edge cases with this lock watcher or some other mechanism.
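>> One crude thing an application can already do along the "optimistic" line - purely a sketch, not an existing recipe API - is to remember the session id at acquire time and re-check it before relying on the lock:
>> 
>>     import org.apache.curator.framework.CuratorFramework;
>> 
>>     class LockSessionCheck
>>     {
>>         private final CuratorFramework client;
>>         private long sessionIdAtAcquire;
>> 
>>         LockSessionCheck(CuratorFramework client)
>>         {
>>             this.client = client;
>>         }
>> 
>>         // call right after acquire() succeeds
>>         void recordSession() throws Exception
>>         {
>>             sessionIdAtAcquire = client.getZookeeperClient().getZooKeeper().getSessionId();
>>         }
>> 
>>         // call before doing work that depends on still holding the lock
>>         boolean sessionUnchanged() throws Exception
>>         {
>>             return client.getZookeeperClient().getZooKeeper().getSessionId() == sessionIdAtAcquire;
>>         }
>>     }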
>> 
>> Session ID changes after wait exits it suggests the lock node should no longer be trusted and it should be deleted and re-created
>> 
>> This is a simple change. But, I don't know if it would help much. It would be better to interrupt waiting lockers when the connection is lost. If an internal mechanism forced any locks blocked in wait() to throw an exception then those lock threads will already delete their ZNodes and try again. This would be the best solution maybe? So, before the lock code goes to wait() it sets a ConnectionStateListener (or something similar) that will interrupt the thread when there are connection problems.
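>> Until something like that exists inside the recipe, calling code can approximate it today. A minimal sketch, assuming the application already has a started CuratorFramework ("client") and an InterProcessMutex ("mutex"):
>> 
>>     import org.apache.curator.framework.CuratorFramework;
>>     import org.apache.curator.framework.recipes.locks.InterProcessMutex;
>>     import org.apache.curator.framework.state.ConnectionState;
>>     import org.apache.curator.framework.state.ConnectionStateListener;
>> 
>>     class InterruptOnConnectionTrouble
>>     {
>>         static void acquireInterruptibly(CuratorFramework client, InterProcessMutex mutex) throws Exception
>>         {
>>             final Thread lockThread = Thread.currentThread();
>>             ConnectionStateListener listener = new ConnectionStateListener()
>>             {
>>                 @Override
>>                 public void stateChanged(CuratorFramework c, ConnectionState newState)
>>                 {
>>                     if ( (newState == ConnectionState.SUSPENDED) || (newState == ConnectionState.LOST) )
>>                     {
>>                         lockThread.interrupt();    // wakes a thread blocked inside acquire()
>>                     }
>>                 }
>>             };
>> 
>>             client.getConnectionStateListenable().addListener(listener);
>>             try
>>             {
>>                 mutex.acquire();    // an interrupt surfaces here and the recipe cleans up its lock node
>>             }
>>             finally
>>             {
>>                 client.getConnectionStateListenable().removeListener(listener);
>>             }
>>         }
>>     }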
>> 
>> Leader Latch code is well protected
>> 
>> Yes - the leader recipes are better at this. Maybe there's an opportunity to merge/change code. We could consider deprecating the lock recipes in favor of these?
>> 
>> -Jordan
>> 
>>> On Jul 29, 2021, at 2:35 AM, Viswanathan Rajagopal <viswanathan.rajagopa@workday.com <ma...@workday.com>> wrote:
>>> 
>>> Hi Jordan,
>>>  
>>> Thanks for the suggestions. Much helpful.
>>>  
>>> Quick summary of my comments below:
>>> Added a test case simulator to reproduce the issue (though artificially)
>>> Added proposed code just to get your review and suggestions on whether that would work
>>> Please find my detailed comments inline.
>>>  
>>> > I think there's generally a hole in Curator's lock recipe. The lock code does not watch the lock node it has created. So, another process or (as you found) a race with the server might cause the lock node to disappear underneath the lock instance after it thinks it has the lock. One thing we can do is to check the session ID before waiting in LockInternals.internalLockLoop(). If the session ID changes after wait exits it suggests the lock node should no longer be trusted and it should be deleted and re-created. That's 1 though anyway.
>>>  
>>> Yes, you are right. Currently the lock code doesn't watch the lock node it has created. That's where we would like to hear your feedback on our proposal mentioned in our first mail thread:
>>> “Curator lock code has makeRevocable() API that enables application to revoke lock anytime from application by triggering NODE_CHANGE event through Revoker.attemptToRevoke() utility. 
>>> Proposal: Would it be nice to extend the makeRevocable() API to handle the Node delete event, which would allow the application to register a watcher for the Node delete event and react to it by revoking the lock?”. Tried the code snippet (https://github.com/ViswaNXplore/curator/blob/curator-2.5.0/patch/double_locking_test_simulator/curator-recipes/src/main/java/org/apache/curator/framework/recipes/locks/LockInternals.java#L60)
>>> to see whether the approach “Making lock code watch its lock node it has created” works, and would like to get your thoughts on it. Do you see any other impacts there? We may need your advice.
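>>> For discussion, the general shape of that idea (independent of the patch linked above; lockNodePath and onNodeDeleted are placeholders - in a real change the recipe would supply its own lock node path) is roughly:
>>> 
>>>     import org.apache.curator.framework.CuratorFramework;
>>>     import org.apache.zookeeper.WatchedEvent;
>>>     import org.apache.zookeeper.Watcher;
>>> 
>>>     class OwnLockNodeWatchSketch
>>>     {
>>>         // lockNodePath would be the sequential node the recipe created; onNodeDeleted is the revoke callback
>>>         static void watchOwnLockNode(CuratorFramework client, final String lockNodePath, final Runnable onNodeDeleted) throws Exception
>>>         {
>>>             Watcher watcher = new Watcher()
>>>             {
>>>                 @Override
>>>                 public void process(WatchedEvent event)
>>>                 {
>>>                     if ( event.getType() == Event.EventType.NodeDeleted )
>>>                     {
>>>                         onNodeDeleted.run();    // e.g. clear the local lock state / revoke the lock
>>>                     }
>>>                 }
>>>             };
>>>             client.checkExists().usingWatcher(watcher).forPath(lockNodePath);    // one-shot watch; re-set after it fires if still needed
>>>         }
>>>     }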
>>>  
>>> I agree with you that the session id approach would protect us from this issue. The reason I thought your first approach, “making lock code watch the node it has created”, could be more protective is that it would also protect us from any inconsistencies between the original lock node and the local lock state.
>>>   
>>> >  It would be nice to generate a test/simulation for this case so that it can be properly dealt with.
>>>  
>>> Created a test case simulator: https://github.com/ViswaNXplore/curator/blob/curator-2.5.0/patch/double_locking_test_simulator/curator-recipes/src/test/java/org/apache/curator/framework/recipes/CuratorRecipeTest.java
>>>  
>>> Approach #1 (artificial way of reproducing the issue)
>>> Run the test case with SIMULATE_ISSUE_BY_EXPLICIT_DELETE set to true. This deletes the lock node after a pause, just to simulate the state where the zkClient has reconnected and still happens to see the ephemeral node just before the server deletes it (since its session has expired), but the node is deleted afterwards by the server.
>>> Approach #2 (a little manual intervention needed to reproduce the issue)
>>> Run the test case in debug mode (SIMULATE_ISSUE_BY_EXPLICIT_DELETE set to false)
>>> Artificially delay / pause the ephemeral lock node deletion that happens as part of the session cleanup process in the server code (ZookeeperServer.close() method)
>>> After a pause (say 5s) to let one of the instances acquire the lock, artificially break the socket connection between client and server for 30s (by keeping a breakpoint in the ClientCnxnSocketNIO.doIO() method). After 30s, we would see session-closing logs in the server code
>>> After 1 min, remove the breakpoint in ClientCnxnSocketNIO.doIO() and resume both Thread 2 and Thread 3
>>> After that, resume the server thread (thread name would be “SessionTracker”)
>>> Proposals discussed so far (the 3rd added now for your review):
>>> Making the lock code watch the lock node it has created
>>> If the session ID has changed after wait exits, the lock node should no longer be trusted and should be deleted and re-created
>>> Leader Latch code is well protected against this zookeeper race condition, because it internally handles connection events (which it uses to interrupt the latch acquire state and reset its latch every time the connection is re-established), meaning it will explicitly release the latch and recreate a new one if there is a connection disconnect (maybe this is the approach the lock recipe could use to protect itself? see the sketch below)
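>>> As a rough illustration of the 3rd point only (how an application sees LeaderLatch behave; the path and the listener bodies here are placeholders):
>>> 
>>>     import org.apache.curator.framework.CuratorFramework;
>>>     import org.apache.curator.framework.recipes.leader.LeaderLatch;
>>>     import org.apache.curator.framework.recipes.leader.LeaderLatchListener;
>>> 
>>>     class LatchSketch
>>>     {
>>>         static LeaderLatch startLatch(CuratorFramework client) throws Exception
>>>         {
>>>             LeaderLatch latch = new LeaderLatch(client, "/example/latch");    // path is arbitrary
>>>             latch.addListener(new LeaderLatchListener()
>>>             {
>>>                 @Override
>>>                 public void isLeader()
>>>                 {
>>>                     // analogous to holding the lock
>>>                 }
>>> 
>>>                 @Override
>>>                 public void notLeader()
>>>                 {
>>>                     // called when leadership is lost, e.g. after connection problems; the latch resets itself internally
>>>                 }
>>>             });
>>>             latch.start();
>>>             return latch;
>>>         }
>>>     }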
>>> Many Thanks,
>>> Viswa.
>>>  
>>> From: Jordan Zimmerman <jordan@jordanzimmerman.com <ma...@jordanzimmerman.com>>
>>> Date: Tuesday, 20 July 2021 at 21:11
>>> To: Viswanathan Rajagopal <viswanathan.rajagopa@workday.com <ma...@workday.com>>
>>> Cc: Sean Gibbons <sean.gibbons@workday.com <ma...@workday.com>>, dev@curator.apache.org <ma...@curator.apache.org> <dev@curator.apache.org <ma...@curator.apache.org>>, cammckenzie@apache.org <ma...@apache.org> <cammckenzie@apache.org <ma...@apache.org>>, user@curator.apache.org <ma...@curator.apache.org> <user@curator.apache.org <ma...@curator.apache.org>>, Donal Arundel <donal.arundel@workday.com <ma...@workday.com>>, Francisco Goncalves <francisco.goncalves@workday.com <ma...@workday.com>>, Zak Szadowski <zak.szadowski@workday.com <ma...@workday.com>>, Dandie Beloy <dandie.beloy@workday.com <ma...@workday.com>>, Marcelo Cenerino <marcelo.cenerino@workday.com <ma...@workday.com>>
>>> Subject: Re: [External Sender] Double Locking Issue
>>> 
>>> In our case, it wouldn’t throw any exception because it had gone past “creating lock nodes” and was blocked on wait(), which would only then be interrupted when curator watcher notified on previous sequence node delete event.
>>>  
>>> So, you're using the version of acquire() without a timeout? In any event, this is a problem. When you receive SUSPENDED you really should interrupt any threads that are waiting on Curator. The Curator docs imply this even though it might not be obvious. This is likely the source of your problems. A simple solution is to use the version of acquire that has a timeout and repeatedly call it until success (though your problem may still occur in this case). Maybe we could improve the lock recipe (or something similar) so that locks inside of acquire are interrupted on a network partition. 
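>>> For example, the timed acquire could be driven in a loop along these lines (only a sketch; the 5 second timeout is arbitrary and the mutex is assumed to be created elsewhere):
>>> 
>>>     import java.util.concurrent.TimeUnit;
>>>     import org.apache.curator.framework.recipes.locks.InterProcessMutex;
>>> 
>>>     class TimedAcquireLoop
>>>     {
>>>         static void acquire(InterProcessMutex mutex) throws Exception
>>>         {
>>>             // each pass re-evaluates the lock state; connection problems surface as exceptions here
>>>             while ( !mutex.acquire(5, TimeUnit.SECONDS) )
>>>             {
>>>                 // not acquired yet - loop and try again (the caller could also give up after N attempts)
>>>             }
>>>         }
>>>     }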
>>>  
>>> [snip of your remaining thoughtful analysis]
>>>  
>>> I think there's generally a hole in Curator's lock recipe. The lock code does not watch the lock node it has created. So, another process or (as you found) a race with the server might cause the lock node to disappear underneath the lock instance after it thinks it has the lock. 
>>>  
>>> One thing we can do is to check the session ID before waiting in LockInternals.internalLockLoop(). If the session ID changes after wait exits it suggests the lock node should no longer be trusted and it should be deleted and re-created. That's 1 though anyway. It would be nice to generate a test/simulation for this case so that it can be properly dealt with.
>>>  
>>> -Jordan
>>> 
>>> 
>>> On Jul 20, 2021, at 3:53 PM, Viswanathan Rajagopal <viswanathan.rajagopa@workday.com <ma...@workday.com>> wrote:
>>>  
>>> Thanks Jordan for coming back on this.
>>>  
>>> Please find my inline comments. Also provided below few additional version info that we are using,
>>> Curator version – 2.5.0
>>> Zookeeper version – 3.5.3
>>>  
>>> >  I don't see how this is the case. If the client has received a network partition it shouldn't not consider any locks as being currently held (see the Tech Notes I mentioned in my previous email). If there is a partition during a call to acquire(), acquire() would throw an exception (once the retry policy has expired). BTW, what retry policy are you using?
>>>  
>>> In our case, it wouldn’t throw any exception because it had gone past “creating lock nodes” and was blocked on wait(), which would only then be interrupted when curator watcher notified on previous sequence node delete event.
>>>  
>>> > This isn't correct. Curator watches the previous node but lock acquisition is always based on calling ZK's getChildren, sorting them, and seeing if the lock's node is the first node in the sequence. If Ephemeral nodes aren't cleaned up it wouldn't be a problem. Any phantom ephemeral nodes would sort first and prevent Curator from believing it holds the lock.
>>>  
>>> Totally agreed on your point on lock acquisition. The problem that I was trying to explain is that the network partition occurs while the client is waiting for the lock (in the acquire()) but after the acquire code block has already written out the lock node successfully, so the client code will be blocking in the acquire and not get a connection exception. Once the previous lock node is deleted but not the lock node for this client then this client will leave the acquire thinking it has the lock but actually at this point the ZK server has expired the session and is in the process of deleting our clients lock node.
>>>  
>>> Trying to categorize the timeline events below to see if that would help us for better understanding,
>>> [ CONNECTED ] :
>>> Client A & Client B calls acquire creating Node N1 & N2 respectively
>>> Client A acquire() -> holds the lock as its Node N1 is first node in sequence
>>> Client B acquire() -> created Node N2, watches for previous sequence node (blocked on wait())
>>>  
>>> [ SUSPENDED CONNECTION OCCURS ] :
>>> Network partition happens
>>> Curator fires SUSPENDED
>>> Client A, on receiving SUSPENDED event, attempts to release the lock
>>> Client B, on receiving SUSPENDED event, does nothing as it was not holding any lock (as blocked on acquire() -> wait() )  
>>>  
>>> [ SESSION EXPIRED ] :
>>> Client A attempts to release the lock (with retries)
>>> Client B does nothing as it was not holding any lock (and is still blocked on acquire() -> wait() )
>>> Server prepared to cleanup previous client session and deleting lock nodes created as part of previous client session
>>>  
>>> [ RECONNECTED ] :
>>> Client A and Client B reconnected to another server with new session id (Note : This happens before server managed to cleanup / delete ephemeral nodes of previous client session)
>>> Client A released the lock successfully (means it would delete its lock node N1) and attempts to acquire lock by creating lock node N3 and watches for previous sequence node (N2)
>>> Client B who was blocked on acquire() -> wait() would be notified with previous sequence node (N1) deletion -> getChildren, sorting them, and seeing if the lock's node is the first node in the sequence. So, Client B sees its lock node N2 still (which I call it as about to be deleted node by server) and thereby acquires the lock
>>>  
>>> [ AFTER FEW SECONDS ] :
>>> Server managed to delete the ephemeral node N2 as part of previous client session cleanup
>>> Client A who was blocked on acquire() -> wait() would be notified with previous sequence node (N2) deletion -> getChildren, sorting them, and seeing if the lock's node is the first node in the sequence and thereby acquires the lock
>>> Client B – its local lock thread data has gone stale (as its lock path N2 has now been deleted by the server)
>>>  
>>> >  SUSPENDED in Curator means only that the connection has been lost, not that the session has ended.
>>> LOST is the state that means the session has ended.
>>> Be aware of how GCs can affect Curator. See the Tech Note here: https://cwiki.apache.org/confluence/display/CURATOR/TN10
>>> Also read this Tech Note on session handling: https://cwiki.apache.org/confluence/display/CURATOR/TN14
>>>  
>>> Very good information on the tech notes. Thanks for that. Agreed, it’s always recommended to clear the locks when it sees SUSPENDED event. And we are already following your recommendation. Our application would clear the lock when it sees SUSPENDED event.
>>>  
>>> To summarize my thoughts,
>>> The ephemeral node created by the previous session is still seen by a client that reconnects with a new session id, until the server cleans up that previous-session ephemeral node. This can happen if the client manages to reconnect to the server with a new session id before the server cleans up the previous session. How does it affect the Curator lock recipe? Explanation: the above race condition makes acquire() hold the lock (as its own lock node is still seen), eventually leading to an inconsistent state (i.e. stale Curator local lock state) when that lock node is cleaned up by the server as part of the previous session's cleanup.
>>> I am NOT seeing a Curator bug here, but am looking for suggestions / recommendations on handling this zookeeper race condition - either a feature added to Curator to cover this case, or recommendations that clients should follow.
>>>  
>>> Many Thanks,
>>> Viswa.
>>>  
>>> From: Sean Gibbons <sean.gibbons@workday.com <ma...@workday.com>>
>>> Date: Tuesday, 20 July 2021 at 12:19
>>> To: Viswanathan Rajagopal <viswanathan.rajagopa@workday.com <ma...@workday.com>>
>>> Subject: FW: [External Sender] Re: Double Locking Issue
>>> 
>>> 
>>> 
>>> On 20/07/2021, 10:12, "Jordan Zimmerman" <jordan@jordanzimmerman.com <ma...@jordanzimmerman.com>> wrote:
>>> 
>>>     A few more things...
>>> 
>>>     > Based on our initial analysis and few test runs, we saw that Curator acquire() method acquires the lock based on “about to be deleted lock node of previous session”. Explanation : Ephemeral node created by previous session was  still seen by client that reconnected with new session id until server cleans that up. If this happens, Curator acquire() would hold the lock.
>>> 
>>>     This isn't correct. Curator watches the previous node but lock acquisition is always based on calling ZK's getChildren, sorting them, and seeing if the lock's node is the first node in the sequence. If Ephemeral nodes aren't cleaned up it wouldn't be a problem. Any phantom ephemeral nodes would sort first and prevent Curator from believing it holds the lock.
>>> 
>>>     >  *   On the above mentioned race condition, if client manage to reconnect to server with new session id before server cleans up the ephemeral nodes of client’s previous session,  Curator lock acquire() who is trying to acquire the lock will hold the lock as it still sees the lock node in zookeeper directory. Eventually server would be cleaning up the ephemeral nodes leaving the Curator local lock thread data stale giving the illusion that it still hold the lock while its ephemeral node is gone
>>> 
>>>     I don't see how this is the case. If the client has received a network partition it shouldn't not consider any locks as being currently held (see the Tech Notes I mentioned in my previous email). If there is a partition during a call to acquire(), acquire() would throw an exception (once the retry policy has expired). BTW, what retry policy are you using?
>>> 
>>>     So, to reiterate, I don't see how phantom/undeleted ephemeral nodes would cause a problem. The only problem it could cause is that a given Curator client takes longer to acquire a lock as it waits for those ephemerals to finally get deleted.
>>> 
>>>     -JZ
>>> 
>>>     > On Jul 19, 2021, at 6:45 PM, Viswanathan Rajagopal <viswanathan.rajagopa@workday.com.INVALID <ma...@workday.com.INVALID>> wrote:
>>>     > 
>>>     > Hi Team,
>>>     > 
>>>     > Good day.
>>>     > 
>>>     > Recently came across “Double Locking Issue (i.e. two clients acquiring lock)” using Curator code ( InterProcessMutex lock APIs ) in our application
>>>     > 
>>>     > Our use case:
>>>     > 
>>>     >  *   Two clients attempts to acquire the zookeeper lock using Curator InterProcessMutex and whoever owns it would release it once sees the connection disconnect ( on receiving Connection.SUSPENDED / Connection.LOST Curator Connection Events from Connection listener)
>>>     > 
>>>     > Issue we noticed:
>>>     > 
>>>     >  *   After session expired & reconnected with new session, both client seems to have acquired the lock. Interesting thing that we found is that one of the clients still holds the lock while its lock node (ephemeral) was gone
>>>     > 
>>>     > Things we found:
>>>     > 
>>>     >  *   Based on our initial analysis and few test runs, we saw that Curator acquire() method acquires the lock based on “about to be deleted lock node of previous session”. Explanation : Ephemeral node created by previous session was  still seen by client that reconnected with new session id until server cleans that up. If this happens, Curator acquire() would hold the lock.
>>>     > 
>>>     > 
>>>     > 
>>>     >  *   Clearly we could see the race condition (in zookeeper code) between 1). Client reconnecting to server with new session id and 2). server deleting the ephemeral nodes of client’s previous session. We were able to reproduce this issue using the following approach,
>>>     >     *   Artificially break the socket connection between client and server for 30s
>>>     >     *   Artificially pausing the set of server codes for a min and unpause
>>>     > 
>>>     > 
>>>     >  *   On the above mentioned race condition, if client manage to reconnect to server with new session id before server cleans up the ephemeral nodes of client’s previous session,  Curator lock acquire() who is trying to acquire the lock will hold the lock as it still sees the lock node in zookeeper directory. Eventually server would be cleaning up the ephemeral nodes leaving the Curator local lock thread data stale giving the illusion that it still hold the lock while its ephemeral node is gone
>>>     > 
>>>     > 
>>>     >  *   Timeline events described below for better understanding,
>>>     >     *   At t1, Client A and Client B establishes zookeeper session with session id A1 and B1 respectively
>>>     >     *   At t2, Client A creates the lock node N1 & acquires the lock
>>>     >     *   At t3, Client B creates the lock node N2 & blocked on acquire() to acquire the lock
>>>     >     *   At t4, session timed out for both clients & server is about to clean up the old session • Client A trying to release the lock
>>>     >     *   At t5, Client A and Client B reconnects to server with new session id A2 and B2 respectively before server deletes the ephemeral node N1 & N2 of previous client session. Client A releases the lock, deleting N1 and trying to acquire it again by creating N3 node and Client B who is blocked on acquire() acquires the lock based on N2 (about to be deleted node created by previous session)
>>>     >     *   At  t6, server cleans up the ephemeral node N2 created by Client B’s previous session. Client A acquires the lock with node N3 as its previous sequence N2 gets deleted, whereas Client B who incorrectly acquired the lock at t5 timeline still holds the lock
>>>     > 
>>>     > 
>>>     > Note:
>>>     > 
>>>     >  *   We are not certain that this race condition that we noticed in zookeeper code is intentional design.
>>>     > 
>>>     > Questions:
>>>     > 
>>>     >  *   Given this race condition seen in zookeeper code, we would like to hear your recommendations / suggestions to avoid this issue while using Curator lock code?
>>>     > 
>>>     > 
>>>     > 
>>>     >  *   We also see that Interprocess Mutex has the makeRevocable() API that enables application to revoke lock, but that handles Node change event only but not for Node deleted event. I understand that this makes sense to handle Node change event alone, as to enable application user to revoke lock externally from application side. But would it be also to okay to have one for Node delete event, so as application can register the listener for Node delete event. I would like to hear your thoughts.
>>>     > 
>>>     > 
>>>     > Many Thanks,
>>>     > Viswa
>>> 
>> 
> 


Re: [External Sender] Double Locking Issue

Posted by Jordan Zimmerman <jo...@jordanzimmerman.com>.
Even more... actually, this should already work right? It's been a long time since I looked at this code so I wrote a quick test. wait() will already exits on Connection loss. I apologize, but what I said below is not correct. The LockInternals watcher gets called on connection loss and interrupts the thread already via notifyAll(). So, again, sorry for the noise.

I'll go over the scenario again to see if I have more comments.

-Jordan

> On Jul 30, 2021, at 9:18 AM, Jordan Zimmerman <jo...@jordanzimmerman.com> wrote:
> 
> Actually... I'm looking at LockInternals and it already has a watcher set on the previous node before it calls wait(). That watcher will get called when there's a connection problem. It would be pretty easy to add something in that watcher to interrupt the waiting thread. This could be a pretty simple change actually.
> 
> Watcher set just before waiting: https://github.com/apache/curator/blob/master/curator-recipes/src/main/java/org/apache/curator/framework/recipes/locks/LockInternals.java#L301 <https://github.com/apache/curator/blob/master/curator-recipes/src/main/java/org/apache/curator/framework/recipes/locks/LockInternals.java#L301>
> The Watcher: https://github.com/apache/curator/blob/master/curator-recipes/src/main/java/org/apache/curator/framework/recipes/locks/LockInternals.java#L64 <https://github.com/apache/curator/blob/master/curator-recipes/src/main/java/org/apache/curator/framework/recipes/locks/LockInternals.java#L64>
> 
> So, off the top of my head, an AtomicReference<Thread> field could be added to LockInternals. It gets set to the current thread just before waiting. If the Watcher is called with a connection event and the AtomicReference isn't empty, interrupt the thread. This will cause wait() to throw InterruptedException and then the lock node will be deleted.
> 
> -Jordan
> 
>> On Jul 30, 2021, at 9:11 AM, Jordan Zimmerman <jordan@jordanzimmerman.com <ma...@jordanzimmerman.com>> wrote:
>> 
>> Making lock code watch its lock node it has created
>> 
>> I don't think this is useful. Once the lock method returns the lock code can't notify the caller/user that the lock node has disappeared. We'd need some kind of additional class "lock watcher" or something that client code would have to periodically call to ensure that it still has the lock. tbh - this has been discussed a lot over the years on the ZooKeeper/Curator channels. Just because you have a ZNode doesn't mean it's 100% valid. You should still practice optimistic locks, etc. But, maybe we can reduce the edge cases with this lock watcher or some other mechanism.
>> 
>> Session ID changes after wait exits it suggests the lock node should no longer be trusted and it should be deleted and re-created
>> 
>> This is a simple change. But, I don't know if it would help much. It would be better to interrupt waiting lockers when the connection is lost. If an internal mechanism forced any locks blocked in wait() to throw an exception then those lock threads will already delete their ZNodes and try again. This would be the best solution maybe? So, before the lock code goes to wait() it sets a ConnectionStateListener (or something similar) that will interrupt the thread when there are connection problems.
>> 
>> Leader Latch code is well protected
>> 
>> Yes - the leader recipes are better at this. Maybe there's an opportunity to merge/change code. We could consider deprecating the lock recipes in favor of these?
>> 
>> -Jordan
>> 
>>> On Jul 29, 2021, at 2:35 AM, Viswanathan Rajagopal <viswanathan.rajagopa@workday.com <ma...@workday.com>> wrote:
>>> 
>>> Hi Jordan,
>>>  
>>> Thanks for the suggestions. Much helpful.
>>>  
>>> Quick summary of my below comments,
>>> Added test case simulator to reproduce issue (though artificially)
>>> Added a proposed code just to get your review and suggestions to see whether that would work
>>> Please find my detailed comments inline,
>>>  
>>> > I think there's generally a hole in Curator's lock recipe. The lock code does not watch the lock node it has created. So, another process or (as you found) a race with the server might cause the lock node to disappear underneath the lock instance after it thinks it has the lock. One thing we can do is to check the session ID before waiting in LockInternals.internalLockLoop(). If the session ID changes after wait exits it suggests the lock node should no longer be trusted and it should be deleted and re-created. That's 1 though anyway.
>>>  
>>> Yes, you are right. Currently lock code doesn’t watch the lock node it has created. That’s where we would like to hear your feedback on our proposal mentioned in our first mail thread 
>>> “Curator lock code has makeRevocable() API that enables application to revoke lock anytime from application by triggering NODE_CHANGE event through Revoker.attemptToRevoke() utility. 
>>> Proposal: Would it be nice to extend makeRevocable() API to handle Node delete event, which would allow application to register the watcher for Node delete event, thereby application can react to Node delete event by revoking the lock ?”. Tried the Code snippet (https://github.com/ViswaNXplore/curator/blob/curator-2.5.0/patch/double_locking_test_simulator/curator-recipes/src/main/java/org/apache/curator/framework/recipes/locks/LockInternals.java#L60 <https://github.com/ViswaNXplore/curator/blob/curator-2.5.0/patch/double_locking_test_simulator/curator-recipes/src/main/java/org/apache/curator/framework/recipes/locks/LockInternals.java#L60>
>>> ) to see the approach “Making lock code watch its lock node it has created” works and would like to get your thoughts on this and do you see any other impacts there? May need your advice.
>>>  
>>> I agree with you on that the session id approach, that would protect us from seeing this issue. The reason why I thought your first approach “making lock code watch it has created” could be more protective is this could protect us from any inconsistencies between original lock node and local lock state.
>>>   
>>> >  It would be nice to generate a test/simulation for this case so that it can be properly dealt with.
>>>  
>>> Created a test case simulator https://github.com/ViswaNXplore/curator/blob/curator-2.5.0/patch/double_locking_test_simulator/curator-recipes/src/test/java/org/apache/curator/framework/recipes/CuratorRecipeTest.java <https://github.com/ViswaNXplore/curator/blob/curator-2.5.0/patch/double_locking_test_simulator/curator-recipes/src/test/java/org/apache/curator/framework/recipes/CuratorRecipeTest.java>
>>>  
>>> Approach #1 (Artificial way of reproducing the issue)
>>> Run the test case with (SIMULATE_ISSUE_BY_EXPLICIT_DELETE set to true). This would delete the lock node after a pause just to simulate the state where the zkClient had reconnected and it still happens to see the ephermal node just before the server deletes it since its session has expired, but the node is deleted afterwards by the server.
>>> Approach #2 (A little manual interruption needed to reproduce the issue)
>>> Run the test case in Debug mode (SIMULATE_ISSUE_BY_EXPLICIT_DELETE set to false)
>>> Artificially delaying / pausing the ephemeral lock nodes deletion as part of session cleanup process in server code (ZookeeperServer.close() method)
>>> After a pause (say 5s) to make one of the instance to acquire the lock, Artificially break the socket connection between client and server for 30s (by keeping breakpoint in ClientCnxnSocketNIO.doIO() method). After 30s, we would see session closing logs logged in server code
>>> After 1min, remove breakpoint in ClientCnxnSocketNIO.doIO() and resume both Thread 2 and Thread 3
>>> After that, resume server thread (thread name would be “SessionTracker”
>>> Below Proposals discussed so far (3rd added now for your review)
>>> Making lock code watch its lock node it has created
>>> Session ID changes after wait exits it suggests the lock node should no longer be trusted and it should be deleted and re-created
>>> Leader Latch code is well protected to cover this zookeeper race condition, because Leader Latch code internally handle the connection events (which they use to interrupt latch_acquire_state to reset its latch every time connection is reconnected), means it will explicitly release the latch and recreate new if there is a connection disconnect (may be this can be the approach that lock recipe could use to protect ?) 
>>> Many Thanks,
>>> Viswa.
>>>  
>>> From: Jordan Zimmerman <jordan@jordanzimmerman.com <ma...@jordanzimmerman.com>>
>>> Date: Tuesday, 20 July 2021 at 21:11
>>> To: Viswanathan Rajagopal <viswanathan.rajagopa@workday.com <ma...@workday.com>>
>>> Cc: Sean Gibbons <sean.gibbons@workday.com <ma...@workday.com>>, dev@curator.apache.org <ma...@curator.apache.org> <dev@curator.apache.org <ma...@curator.apache.org>>, cammckenzie@apache.org <ma...@apache.org> <cammckenzie@apache.org <ma...@apache.org>>, user@curator.apache.org <ma...@curator.apache.org> <user@curator.apache.org <ma...@curator.apache.org>>, Donal Arundel <donal.arundel@workday.com <ma...@workday.com>>, Francisco Goncalves <francisco.goncalves@workday.com <ma...@workday.com>>, Zak Szadowski <zak.szadowski@workday.com <ma...@workday.com>>, Dandie Beloy <dandie.beloy@workday.com <ma...@workday.com>>, Marcelo Cenerino <marcelo.cenerino@workday.com <ma...@workday.com>>
>>> Subject: Re: [External Sender] Double Locking Issue
>>> 
>>> In our case, it wouldn’t throw any exception because it had gone past “creating lock nodes” and was blocked on wait(), which would only then be interrupted when curator watcher notified on previous sequence node delete event.
>>>  
>>> So, you're using the version of acquire() without a timeout? In any event, this is a problem. When you receive SUSPENDED you really should interrupt any threads that are waiting on Curator. The Curator docs imply this even though it might not be obvious. This is likely the source of your problems. A simple solution is to use the version of acquire that has a timeout and repeatedly call it until success (though your problem may still occur in this case). Maybe we could improve the lock recipe (or something similar) so that locks inside of acquire are interrupted on a network partition. 
>>>  
>>> [snip of your remaining thoughtful analysis]
>>>  
>>> I think there's generally a hole in Curator's lock recipe. The lock code does not watch the lock node it has created. So, another process or (as you found) a race with the server might cause the lock node to disappear underneath the lock instance after it thinks it has the lock. 
>>>  
>>> One thing we can do is to check the session ID before waiting in LockInternals.internalLockLoop(). If the session ID changes after wait exits it suggests the lock node should no longer be trusted and it should be deleted and re-created. That's 1 though anyway. It would be nice to generate a test/simulation for this case so that it can be properly dealt with.
>>>  
>>> -Jordan
>>> 
>>> 
>>> On Jul 20, 2021, at 3:53 PM, Viswanathan Rajagopal <viswanathan.rajagopa@workday.com <ma...@workday.com>> wrote:
>>>  
>>> Thanks Jordan for coming back on this.
>>>  
>>> Please find my inline comments. Also provided below few additional version info that we are using,
>>> Curator version – 2.5.0
>>> Zookeeper version – 3.5.3
>>>  
>>> >  I don't see how this is the case. If the client has received a network partition it shouldn't not consider any locks as being currently held (see the Tech Notes I mentioned in my previous email). If there is a partition during a call to acquire(), acquire() would throw an exception (once the retry policy has expired). BTW, what retry policy are you using?
>>>  
>>> In our case, it wouldn’t throw any exception because it had gone past “creating lock nodes” and was blocked on wait(), which would only then be interrupted when curator watcher notified on previous sequence node delete event.
>>>  
>>> > This isn't correct. Curator watches the previous node but lock acquisition is always based on calling ZK's getChildren, sorting them, and seeing if the lock's node is the first node in the sequence. If Ephemeral nodes aren't cleaned up it wouldn't be a problem. Any phantom ephemeral nodes would sort first and prevent Curator from believing it holds the lock.
>>>  
>>> Totally agreed on your point on lock acquisition. The problem that I was trying to explain is that the network partition occurs while the client is waiting for the lock (in the acquire()) but after the acquire code block has already written out the lock node successfully, so the client code will be blocking in the acquire and not get a connection exception. Once the previous lock node is deleted but not the lock node for this client then this client will leave the acquire thinking it has the lock but actually at this point the ZK server has expired the session and is in the process of deleting our clients lock node.
>>>  
>>> Trying to categorize the timeline events below to see if that would help us for better understanding,
>>> [ CONNECTED ] :
>>> Client A & Client B calls acquire creating Node N1 & N2 respectively
>>> Client A acquire() -> holds the lock as its Node N1 is first node in sequence
>>> Client B acquire() -> created Node N2, watches for previous sequence node (blocked on wait())
>>>  
>>> [ SUSPENDED CONNECTION OCCURS ] :
>>> Network partition happens
>>> Curator fires SUSPENDED
>>> Client A, on receiving SUSPENDED event, attempts to release the lock
>>> Client B, on receiving SUSPENDED event, does nothing as it was not holding any lock (as blocked on acquire() -> wait() )  
>>>  
>>> [ SESSION EXPIRED ] :
>>> Client A attempts to release the lock (with retries)
>>> Client B does nothing as it was not holding any lock (and is still blocked on acquire() -> wait() )
>>> Server prepared to cleanup previous client session and deleting lock nodes created as part of previous client session
>>>  
>>> [ RECONNECTED ] :
>>> Client A and Client B reconnected to another server with new session id (Note : This happens before server managed to cleanup / delete ephemeral nodes of previous client session)
>>> Client A released the lock successfully (means it would delete its lock node N1) and attempts to acquire lock by creating lock node N3 and watches for previous sequence node (N2)
>>> Client B who was blocked on acquire() -> wait() would be notified with previous sequence node (N1) deletion -> getChildren, sorting them, and seeing if the lock's node is the first node in the sequence. So, Client B sees its lock node N2 still (which I call it as about to be deleted node by server) and thereby acquires the lock
>>>  
>>> [ AFTER FEW SECONDS ] :
>>> Server managed to delete the ephemeral node N2 as part of previous client session cleanup
>>> Client A who was blocked on acquire() -> wait() would be notified with previous sequence node (N2) deletion -> getChildren, sorting them, and seeing if the lock's node is the first node in the sequence and thereby acquires the lock
>>> Client B – its local lock thread data went stale (as its lock path N2 not has been deleted by Server)
>>>  
>>> >  SUSPENDED in Curator means only that the connect has been lost, not that the session has ended.
>>> LOST is the state that means the session has ended
>>> Be aware of how GCs can affect Curator. See the Tech Note here: https://cwiki.apache.org/confluence/display/CURATOR/TN10 <https://urldefense.proofpoint.com/v2/url?u=https-3A__cwiki.apache.org_confluence_display_CURATOR_TN10&d=DwMFaQ&c=DS6PUFBBr_KiLo7Sjt3ljp5jaW5k2i9ijVXllEdOozc&r=mwSLlPO0Vtstmu1dce0TMFqf5lUxD2SPdNdc1k4NXjU&m=jQttOc88vMYbwNBFzWUO8TKGSmu_8P6EDmcD7jlrVys&s=BJywW80sEBAxOQEuTDkeP8tx0XX0s01sz44lNyWXIXs&e=>
>>> <https://cwiki.apache.org/confluence/display/CURATOR/TN10 <https://urldefense.proofpoint.com/v2/url?u=https-3A__cwiki.apache.org_confluence_display_CURATOR_TN10&d=DwMFaQ&c=DS6PUFBBr_KiLo7Sjt3ljp5jaW5k2i9ijVXllEdOozc&r=mwSLlPO0Vtstmu1dce0TMFqf5lUxD2SPdNdc1k4NXjU&m=jQttOc88vMYbwNBFzWUO8TKGSmu_8P6EDmcD7jlrVys&s=BJywW80sEBAxOQEuTDkeP8tx0XX0s01sz44lNyWXIXs&e=>>
>>> Also read this Tech Note on session handling: https://cwiki.apache.org/confluence/display/CURATOR/TN14 <https://urldefense.proofpoint.com/v2/url?u=https-3A__cwiki.apache.org_confluence_display_CURATOR_TN14&d=DwMFaQ&c=DS6PUFBBr_KiLo7Sjt3ljp5jaW5k2i9ijVXllEdOozc&r=mwSLlPO0Vtstmu1dce0TMFqf5lUxD2SPdNdc1k4NXjU&m=jQttOc88vMYbwNBFzWUO8TKGSmu_8P6EDmcD7jlrVys&s=UhQP5TNa_OMaYVSJX3b9p7XqWZ0fZtErLQwwbcQUUQA&e=>
>>> <https://cwiki.apache.org/confluence/display/CURATOR/TN14 <https://urldefense.proofpoint.com/v2/url?u=https-3A__cwiki.apache.org_confluence_display_CURATOR_TN14&d=DwMFaQ&c=DS6PUFBBr_KiLo7Sjt3ljp5jaW5k2i9ijVXllEdOozc&r=mwSLlPO0Vtstmu1dce0TMFqf5lUxD2SPdNdc1k4NXjU&m=jQttOc88vMYbwNBFzWUO8TKGSmu_8P6EDmcD7jlrVys&s=UhQP5TNa_OMaYVSJX3b9p7XqWZ0fZtErLQwwbcQUUQA&e=>>
>>>  
>>> Very good information on the tech notes. Thanks for that. Agreed, it’s always recommended to clear the locks when it sees SUSPENDED event. And we are already following your recommendation. Our application would clear the lock when it sees SUSPENDED event.
>>>  
>>> To summarize my thoughts,
>>> Ephemeral node created by previous session was  still seen by client that reconnected with new session id until server cleans that up the previous session ephemeral node. This could happen if client manage to reconnect with server with new session id before server cleans up the previous session. How it affects the curator lock recipe ? Explanation : The above explained race condition would make acquire() to hold the lock ( as its own lock node still seen ), eventually leading to inconsistent state (i.e. curator local lock state stale) when that lock node is being cleaned up by the server as part of previous session cleanup activities.
>>> I am NOT seeing a Curator bug here, but looking out for suggestions / recommendations in handling this zookeeper race condition. Either can a feature be added to Curator to cover this case / any recommendations that clients should follow.
>>>  
>>> Many Thanks,
>>> Viswa.
>>>  
>>> From: Sean Gibbons <sean.gibbons@workday.com <ma...@workday.com>>
>>> Date: Tuesday, 20 July 2021 at 12:19
>>> To: Viswanathan Rajagopal <viswanathan.rajagopa@workday.com <ma...@workday.com>>
>>> Subject: FW: [External Sender] Re: Double Locking Issue
>>> 
>>> 
>>> 
>>> On 20/07/2021, 10:12, "Jordan Zimmerman" <jordan@jordanzimmerman.com <ma...@jordanzimmerman.com>> wrote:
>>> 
>>>     A few more things...
>>> 
>>>     > Based on our initial analysis and few test runs, we saw that Curator acquire() method acquires the lock based on “about to be deleted lock node of previous session”. Explanation : Ephemeral node created by previous session was  still seen by client that reconnected with new session id until server cleans that up. If this happens, Curator acquire() would hold the lock.
>>> 
>>>     This isn't correct. Curator watches the previous node but lock acquisition is always based on calling ZK's getChildren, sorting them, and seeing if the lock's node is the first node in the sequence. If Ephemeral nodes aren't cleaned up it wouldn't be a problem. Any phantom ephemeral nodes would sort first and prevent Curator from believing it holds the lock.
>>> 
>>>     >  *   On the above mentioned race condition, if client manage to reconnect to server with new session id before server cleans up the ephemeral nodes of client’s previous session,  Curator lock acquire() who is trying to acquire the lock will hold the lock as it still sees the lock node in zookeeper directory. Eventually server would be cleaning up the ephemeral nodes leaving the Curator local lock thread data stale giving the illusion that it still hold the lock while its ephemeral node is gone
>>> 
>>>     I don't see how this is the case. If the client has received a network partition it shouldn't not consider any locks as being currently held (see the Tech Notes I mentioned in my previous email). If there is a partition during a call to acquire(), acquire() would throw an exception (once the retry policy has expired). BTW, what retry policy are you using?
>>> 
>>>     So, to reiterate, I don't see how phantom/undeleted ephemeral nodes would cause a problem. The only problem it could case is that a given Curator client takes longer to acquire a lock as it waits for those ephemerals to finally get deleted.
>>> 
>>>     -JZ
>>> 
>>>     > On Jul 19, 2021, at 6:45 PM, Viswanathan Rajagopal <viswanathan.rajagopa@workday.com.INVALID <ma...@workday.com.INVALID>> wrote:
>>>     > 
>>>     > Hi Team,
>>>     > 
>>>     > Good day.
>>>     > 
>>>     > Recently came across “Double Locking Issue (i.e. two clients acquiring lock)” using Curator code ( InterProcessMutex lock APIs ) in our application
>>>     > 
>>>     > Our use case:
>>>     > 
>>>     >  *   Two clients attempts to acquire the zookeeper lock using Curator InterProcessMutex and whoever owns it would release it once sees the connection disconnect ( on receiving Connection.SUSPENDED / Connection.LOST Curator Connection Events from Connection listener)
>>>     > 
>>>     > Issue we noticed:
>>>     > 
>>>     >  *   After session expired & reconnected with new session, both client seems to have acquired the lock. Interesting thing that we found is that one of the clients still holds the lock while its lock node (ephemeral) was gone
>>>     > 
>>>     > Things we found:
>>>     > 
>>>     >  *   Based on our initial analysis and few test runs, we saw that Curator acquire() method acquires the lock based on “about to be deleted lock node of previous session”. Explanation : Ephemeral node created by previous session was  still seen by client that reconnected with new session id until server cleans that up. If this happens, Curator acquire() would hold the lock.
>>>     > 
>>>     > 
>>>     > 
>>>     >  *   Clearly we could see the race condition (in zookeeper code) between 1). Client reconnecting to server with new session id and 2). server deleting the ephemeral nodes of client’s previous session. We were able to reproduce this issue using the following approach,
>>>     >     *   Artificially break the socket connection between client and server for 30s
>>>     >     *   Artificially pausing the set of server codes for a min and unpause
>>>     > 
>>>     > 
>>>     >  *   On the above mentioned race condition, if the client manages to reconnect to the server with a new session id before the server cleans up the ephemeral nodes of the client’s previous session, the Curator lock acquire() that is trying to acquire the lock will hold the lock as it still sees its lock node in the zookeeper directory. Eventually the server cleans up the ephemeral nodes, leaving the Curator local lock thread data stale and giving the illusion that it still holds the lock while its ephemeral node is gone
>>>     > 
>>>     > 
>>>     >  *   Timeline events described below for better understanding,
>>>     >     *   At t1, Client A and Client B establishes zookeeper session with session id A1 and B1 respectively
>>>     >     *   At t2, Client A creates the lock node N1 & acquires the lock
>>>     >     *   At t3, Client B creates the lock node N2 & blocked on acquire() to acquire the lock
>>>     >     *   At t4, session timed out for both clients & the server is about to clean up the old session; Client A is trying to release the lock
>>>     >     *   At t5, Client A and Client B reconnects to server with new session id A2 and B2 respectively before server deletes the ephemeral node N1 & N2 of previous client session. Client A releases the lock, deleting N1 and trying to acquire it again by creating N3 node and Client B who is blocked on acquire() acquires the lock based on N2 (about to be deleted node created by previous session)
>>>     >     *   At  t6, server cleans up the ephemeral node N2 created by Client B’s previous session. Client A acquires the lock with node N3 as its previous sequence N2 gets deleted, whereas Client B who incorrectly acquired the lock at t5 timeline still holds the lock
>>>     > 
>>>     > 
>>>     > Note:
>>>     > 
>>>     >  *   We are not certain that this race condition that we noticed in zookeeper code is intentional design.
>>>     > 
>>>     > Questions:
>>>     > 
>>>     >  *   Given this race condition seen in zookeeper code, we would like to hear your recommendations / suggestions to avoid this issue while using Curator lock code?
>>>     > 
>>>     > 
>>>     > 
>>>     >  *   We also see that InterProcessMutex has the makeRevocable() API that enables the application to revoke the lock, but that handles the Node change event only and not the Node deleted event. I understand that it makes sense to handle the Node change event alone, so as to enable the application user to revoke the lock externally from the application side. But would it also be okay to have one for the Node delete event, so that the application can register a listener for the Node delete event? I would like to hear your thoughts.
>>>     > 
>>>     > 
>>>     > Many Thanks,
>>>     > Viswa
>>> 
>> 
> 


Re: [External Sender] Double Locking Issue

Posted by Jordan Zimmerman <jo...@jordanzimmerman.com>.
Actually... I'm looking at LockInternals and it already has a watcher set on the previous node before it calls wait(). That watcher will get called when there's a connection problem. It would be pretty easy to add something in that watcher to interrupt the waiting thread. This could be a pretty simple change actually.

Watcher set just before waiting: https://github.com/apache/curator/blob/master/curator-recipes/src/main/java/org/apache/curator/framework/recipes/locks/LockInternals.java#L301 <https://github.com/apache/curator/blob/master/curator-recipes/src/main/java/org/apache/curator/framework/recipes/locks/LockInternals.java#L301>
The Watcher: https://github.com/apache/curator/blob/master/curator-recipes/src/main/java/org/apache/curator/framework/recipes/locks/LockInternals.java#L64 <https://github.com/apache/curator/blob/master/curator-recipes/src/main/java/org/apache/curator/framework/recipes/locks/LockInternals.java#L64>

So, off the top of my head, an AtomicReference<Thread> field could be added to LockInternals. It gets set to the current thread just before waiting. If the Watcher is called with a connection event and the AtomicReference isn't empty, interrupt the thread. This will cause wait() to throw InterruptedException and then the lock node will be deleted.
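Roughly this shape, as a self-contained illustration (the class and method names here are made up and this isn't a patch against LockInternals):

import java.util.concurrent.atomic.AtomicReference;
import org.apache.zookeeper.WatchedEvent;
import org.apache.zookeeper.Watcher;

// Illustrative only: a watcher that wakes the waiter on node events and interrupts it
// on connection events, so wait() exits with InterruptedException and the lock loop
// can delete its node and retry.
public class InterruptingLockWatcher implements Watcher
{
    private final AtomicReference<Thread> waitingThread = new AtomicReference<>();

    @Override
    public synchronized void process(WatchedEvent event)
    {
        if ( event.getType() == Watcher.Event.EventType.None )   // connection state change, not a node event
        {
            Thread thread = waitingThread.get();
            if ( thread != null )
            {
                thread.interrupt();                              // wait() below throws InterruptedException
            }
        }
        notifyAll();                                             // normal path: predecessor node deleted
    }

    // stands in for the wait() inside the lock loop
    public synchronized void awaitPredecessor() throws InterruptedException
    {
        waitingThread.set(Thread.currentThread());
        try
        {
            wait();
        }
        finally
        {
            waitingThread.set(null);
        }
    }
}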

-Jordan

> On Jul 30, 2021, at 9:11 AM, Jordan Zimmerman <jo...@jordanzimmerman.com> wrote:
> 
> Making lock code watch its lock node it has created
> 
> I don't think this is useful. Once the lock method returns the lock code can't notify the caller/user that the lock node has disappeared. We'd need some kind of additional class "lock watcher" or something that client code would have to periodically call to ensure that it still has the lock. tbh - this has been discussed a lot over the years on the ZooKeeper/Curator channels. Just because you have a ZNode doesn't mean it's 100% valid. You should still practice optimistic locks, etc. But, maybe we can reduce the edge cases with this lock watcher or some other mechanism.
> 
> Session ID changes after wait exits it suggests the lock node should no longer be trusted and it should be deleted and re-created
> 
> This is a simple change. But, I don't know if it would help much. It would be better to interrupt waiting lockers when the connection is lost. If an internal mechanism forced any locks blocked in wait() to throw an exception then those lock threads will already delete their ZNodes and try again. This would be the best solution maybe? So, before the lock code goes to wait() it sets a ConnectionStateListener (or something similar) that will interrupt the thread when there are connection problems.
> 
> Leader Latch code is well protected
> 
> Yes - the leader recipes are better at this. Maybe there's an opportunity to merge/change code. We could consider deprecating the lock recipes in favor of these?
> 
> -Jordan
> 
>> On Jul 29, 2021, at 2:35 AM, Viswanathan Rajagopal <viswanathan.rajagopa@workday.com <ma...@workday.com>> wrote:
>> 
>> Hi Jordan,
>>  
>> Thanks for the suggestions. Much helpful.
>>  
>> Quick summary of my below comments,
>> Added test case simulator to reproduce issue (though artificially)
>> Added a proposed code just to get your review and suggestions to see whether that would work
>> Please find my detailed comments inline,
>>  
>> > I think there's generally a hole in Curator's lock recipe. The lock code does not watch the lock node it has created. So, another process or (as you found) a race with the server might cause the lock node to disappear underneath the lock instance after it thinks it has the lock. One thing we can do is to check the session ID before waiting in LockInternals.internalLockLoop(). If the session ID changes after wait exits it suggests the lock node should no longer be trusted and it should be deleted and re-created. That's 1 though anyway.
>>  
>> Yes, you are right. Currently lock code doesn’t watch the lock node it has created. That’s where we would like to hear your feedback on our proposal mentioned in our first mail thread 
>> “Curator lock code has makeRevocable() API that enables application to revoke lock anytime from application by triggering NODE_CHANGE event through Revoker.attemptToRevoke() utility. 
>> Proposal: Would it be nice to extend the makeRevocable() API to handle the Node delete event, which would allow the application to register a watcher for the Node delete event, so that the application can react to the Node delete event by revoking the lock?”. Tried the code snippet (https://github.com/ViswaNXplore/curator/blob/curator-2.5.0/patch/double_locking_test_simulator/curator-recipes/src/main/java/org/apache/curator/framework/recipes/locks/LockInternals.java#L60 <https://github.com/ViswaNXplore/curator/blob/curator-2.5.0/patch/double_locking_test_simulator/curator-recipes/src/main/java/org/apache/curator/framework/recipes/locks/LockInternals.java#L60>
>> ) to see whether the approach “Making lock code watch its lock node it has created” works, and would like to get your thoughts on this. Do you see any other impacts there? May need your advice.
>>  
>> I agree with you that the session id approach would protect us from seeing this issue. The reason I thought your first approach, “making lock code watch its lock node it has created”, could be more protective is that it could protect us from any inconsistencies between the original lock node and the local lock state.
>>   
>> >  It would be nice to generate a test/simulation for this case so that it can be properly dealt with.
>>  
>> Created a test case simulator https://github.com/ViswaNXplore/curator/blob/curator-2.5.0/patch/double_locking_test_simulator/curator-recipes/src/test/java/org/apache/curator/framework/recipes/CuratorRecipeTest.java <https://github.com/ViswaNXplore/curator/blob/curator-2.5.0/patch/double_locking_test_simulator/curator-recipes/src/test/java/org/apache/curator/framework/recipes/CuratorRecipeTest.java>
>>  
>> Approach #1 (Artificial way of reproducing the issue)
>> Run the test case with (SIMULATE_ISSUE_BY_EXPLICIT_DELETE set to true). This deletes the lock node after a pause just to simulate the state where the zkClient has reconnected and still happens to see the ephemeral node just before the server deletes it because its session has expired; the node is then deleted afterwards by the server.
>> Approach #2 (A little manual interruption needed to reproduce the issue)
>> Run the test case in Debug mode (SIMULATE_ISSUE_BY_EXPLICIT_DELETE set to false)
>> Artificially delaying / pausing the ephemeral lock nodes deletion as part of session cleanup process in server code (ZookeeperServer.close() method)
>> After a pause (say 5s) to let one of the instances acquire the lock, artificially break the socket connection between client and server for 30s (by keeping a breakpoint in the ClientCnxnSocketNIO.doIO() method). After 30s, we would see session closing logs logged in the server code
>> After 1min, remove the breakpoint in ClientCnxnSocketNIO.doIO() and resume both Thread 2 and Thread 3
>> After that, resume the server thread (thread name would be “SessionTracker”)
>> Below Proposals discussed so far (3rd added now for your review)
>> Making lock code watch its lock node it has created
>> Session ID changes after wait exits it suggests the lock node should no longer be trusted and it should be deleted and re-created
>> Leader Latch code is well protected against this zookeeper race condition, because the Leader Latch code internally handles connection events (which it uses to interrupt latch_acquire_state and reset its latch every time the connection is reconnected), meaning it will explicitly release the latch and recreate a new one if there is a connection disconnect (maybe this is the approach that the lock recipe could use to protect itself?)
>> Many Thanks,
>> Viswa.
>>  
>> From: Jordan Zimmerman <jordan@jordanzimmerman.com <ma...@jordanzimmerman.com>>
>> Date: Tuesday, 20 July 2021 at 21:11
>> To: Viswanathan Rajagopal <viswanathan.rajagopa@workday.com <ma...@workday.com>>
>> Cc: Sean Gibbons <sean.gibbons@workday.com <ma...@workday.com>>, dev@curator.apache.org <ma...@curator.apache.org> <dev@curator.apache.org <ma...@curator.apache.org>>, cammckenzie@apache.org <ma...@apache.org> <cammckenzie@apache.org <ma...@apache.org>>, user@curator.apache.org <ma...@curator.apache.org> <user@curator.apache.org <ma...@curator.apache.org>>, Donal Arundel <donal.arundel@workday.com <ma...@workday.com>>, Francisco Goncalves <francisco.goncalves@workday.com <ma...@workday.com>>, Zak Szadowski <zak.szadowski@workday.com <ma...@workday.com>>, Dandie Beloy <dandie.beloy@workday.com <ma...@workday.com>>, Marcelo Cenerino <marcelo.cenerino@workday.com <ma...@workday.com>>
>> Subject: Re: [External Sender] Double Locking Issue
>> 
>> In our case, it wouldn’t throw any exception because it had gone past “creating lock nodes” and was blocked on wait(), which would only then be interrupted when curator watcher notified on previous sequence node delete event.
>>  
>> So, you're using the version of acquire() without a timeout? In any event, this is a problem. When you receive SUSPENDED you really should interrupt any threads that are waiting on Curator. The Curator docs imply this even though it might not be obvious. This is likely the source of your problems. A simple solution is to use the version of acquire that has a timeout and repeatedly call it until success (though your problem may still occur in this case). Maybe we could improve the lock recipe (or something similar) so that locks inside of acquire are interrupted on a network partition. 
>>  
>> [snip of your remaining thoughtful analysis]
>>  
>> I think there's generally a hole in Curator's lock recipe. The lock code does not watch the lock node it has created. So, another process or (as you found) a race with the server might cause the lock node to disappear underneath the lock instance after it thinks it has the lock. 
>>  
>> One thing we can do is to check the session ID before waiting in LockInternals.internalLockLoop(). If the session ID changes after wait exits it suggests the lock node should no longer be trusted and it should be deleted and re-created. That's 1 though anyway. It would be nice to generate a test/simulation for this case so that it can be properly dealt with.
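>> For example, the session-id comparison itself would be tiny, e.g. (illustrative helper, not existing Curator code):
>> 
>> import org.apache.curator.framework.CuratorFramework;
>> 
>> // Illustrative helper: capture this value before wait() and compare it after wait()
>> // returns; if it differs, the lock node was created under an old session and should
>> // not be trusted.
>> final class SessionIds
>> {
>>     private SessionIds() {}
>> 
>>     static long current(CuratorFramework client) throws Exception
>>     {
>>         return client.getZookeeperClient().getZooKeeper().getSessionId();
>>     }
>> }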
>>  
>> -Jordan
>> 
>> 
>> On Jul 20, 2021, at 3:53 PM, Viswanathan Rajagopal <viswanathan.rajagopa@workday.com <ma...@workday.com>> wrote:
>>  
>> Thanks Jordan for coming back on this.
>>  
>> Please find my inline comments. Also provided below few additional version info that we are using,
>> Curator version – 2.5.0
>> Zookeeper version – 3.5.3
>>  
>> >  I don't see how this is the case. If the client has received a network partition it shouldn't consider any locks as being currently held (see the Tech Notes I mentioned in my previous email). If there is a partition during a call to acquire(), acquire() would throw an exception (once the retry policy has expired). BTW, what retry policy are you using?
>>  
>> In our case, it wouldn’t throw any exception because it had gone past “creating lock nodes” and was blocked on wait(), which would only then be interrupted when curator watcher notified on previous sequence node delete event.
>>  
>> > This isn't correct. Curator watches the previous node but lock acquisition is always based on calling ZK's getChildren, sorting them, and seeing if the lock's node is the first node in the sequence. If Ephemeral nodes aren't cleaned up it wouldn't be a problem. Any phantom ephemeral nodes would sort first and prevent Curator from believing it holds the lock.
>>  
>> Totally agreed on your point on lock acquisition. The problem that I was trying to explain is that the network partition occurs while the client is waiting for the lock (in the acquire()) but after the acquire code block has already written out the lock node successfully, so the client code will be blocking in the acquire and not get a connection exception. Once the previous lock node is deleted but not the lock node for this client then this client will leave the acquire thinking it has the lock but actually at this point the ZK server has expired the session and is in the process of deleting our clients lock node.
>>  
>> Trying to categorize the timeline events below to see if that would help us for better understanding,
>> [ CONNECTED ] :
>> Client A & Client B calls acquire creating Node N1 & N2 respectively
>> Client A acquire() -> holds the lock as its Node N1 is first node in sequence
>> Client B acquire() -> created Node N2, watches for previous sequence node (blocked on wait())
>>  
>> [ SUSPENDED CONNECTION OCCURS ] :
>> Network partition happens
>> Curator fires SUSPENDED
>> Client A, on receiving SUSPENDED event, attempts to release the lock
>> Client B, on receiving SUSPENDED event, does nothing as it was not holding any lock (as blocked on acquire() -> wait() )  
>>  
>> [ SESSION EXPIRED ] :
>> Client A attempts to release the lock (with retries)
>> Client B does nothing as it was not holding any lock (and is still blocked on acquire() -> wait() )
>> Server prepared to cleanup previous client session and deleting lock nodes created as part of previous client session
>>  
>> [ RECONNECTED ] :
>> Client A and Client B reconnected to another server with new session id (Note : This happens before server managed to cleanup / delete ephemeral nodes of previous client session)
>> Client A released the lock successfully (means it would delete its lock node N1) and attempts to acquire lock by creating lock node N3 and watches for previous sequence node (N2)
>> Client B, who was blocked on acquire() -> wait(), would be notified of the previous sequence node (N1) deletion -> getChildren, sorting them, and seeing if the lock's node is the first node in the sequence. So, Client B still sees its lock node N2 (which I call the “about to be deleted by the server” node) and thereby acquires the lock
>>  
>> [ AFTER FEW SECONDS ] :
>> Server managed to delete the ephemeral node N2 as part of the previous client session cleanup
>> Client A, who was blocked on acquire() -> wait(), would be notified of the previous sequence node (N2) deletion -> getChildren, sorting them, and seeing if the lock's node is the first node in the sequence, and thereby acquires the lock
>> Client B – its local lock thread data went stale (as its lock node N2 has now been deleted by the Server)
>>  
>> >  SUSPENDED in Curator means only that the connect has been lost, not that the session has ended.
>> LOST is the state that means the session has ended
>> Be aware of how GCs can affect Curator. See the Tech Note here: https://cwiki.apache.org/confluence/display/CURATOR/TN10 <https://urldefense.proofpoint.com/v2/url?u=https-3A__cwiki.apache.org_confluence_display_CURATOR_TN10&d=DwMFaQ&c=DS6PUFBBr_KiLo7Sjt3ljp5jaW5k2i9ijVXllEdOozc&r=mwSLlPO0Vtstmu1dce0TMFqf5lUxD2SPdNdc1k4NXjU&m=jQttOc88vMYbwNBFzWUO8TKGSmu_8P6EDmcD7jlrVys&s=BJywW80sEBAxOQEuTDkeP8tx0XX0s01sz44lNyWXIXs&e=>
>> <https://cwiki.apache.org/confluence/display/CURATOR/TN10 <https://urldefense.proofpoint.com/v2/url?u=https-3A__cwiki.apache.org_confluence_display_CURATOR_TN10&d=DwMFaQ&c=DS6PUFBBr_KiLo7Sjt3ljp5jaW5k2i9ijVXllEdOozc&r=mwSLlPO0Vtstmu1dce0TMFqf5lUxD2SPdNdc1k4NXjU&m=jQttOc88vMYbwNBFzWUO8TKGSmu_8P6EDmcD7jlrVys&s=BJywW80sEBAxOQEuTDkeP8tx0XX0s01sz44lNyWXIXs&e=>>
>> Also read this Tech Note on session handling: https://cwiki.apache.org/confluence/display/CURATOR/TN14 <https://urldefense.proofpoint.com/v2/url?u=https-3A__cwiki.apache.org_confluence_display_CURATOR_TN14&d=DwMFaQ&c=DS6PUFBBr_KiLo7Sjt3ljp5jaW5k2i9ijVXllEdOozc&r=mwSLlPO0Vtstmu1dce0TMFqf5lUxD2SPdNdc1k4NXjU&m=jQttOc88vMYbwNBFzWUO8TKGSmu_8P6EDmcD7jlrVys&s=UhQP5TNa_OMaYVSJX3b9p7XqWZ0fZtErLQwwbcQUUQA&e=>
>> <https://cwiki.apache.org/confluence/display/CURATOR/TN14 <https://urldefense.proofpoint.com/v2/url?u=https-3A__cwiki.apache.org_confluence_display_CURATOR_TN14&d=DwMFaQ&c=DS6PUFBBr_KiLo7Sjt3ljp5jaW5k2i9ijVXllEdOozc&r=mwSLlPO0Vtstmu1dce0TMFqf5lUxD2SPdNdc1k4NXjU&m=jQttOc88vMYbwNBFzWUO8TKGSmu_8P6EDmcD7jlrVys&s=UhQP5TNa_OMaYVSJX3b9p7XqWZ0fZtErLQwwbcQUUQA&e=>>
>>  
>> Very good information on the tech notes. Thanks for that. Agreed, it’s always recommended to clear the locks when it sees SUSPENDED event. And we are already following your recommendation. Our application would clear the lock when it sees SUSPENDED event.
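>> Something along these lines (a simplified sketch with made-up names, not our exact code):
>> 
>> import java.util.concurrent.atomic.AtomicBoolean;
>> import org.apache.curator.framework.CuratorFramework;
>> import org.apache.curator.framework.state.ConnectionState;
>> import org.apache.curator.framework.state.ConnectionStateListener;
>> 
>> // Simplified sketch: flag connection trouble; the thread that owns the mutex polls the
>> // flag and releases the lock itself (release() has to be called by the owning thread).
>> public class SuspendedFlagListener implements ConnectionStateListener
>> {
>>     private final AtomicBoolean connectionLost = new AtomicBoolean(false);
>> 
>>     @Override
>>     public void stateChanged(CuratorFramework client, ConnectionState newState)
>>     {
>>         if ( (newState == ConnectionState.SUSPENDED) || (newState == ConnectionState.LOST) )
>>         {
>>             connectionLost.set(true);
>>         }
>>         else if ( newState == ConnectionState.RECONNECTED )
>>         {
>>             connectionLost.set(false);
>>         }
>>     }
>> 
>>     public boolean shouldReleaseLock()
>>     {
>>         return connectionLost.get();
>>     }
>> }
>> 
>> It gets registered via client.getConnectionStateListenable().addListener(...), and the worker holding the InterProcessMutex checks shouldReleaseLock() and calls release() itself.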
>>  
>> To summarize my thoughts,
>> The ephemeral node created by the previous session is still seen by the client that reconnected with a new session id, until the server cleans up that previous session’s ephemeral node. This can happen if the client manages to reconnect to the server with a new session id before the server cleans up the previous session. How does it affect the Curator lock recipe? Explanation: the race condition explained above makes acquire() hold the lock (as its own lock node is still seen), eventually leading to an inconsistent state (i.e. the Curator local lock state goes stale) when that lock node is cleaned up by the server as part of the previous session cleanup activities.
>> I am NOT seeing a Curator bug here, but looking for suggestions / recommendations on handling this zookeeper race condition. Either a feature could be added to Curator to cover this case, or there may be recommendations that clients should follow.
>>  
>> Many Thanks,
>> Viswa.
>>  
>> From: Sean Gibbons <sean.gibbons@workday.com <ma...@workday.com>>
>> Date: Tuesday, 20 July 2021 at 12:19
>> To: Viswanathan Rajagopal <viswanathan.rajagopa@workday.com <ma...@workday.com>>
>> Subject: FW: [External Sender] Re: Double Locking Issue
>> 
>> 
>> 
>> On 20/07/2021, 10:12, "Jordan Zimmerman" <jordan@jordanzimmerman.com <ma...@jordanzimmerman.com>> wrote:
>> 
>>     A few more things...
>> 
>>     > Based on our initial analysis and a few test runs, we saw that the Curator acquire() method acquires the lock based on the “about to be deleted lock node of previous session”. Explanation: the ephemeral node created by the previous session was still seen by the client that reconnected with a new session id until the server cleaned it up. If this happens, Curator acquire() would hold the lock.
>> 
>>     This isn't correct. Curator watches the previous node but lock acquisition is always based on calling ZK's getChildren, sorting them, and seeing if the lock's node is the first node in the sequence. If Ephemeral nodes aren't cleaned up it wouldn't be a problem. Any phantom ephemeral nodes would sort first and prevent Curator from believing it holds the lock.
>> 
>>     >  *   On the above mentioned race condition, if the client manages to reconnect to the server with a new session id before the server cleans up the ephemeral nodes of the client’s previous session, the Curator lock acquire() that is trying to acquire the lock will hold the lock as it still sees its lock node in the zookeeper directory. Eventually the server cleans up the ephemeral nodes, leaving the Curator local lock thread data stale and giving the illusion that it still holds the lock while its ephemeral node is gone
>> 
>>     I don't see how this is the case. If the client has received a network partition it shouldn't consider any locks as being currently held (see the Tech Notes I mentioned in my previous email). If there is a partition during a call to acquire(), acquire() would throw an exception (once the retry policy has expired). BTW, what retry policy are you using?
>> 
>>     So, to reiterate, I don't see how phantom/undeleted ephemeral nodes would cause a problem. The only problem it could cause is that a given Curator client takes longer to acquire a lock as it waits for those ephemerals to finally get deleted.
>> 
>>     -JZ
>> 
>>     > On Jul 19, 2021, at 6:45 PM, Viswanathan Rajagopal <viswanathan.rajagopa@workday.com.INVALID <ma...@workday.com.INVALID>> wrote:
>>     > 
>>     > Hi Team,
>>     > 
>>     > Good day.
>>     > 
>>     > Recently came across “Double Locking Issue (i.e. two clients acquiring lock)” using Curator code ( InterProcessMutex lock APIs ) in our application
>>     > 
>>     > Our use case:
>>     > 
>>     >  *   Two clients attempt to acquire the zookeeper lock using Curator InterProcessMutex, and whichever one owns it releases it once it sees the connection disconnect (on receiving Connection.SUSPENDED / Connection.LOST Curator Connection Events from the Connection listener)
>>     > 
>>     > Issue we noticed:
>>     > 
>>     >  *   After the session expired & the clients reconnected with a new session, both clients seem to have acquired the lock. The interesting thing we found is that one of the clients still held the lock while its lock node (ephemeral) was gone
>>     > 
>>     > Things we found:
>>     > 
>>     >  *   Based on our initial analysis and a few test runs, we saw that the Curator acquire() method acquires the lock based on the “about to be deleted lock node of previous session”. Explanation: the ephemeral node created by the previous session was still seen by the client that reconnected with a new session id until the server cleaned it up. If this happens, Curator acquire() would hold the lock.
>>     > 
>>     > 
>>     > 
>>     >  *   Clearly we could see the race condition (in zookeeper code) between 1) the client reconnecting to the server with a new session id and 2) the server deleting the ephemeral nodes of the client’s previous session. We were able to reproduce this issue using the following approach,
>>     >     *   Artificially break the socket connection between client and server for 30s
>>     >     *   Artificially pausing the set of server codes for a min and unpause
>>     > 
>>     > 
>>     >  *   On the above mentioned race condition, if the client manages to reconnect to the server with a new session id before the server cleans up the ephemeral nodes of the client’s previous session, the Curator lock acquire() that is trying to acquire the lock will hold the lock as it still sees its lock node in the zookeeper directory. Eventually the server cleans up the ephemeral nodes, leaving the Curator local lock thread data stale and giving the illusion that it still holds the lock while its ephemeral node is gone
>>     > 
>>     > 
>>     >  *   Timeline events described below for better understanding,
>>     >     *   At t1, Client A and Client B establishes zookeeper session with session id A1 and B1 respectively
>>     >     *   At t2, Client A creates the lock node N1 & acquires the lock
>>     >     *   At t3, Client B creates the lock node N2 & blocked on acquire() to acquire the lock
>>     >     *   At t4, session timed out for both clients & the server is about to clean up the old session; Client A is trying to release the lock
>>     >     *   At t5, Client A and Client B reconnects to server with new session id A2 and B2 respectively before server deletes the ephemeral node N1 & N2 of previous client session. Client A releases the lock, deleting N1 and trying to acquire it again by creating N3 node and Client B who is blocked on acquire() acquires the lock based on N2 (about to be deleted node created by previous session)
>>     >     *   At  t6, server cleans up the ephemeral node N2 created by Client B’s previous session. Client A acquires the lock with node N3 as its previous sequence N2 gets deleted, whereas Client B who incorrectly acquired the lock at t5 timeline still holds the lock
>>     > 
>>     > 
>>     > Note:
>>     > 
>>     >  *   We are not certain that this race condition that we noticed in zookeeper code is intentional design.
>>     > 
>>     > Questions:
>>     > 
>>     >  *   Given this race condition seen in zookeeper code, we would like to hear your recommendations / suggestions to avoid this issue while using Curator lock code?
>>     > 
>>     > 
>>     > 
>>     >  *   We also see that InterProcessMutex has the makeRevocable() API that enables the application to revoke the lock, but that handles the Node change event only and not the Node deleted event. I understand that it makes sense to handle the Node change event alone, so as to enable the application user to revoke the lock externally from the application side. But would it also be okay to have one for the Node delete event, so that the application can register a listener for the Node delete event? I would like to hear your thoughts.
>>     > 
>>     > 
>>     > Many Thanks,
>>     > Viswa
>> 
> 


Re: [External Sender] Double Locking Issue

Posted by Jordan Zimmerman <jo...@jordanzimmerman.com>.
Making lock code watch its lock node it has created

I don't think this is useful. Once the lock method returns the lock code can't notify the caller/user that the lock node has disappeared. We'd need some kind of additional class "lock watcher" or something that client code would have to periodically call to ensure that it still has the lock. tbh - this has been discussed a lot over the years on the ZooKeeper/Curator channels. Just because you have a ZNode doesn't mean it's 100% valid. You should still practice optimistic locks, etc. But, maybe we can reduce the edge cases with this lock watcher or some other mechanism.
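Something like this hypothetical shape is what I mean (nothing like it exists in Curator today; the class name and constructor are made up, and the caller would have to know its own lock node path):

import org.apache.curator.framework.CuratorFramework;

// Hypothetical "lock watcher": client code calls stillHoldsLock() periodically while it
// believes it owns the lock. It returns false once the ephemeral lock node has disappeared
// (e.g. deleted by the server as part of old-session cleanup).
public class LockWatcher
{
    private final CuratorFramework client;
    private final String lockNodePath;   // the ephemeral-sequential node the mutex created

    public LockWatcher(CuratorFramework client, String lockNodePath)
    {
        this.client = client;
        this.lockNodePath = lockNodePath;
    }

    public boolean stillHoldsLock() throws Exception
    {
        return client.checkExists().forPath(lockNodePath) != null;
    }
}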

Session ID changes after wait exits it suggests the lock node should no longer be trusted and it should be deleted and re-created

This is a simple change. But, I don't know if it would help much. It would be better to interrupt waiting lockers when the connection is lost. If an internal mechanism forced any locks blocked in wait() to throw an exception then those lock threads will already delete their ZNodes and try again. This would be the best solution maybe? So, before the lock code goes to wait() it sets a ConnectionStateListener (or something similar) that will interrupt the thread when there are connection problems.
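i.e. roughly this shape (illustrative only, names made up; the registration and the before/after bookkeeping would live inside the lock internals):

import java.util.concurrent.atomic.AtomicReference;
import org.apache.curator.framework.CuratorFramework;
import org.apache.curator.framework.state.ConnectionState;
import org.apache.curator.framework.state.ConnectionStateListener;

// Illustrative only: interrupts whichever thread is currently parked in the lock's wait()
// when the connection is suspended or lost, so that thread deletes its node and retries.
public class InterruptOnConnectionLoss implements ConnectionStateListener
{
    private final AtomicReference<Thread> waitingThread = new AtomicReference<>();

    public void beforeWait() { waitingThread.set(Thread.currentThread()); }
    public void afterWait()  { waitingThread.set(null); }

    @Override
    public void stateChanged(CuratorFramework client, ConnectionState newState)
    {
        if ( (newState == ConnectionState.SUSPENDED) || (newState == ConnectionState.LOST) )
        {
            Thread thread = waitingThread.get();
            if ( thread != null )
            {
                thread.interrupt();
            }
        }
    }
}

It would be registered via client.getConnectionStateListenable().addListener(...) for the duration of the acquire call.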

Leader Latch code is well protected

Yes - the leader recipes are better at this. Maybe there's an opportunity to merge/change code. We could consider deprecating the lock recipes in favor of these?

-Jordan

> On Jul 29, 2021, at 2:35 AM, Viswanathan Rajagopal <vi...@workday.com> wrote:
> 
> Hi Jordan,
>  
> Thanks for the suggestions. Much helpful.
>  
> Quick summary of my below comments,
> Added test case simulator to reproduce issue (though artificially)
> Added a proposed code just to get your review and suggestions to see whether that would work
> Please find my detailed comments inline,
>  
> > I think there's generally a hole in Curator's lock recipe. The lock code does not watch the lock node it has created. So, another process or (as you found) a race with the server might cause the lock node to disappear underneath the lock instance after it thinks it has the lock. One thing we can do is to check the session ID before waiting in LockInternals.internalLockLoop(). If the session ID changes after wait exits it suggests the lock node should no longer be trusted and it should be deleted and re-created. That's 1 though anyway.
>  
> Yes, you are right. Currently lock code doesn’t watch the lock node it has created. That’s where we would like to hear your feedback on our proposal mentioned in our first mail thread 
> “Curator lock code has makeRevocable() API that enables application to revoke lock anytime from application by triggering NODE_CHANGE event through Revoker.attemptToRevoke() utility. 
> Proposal: Would it be nice to extend the makeRevocable() API to handle the Node delete event, which would allow the application to register a watcher for the Node delete event, so that the application can react to the Node delete event by revoking the lock?”. Tried the code snippet (https://github.com/ViswaNXplore/curator/blob/curator-2.5.0/patch/double_locking_test_simulator/curator-recipes/src/main/java/org/apache/curator/framework/recipes/locks/LockInternals.java#L60 <https://github.com/ViswaNXplore/curator/blob/curator-2.5.0/patch/double_locking_test_simulator/curator-recipes/src/main/java/org/apache/curator/framework/recipes/locks/LockInternals.java#L60>
> ) to see whether the approach “Making lock code watch its lock node it has created” works, and would like to get your thoughts on this. Do you see any other impacts there? May need your advice.
>  
> I agree with you that the session id approach would protect us from seeing this issue. The reason I thought your first approach, “making lock code watch its lock node it has created”, could be more protective is that it could protect us from any inconsistencies between the original lock node and the local lock state.
>   
> >  It would be nice to generate a test/simulation for this case so that it can be properly dealt with.
>  
> Created a test case simulator https://github.com/ViswaNXplore/curator/blob/curator-2.5.0/patch/double_locking_test_simulator/curator-recipes/src/test/java/org/apache/curator/framework/recipes/CuratorRecipeTest.java <https://github.com/ViswaNXplore/curator/blob/curator-2.5.0/patch/double_locking_test_simulator/curator-recipes/src/test/java/org/apache/curator/framework/recipes/CuratorRecipeTest.java>
>  
> Approach #1 (Artificial way of reproducing the issue)
> Run the test case with (SIMULATE_ISSUE_BY_EXPLICIT_DELETE set to true). This deletes the lock node after a pause just to simulate the state where the zkClient has reconnected and still happens to see the ephemeral node just before the server deletes it because its session has expired; the node is then deleted afterwards by the server.
> Approach #2 (A little manual interruption needed to reproduce the issue)
> Run the test case in Debug mode (SIMULATE_ISSUE_BY_EXPLICIT_DELETE set to false)
> Artificially delaying / pausing the ephemeral lock nodes deletion as part of session cleanup process in server code (ZookeeperServer.close() method)
> After a pause (say 5s) to let one of the instances acquire the lock, artificially break the socket connection between client and server for 30s (by keeping a breakpoint in the ClientCnxnSocketNIO.doIO() method). After 30s, we would see session closing logs logged in the server code
> After 1min, remove the breakpoint in ClientCnxnSocketNIO.doIO() and resume both Thread 2 and Thread 3
> After that, resume the server thread (thread name would be “SessionTracker”)
> Below Proposals discussed so far (3rd added now for your review)
> Making lock code watch its lock node it has created
> Session ID changes after wait exits it suggests the lock node should no longer be trusted and it should be deleted and re-created
> Leader Latch code is well protected to cover this zookeeper race condition, because Leader Latch code internally handle the connection events (which they use to interrupt latch_acquire_state to reset its latch every time connection is reconnected), means it will explicitly release the latch and recreate new if there is a connection disconnect (may be this can be the approach that lock recipe could use to protect ?) 
> Many Thanks,
> Viswa.
>  
> From: Jordan Zimmerman <jo...@jordanzimmerman.com>
> Date: Tuesday, 20 July 2021 at 21:11
> To: Viswanathan Rajagopal <vi...@workday.com>
> Cc: Sean Gibbons <se...@workday.com>, dev@curator.apache.org <de...@curator.apache.org>, cammckenzie@apache.org <ca...@apache.org>, user@curator.apache.org <us...@curator.apache.org>, Donal Arundel <do...@workday.com>, Francisco Goncalves <fr...@workday.com>, Zak Szadowski <za...@workday.com>, Dandie Beloy <da...@workday.com>, Marcelo Cenerino <ma...@workday.com>
> Subject: Re: [External Sender] Double Locking Issue
> 
> In our case, it wouldn’t throw any exception because it had gone past “creating lock nodes” and was blocked on wait(), which would only then be interrupted when curator watcher notified on previous sequence node delete event.
>  
> So, you're using the version of acquire() without a timeout? In any event, this is a problem. When you receive SUSPENDED you really should interrupt any threads that are waiting on Curator. The Curator docs imply this even though it might not be obvious. This is likely the source of your problems. A simple solution is to use the version of acquire that has a timeout and repeatedly call it until success (though your problem may still occur in this case). Maybe we could improve the lock recipe (or something similar) so that locks inside of acquire are interrupted on a network partition. 
>  
> [snip of your remaining thoughtful analysis]
>  
> I think there's generally a hole in Curator's lock recipe. The lock code does not watch the lock node it has created. So, another process or (as you found) a race with the server might cause the lock node to disappear underneath the lock instance after it thinks it has the lock. 
>  
> One thing we can do is to check the session ID before waiting in LockInternals.internalLockLoop(). If the session ID changes after wait exits it suggests the lock node should no longer be trusted and it should be deleted and re-created. That's 1 though anyway. It would be nice to generate a test/simulation for this case so that it can be properly dealt with.
>  
> -Jordan
> 
> 
> On Jul 20, 2021, at 3:53 PM, Viswanathan Rajagopal <viswanathan.rajagopa@workday.com <ma...@workday.com>> wrote:
>  
> Thanks Jordan for coming back on this.
>  
> Please find my inline comments. Also provided below few additional version info that we are using,
> Curator version – 2.5.0
> Zookeeper version – 3.5.3
>  
> >  I don't see how this is the case. If the client has received a network partition it shouldn't not consider any locks as being currently held (see the Tech Notes I mentioned in my previous email). If there is a partition during a call to acquire(), acquire() would throw an exception (once the retry policy has expired). BTW, what retry policy are you using?
>  
> In our case, it wouldn’t throw any exception because it had gone past “creating lock nodes” and was blocked on wait(), which would only then be interrupted when curator watcher notified on previous sequence node delete event.
>  
> > This isn't correct. Curator watches the previous node but lock acquisition is always based on calling ZK's getChildren, sorting them, and seeing if the lock's node is the first node in the sequence. If Ephemeral nodes aren't cleaned up it wouldn't be a problem. Any phantom ephemeral nodes would sort first and prevent Curator from believing it holds the lock.
>  
> Totally agreed on your point on lock acquisition. The problem that I was trying to explain is that the network partition occurs while the client is waiting for the lock (in the acquire()) but after the acquire code block has already written out the lock node successfully, so the client code will be blocking in the acquire and not get a connection exception. Once the previous lock node is deleted but not the lock node for this client then this client will leave the acquire thinking it has the lock but actually at this point the ZK server has expired the session and is in the process of deleting our clients lock node.
>  
> Trying to categorize the timeline events below to see if that would help us for better understanding,
> [ CONNECTED ] :
> Client A & Client B calls acquire creating Node N1 & N2 respectively
> Client A acquire() -> holds the lock as its Node N1 is first node in sequence
> Client B acquire() -> created Node N2, watches for previous sequence node (blocked on wait())
>  
> [ SUSPENDED CONNECTION OCCURS ] :
> Network partition happens
> Curator fires SUSPENDED
> Client A, on receiving SUSPENDED event, attempts to release the lock
> Client B, on receiving SUSPENDED event, does nothing as it was not holding any lock (as blocked on acquire() -> wait() )  
>  
> [ SESSION EXPIRED ] :
> Client A attempts to release the lock (with retries)
> Client B does nothing as it was not holding any lock (and is still blocked on acquire() -> wait() )
> Server prepared to cleanup previous client session and deleting lock nodes created as part of previous client session
>  
> [ RECONNECTED ] :
> Client A and Client B reconnected to another server with new session id (Note : This happens before server managed to cleanup / delete ephemeral nodes of previous client session)
> Client A released the lock successfully (means it would delete its lock node N1) and attempts to acquire lock by creating lock node N3 and watches for previous sequence node (N2)
> Client B who was blocked on acquire() -> wait() would be notified with previous sequence node (N1) deletion -> getChildren, sorting them, and seeing if the lock's node is the first node in the sequence. So, Client B sees its lock node N2 still (which I call it as about to be deleted node by server) and thereby acquires the lock
>  
> [ AFTER FEW SECONDS ] :
> Server managed to delete the ephemeral node N2 as part of previous client session cleanup
> Client A who was blocked on acquire() -> wait() would be notified with previous sequence node (N2) deletion -> getChildren, sorting them, and seeing if the lock's node is the first node in the sequence and thereby acquires the lock
> Client B – its local lock thread data went stale (as its lock path N2 not has been deleted by Server)
>  
> >  SUSPENDED in Curator means only that the connect has been lost, not that the session has ended.
> LOST is the state that means the session has ended
> Be aware of how GCs can affect Curator. See the Tech Note here: https://cwiki.apache.org/confluence/display/CURATOR/TN10
> Also read this Tech Note on session handling: https://cwiki.apache.org/confluence/display/CURATOR/TN14
>  
> Very good information on the tech notes. Thanks for that. Agreed, it’s always recommended to clear the locks when it sees SUSPENDED event. And we are already following your recommendation. Our application would clear the lock when it sees SUSPENDED event.
>  
> To summarize my thoughts,
> Ephemeral node created by previous session was  still seen by client that reconnected with new session id until server cleans that up the previous session ephemeral node. This could happen if client manage to reconnect with server with new session id before server cleans up the previous session. How it affects the curator lock recipe ? Explanation : The above explained race condition would make acquire() to hold the lock ( as its own lock node still seen ), eventually leading to inconsistent state (i.e. curator local lock state stale) when that lock node is being cleaned up by the server as part of previous session cleanup activities.
> I am NOT seeing a Curator bug here, but looking out for suggestions / recommendations in handling this zookeeper race condition. Either can a feature be added to Curator to cover this case / any recommendations that clients should follow.
>  
> Many Thanks,
> Viswa.
>  
> From: Sean Gibbons <sean.gibbons@workday.com <ma...@workday.com>>
> Date: Tuesday, 20 July 2021 at 12:19
> To: Viswanathan Rajagopal <viswanathan.rajagopa@workday.com <ma...@workday.com>>
> Subject: FW: [External Sender] Re: Double Locking Issue
> 
> 
> 
> On 20/07/2021, 10:12, "Jordan Zimmerman" <jordan@jordanzimmerman.com <ma...@jordanzimmerman.com>> wrote:
> 
>     A few more things...
> 
>     > Based on our initial analysis and few test runs, we saw that Curator acquire() method acquires the lock based on “about to be deleted lock node of previous session”. Explanation : Ephemeral node created by previous session was  still seen by client that reconnected with new session id until server cleans that up. If this happens, Curator acquire() would hold the lock.
> 
>     This isn't correct. Curator watches the previous node but lock acquisition is always based on calling ZK's getChildren, sorting them, and seeing if the lock's node is the first node in the sequence. If Ephemeral nodes aren't cleaned up it wouldn't be a problem. Any phantom ephemeral nodes would sort first and prevent Curator from believing it holds the lock.
> 
>     >  *   On the above mentioned race condition, if client manage to reconnect to server with new session id before server cleans up the ephemeral nodes of client’s previous session,  Curator lock acquire() who is trying to acquire the lock will hold the lock as it still sees the lock node in zookeeper directory. Eventually server would be cleaning up the ephemeral nodes leaving the Curator local lock thread data stale giving the illusion that it still hold the lock while its ephemeral node is gone
> 
>     I don't see how this is the case. If the client has received a network partition it shouldn't not consider any locks as being currently held (see the Tech Notes I mentioned in my previous email). If there is a partition during a call to acquire(), acquire() would throw an exception (once the retry policy has expired). BTW, what retry policy are you using?
> 
>     So, to reiterate, I don't see how phantom/undeleted ephemeral nodes would cause a problem. The only problem it could case is that a given Curator client takes longer to acquire a lock as it waits for those ephemerals to finally get deleted.
> 
>     -JZ
> 
>     > On Jul 19, 2021, at 6:45 PM, Viswanathan Rajagopal <viswanathan.rajagopa@workday.com.INVALID <ma...@workday.com.INVALID>> wrote:
>     > 
>     > Hi Team,
>     > 
>     > Good day.
>     > 
>     > Recently came across “Double Locking Issue (i.e. two clients acquiring lock)” using Curator code ( InterProcessMutex lock APIs ) in our application
>     > 
>     > Our use case:
>     > 
>     >  *   Two clients attempts to acquire the zookeeper lock using Curator InterProcessMutex and whoever owns it would release it once sees the connection disconnect ( on receiving Connection.SUSPENDED / Connection.LOST Curator Connection Events from Connection listener)
>     > 
>     > Issue we noticed:
>     > 
>     >  *   After session expired & reconnected with new session, both client seems to have acquired the lock. Interesting thing that we found is that one of the clients still holds the lock while its lock node (ephemeral) was gone
>     > 
>     > Things we found:
>     > 
>     >  *   Based on our initial analysis and few test runs, we saw that Curator acquire() method acquires the lock based on “about to be deleted lock node of previous session”. Explanation : Ephemeral node created by previous session was  still seen by client that reconnected with new session id until server cleans that up. If this happens, Curator acquire() would hold the lock.
>     > 
>     > 
>     > 
>     >  *   Clearly we could see the race condition (in zookeeper code) between 1). Client reconnecting to server with new session id and 2). server deleting the ephemeral nodes of client’s previous session. We were able to reproduce this issue using the following approach,
>     >     *   Artificially break the socket connection between client and server for 30s
>     >     *   Artificially pausing the set of server codes for a min and unpause
>     > 
>     > 
>     >  *   On the above mentioned race condition, if client manage to reconnect to server with new session id before server cleans up the ephemeral nodes of client’s previous session,  Curator lock acquire() who is trying to acquire the lock will hold the lock as it still sees the lock node in zookeeper directory. Eventually server would be cleaning up the ephemeral nodes leaving the Curator local lock thread data stale giving the illusion that it still hold the lock while its ephemeral node is gone
>     > 
>     > 
>     >  *   Timeline events described below for better understanding,
>     >     *   At t1, Client A and Client B establishes zookeeper session with session id A1 and B1 respectively
>     >     *   At t2, Client A creates the lock node N1 & acquires the lock
>     >     *   At t3, Client B creates the lock node N2 & blocked on acquire() to acquire the lock
>     >     *   At t4, session timed out for both clients & server is about to clean up the old session • Client A trying to release the lock
>     >     *   At t5, Client A and Client B reconnects to server with new session id A2 and B2 respectively before server deletes the ephemeral node N1 & N2 of previous client session. Client A releases the lock, deleting N1 and trying to acquire it again by creating N3 node and Client B who is blocked on acquire() acquires the lock based on N2 (about to be deleted node created by previous session)
>     >     *   At  t6, server cleans up the ephemeral node N2 created by Client B’s previous session. Client A acquires the lock with node N3 as its previous sequence N2 gets deleted, whereas Client B who incorrectly acquired the lock at t5 timeline still holds the lock
>     > 
>     > 
>     > Note:
>     > 
>     >  *   We are not certain that this race condition that we noticed in zookeeper code is intentional design.
>     > 
>     > Questions:
>     > 
>     >  *   Given this race condition seen in zookeeper code, we would like to hear your recommendations / suggestions to avoid this issue while using Curator lock code?
>     > 
>     > 
>     > 
>     >  *   We also see that Interprocess Mutex has the makeRevocable() API that enables application to revoke lock, but that handles Node change event only but not for Node deleted event. I understand that this makes sense to handle Node change event alone, as to enable application user to revoke lock externally from application side. But would it be also to okay to have one for Node delete event, so as application can register the listener for Node delete event. I would like to hear your thoughts.
>     > 
>     > 
>     > Many Thanks,
>     > Viswa
> 


Re: [External Sender] Double Locking Issue

Posted by Jordan Zimmerman <jo...@jordanzimmerman.com>.
Making lock code watch its lock node it has created

I don't think this is useful. Once the lock method returns, the lock code can't notify the caller/user that the lock node has disappeared. We'd need some kind of additional "lock watcher" class, or something that client code would have to call periodically to ensure that it still holds the lock. To be honest, this has been discussed a lot over the years on the ZooKeeper/Curator channels. Just because you have a ZNode doesn't mean it's 100% valid - you should still practice optimistic locking, etc. But maybe we can reduce the edge cases with this lock watcher or some other mechanism.
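
To make the idea concrete, here's a very rough sketch of what such a lock watcher might look like (untested; the class and method names are made up and are not an existing Curator API). It checks that the lock node still exists and that it is owned by the client's current session:

import org.apache.curator.framework.CuratorFramework;
import org.apache.zookeeper.data.Stat;

// Hypothetical helper - client code would have to call stillHoldsLock()
// periodically after acquire() returns, since the recipe cannot notify it.
public class LockWatcher
{
    private final CuratorFramework client;
    private final String lockNodePath;  // path of the ephemeral node created by acquire()

    public LockWatcher(CuratorFramework client, String lockNodePath)
    {
        this.client = client;
        this.lockNodePath = lockNodePath;
    }

    public boolean stillHoldsLock() throws Exception
    {
        Stat stat = client.checkExists().forPath(lockNodePath);
        if ( stat == null )
        {
            return false;   // the lock node is gone
        }
        // the ephemeral owner must be our current session - if the session was
        // re-established, the node belongs to the old, expired session
        long currentSessionId = client.getZookeeperClient().getZooKeeper().getSessionId();
        return stat.getEphemeralOwner() == currentSessionId;
    }
}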

Session ID changes after wait exits it suggests the lock node should no longer be trusted and it should be deleted and re-created

This is a simple change, but I don't know how much it would help. It would be better to interrupt waiting lockers when the connection is lost. If an internal mechanism forced any locks blocked in wait() to throw an exception, those lock threads would then delete their ZNodes and try again. This might be the best solution: before the lock code goes to wait(), it sets a ConnectionStateListener (or something similar) that interrupts the thread when there are connection problems.
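
An application can approximate this today from the outside. A minimal sketch, assuming the application wires the listener around the blocking acquire() itself (this is not something the recipe currently does for you, and whether the lock node gets cleaned up on interrupt depends on the recipe internals):

import org.apache.curator.framework.CuratorFramework;
import org.apache.curator.framework.recipes.locks.InterProcessMutex;
import org.apache.curator.framework.state.ConnectionState;
import org.apache.curator.framework.state.ConnectionStateListener;

public class InterruptibleAcquire
{
    // Blocks in mutex.acquire() but arranges for the waiting thread to be
    // interrupted if the connection is SUSPENDED or LOST in the meantime.
    public static void acquire(CuratorFramework client, InterProcessMutex mutex) throws Exception
    {
        final Thread waiter = Thread.currentThread();
        ConnectionStateListener listener = new ConnectionStateListener()
        {
            @Override
            public void stateChanged(CuratorFramework c, ConnectionState newState)
            {
                if ( (newState == ConnectionState.SUSPENDED) || (newState == ConnectionState.LOST) )
                {
                    waiter.interrupt(); // acquire() exits with an exception instead of waiting forever
                }
            }
        };

        client.getConnectionStateListenable().addListener(listener);
        try
        {
            mutex.acquire();
        }
        finally
        {
            client.getConnectionStateListenable().removeListener(listener);
        }
    }
}

The caller would then treat the resulting exception as "lock not acquired" and decide whether to retry.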

Leader Latch code is well protected

Yes - the leader recipes are better at this. Maybe there's an opportunity to merge/change code. We could consider deprecating the lock recipes in favor of these?

-Jordan

> On Jul 29, 2021, at 2:35 AM, Viswanathan Rajagopal <vi...@workday.com> wrote:
> 
> Hi Jordan,
>  
> Thanks for the suggestions. Much helpful.
>  
> Quick summary of my below comments,
> Added test case simulator to reproduce issue (though artificially)
> Added a proposed code just to get your review and suggestions to see whether that would work
> Please find my detailed comments inline,
>  
> > I think there's generally a hole in Curator's lock recipe. The lock code does not watch the lock node it has created. So, another process or (as you found) a race with the server might cause the lock node to disappear underneath the lock instance after it thinks it has the lock. One thing we can do is to check the session ID before waiting in LockInternals.internalLockLoop(). If the session ID changes after wait exits it suggests the lock node should no longer be trusted and it should be deleted and re-created. That's 1 though anyway.
>  
> Yes, you are right. Currently lock code doesn’t watch the lock node it has created. That’s where we would like to hear your feedback on our proposal mentioned in our first mail thread 
> “Curator lock code has makeRevocable() API that enables application to revoke lock anytime from application by triggering NODE_CHANGE event through Revoker.attemptToRevoke() utility. 
> Proposal: Would it be nice to extend makeRevocable() API to handle Node delete event, which would allow application to register the watcher for Node delete event, thereby application can react to Node delete event by revoking the lock ?”. Tried the Code snippet (https://github.com/ViswaNXplore/curator/blob/curator-2.5.0/patch/double_locking_test_simulator/curator-recipes/src/main/java/org/apache/curator/framework/recipes/locks/LockInternals.java#L60 <https://github.com/ViswaNXplore/curator/blob/curator-2.5.0/patch/double_locking_test_simulator/curator-recipes/src/main/java/org/apache/curator/framework/recipes/locks/LockInternals.java#L60>
> ) to see the approach “Making lock code watch its lock node it has created” works and would like to get your thoughts on this and do you see any other impacts there? May need your advice.
>  
> I agree with you on that the session id approach, that would protect us from seeing this issue. The reason why I thought your first approach “making lock code watch it has created” could be more protective is this could protect us from any inconsistencies between original lock node and local lock state.
>   
> >  It would be nice to generate a test/simulation for this case so that it can be properly dealt with.
>  
> Created a test case simulator https://github.com/ViswaNXplore/curator/blob/curator-2.5.0/patch/double_locking_test_simulator/curator-recipes/src/test/java/org/apache/curator/framework/recipes/CuratorRecipeTest.java <https://github.com/ViswaNXplore/curator/blob/curator-2.5.0/patch/double_locking_test_simulator/curator-recipes/src/test/java/org/apache/curator/framework/recipes/CuratorRecipeTest.java>
>  
> Approach #1 (Artificial way of reproducing the issue)
> Run the test case with (SIMULATE_ISSUE_BY_EXPLICIT_DELETE set to true). This would delete the lock node after a pause just to simulate the state where the zkClient had reconnected and it still happens to see the ephermal node just before the server deletes it since its session has expired, but the node is deleted afterwards by the server.
> Approach #2 (A little manual interruption needed to reproduce the issue)
> Run the test case in Debug mode (SIMULATE_ISSUE_BY_EXPLICIT_DELETE set to false)
> Artificially delaying / pausing the ephemeral lock nodes deletion as part of session cleanup process in server code (ZookeeperServer.close() method)
> After a pause (say 5s) to make one of the instance to acquire the lock, Artificially break the socket connection between client and server for 30s (by keeping breakpoint in ClientCnxnSocketNIO.doIO() method). After 30s, we would see session closing logs logged in server code
> After 1min, remove breakpoint in ClientCnxnSocketNIO.doIO() and resume both Thread 2 and Thread 3
> After that, resume server thread (thread name would be “SessionTracker”
> Below Proposals discussed so far (3rd added now for your review)
> Making lock code watch its lock node it has created
> Session ID changes after wait exits it suggests the lock node should no longer be trusted and it should be deleted and re-created
> Leader Latch code is well protected to cover this zookeeper race condition, because Leader Latch code internally handle the connection events (which they use to interrupt latch_acquire_state to reset its latch every time connection is reconnected), means it will explicitly release the latch and recreate new if there is a connection disconnect (may be this can be the approach that lock recipe could use to protect ?) 
> Many Thanks,
> Viswa.
>  
> From: Jordan Zimmerman <jo...@jordanzimmerman.com>
> Date: Tuesday, 20 July 2021 at 21:11
> To: Viswanathan Rajagopal <vi...@workday.com>
> Cc: Sean Gibbons <se...@workday.com>, dev@curator.apache.org <de...@curator.apache.org>, cammckenzie@apache.org <ca...@apache.org>, user@curator.apache.org <us...@curator.apache.org>, Donal Arundel <do...@workday.com>, Francisco Goncalves <fr...@workday.com>, Zak Szadowski <za...@workday.com>, Dandie Beloy <da...@workday.com>, Marcelo Cenerino <ma...@workday.com>
> Subject: Re: [External Sender] Double Locking Issue
> 
> In our case, it wouldn’t throw any exception because it had gone past “creating lock nodes” and was blocked on wait(), which would only then be interrupted when curator watcher notified on previous sequence node delete event.
>  
> So, you're using the version of acquire() without a timeout? In any event, this is a problem. When you receive SUSPENDED you really should interrupt any threads that are waiting on Curator. The Curator docs imply this even though it might not be obvious. This is likely the source of your problems. A simple solution is to use the version of acquire that has a timeout and repeatedly call it until success (though your problem may still occur in this case). Maybe we could improve the lock recipe (or something similar) so that locks inside of acquire are interrupted on a network partition. 
>  
> [snip of your remaining thoughtful analysis]
>  
> I think there's generally a hole in Curator's lock recipe. The lock code does not watch the lock node it has created. So, another process or (as you found) a race with the server might cause the lock node to disappear underneath the lock instance after it thinks it has the lock. 
>  
> One thing we can do is to check the session ID before waiting in LockInternals.internalLockLoop(). If the session ID changes after wait exits it suggests the lock node should no longer be trusted and it should be deleted and re-created. That's 1 though anyway. It would be nice to generate a test/simulation for this case so that it can be properly dealt with.
>  
> -Jordan
> 
> 
> On Jul 20, 2021, at 3:53 PM, Viswanathan Rajagopal <viswanathan.rajagopa@workday.com <ma...@workday.com>> wrote:
>  
> Thanks Jordan for coming back on this.
>  
> Please find my inline comments. Also provided below few additional version info that we are using,
> Curator version – 2.5.0
> Zookeeper version – 3.5.3
>  
> >  I don't see how this is the case. If the client has received a network partition it shouldn't not consider any locks as being currently held (see the Tech Notes I mentioned in my previous email). If there is a partition during a call to acquire(), acquire() would throw an exception (once the retry policy has expired). BTW, what retry policy are you using?
>  
> In our case, it wouldn’t throw any exception because it had gone past “creating lock nodes” and was blocked on wait(), which would only then be interrupted when curator watcher notified on previous sequence node delete event.
>  
> > This isn't correct. Curator watches the previous node but lock acquisition is always based on calling ZK's getChildren, sorting them, and seeing if the lock's node is the first node in the sequence. If Ephemeral nodes aren't cleaned up it wouldn't be a problem. Any phantom ephemeral nodes would sort first and prevent Curator from believing it holds the lock.
>  
> Totally agreed on your point on lock acquisition. The problem that I was trying to explain is that the network partition occurs while the client is waiting for the lock (in the acquire()) but after the acquire code block has already written out the lock node successfully, so the client code will be blocking in the acquire and not get a connection exception. Once the previous lock node is deleted but not the lock node for this client then this client will leave the acquire thinking it has the lock but actually at this point the ZK server has expired the session and is in the process of deleting our clients lock node.
>  
> Trying to categorize the timeline events below to see if that would help us for better understanding,
> [ CONNECTED ] :
> Client A & Client B calls acquire creating Node N1 & N2 respectively
> Client A acquire() -> holds the lock as its Node N1 is first node in sequence
> Client B acquire() -> created Node N2, watches for previous sequence node (blocked on wait())
>  
> [ SUSPENDED CONNECTION OCCURS ] :
> Network partition happens
> Curator fires SUSPENDED
> Client A, on receiving SUSPENDED event, attempts to release the lock
> Client B, on receiving SUSPENDED event, does nothing as it was not holding any lock (as blocked on acquire() -> wait() )  
>  
> [ SESSION EXPIRED ] :
> Client A attempts to release the lock (with retries)
> Client B does nothing as it was not holding any lock (and is still blocked on acquire() -> wait() )
> Server prepared to cleanup previous client session and deleting lock nodes created as part of previous client session
>  
> [ RECONNECTED ] :
> Client A and Client B reconnected to another server with new session id (Note : This happens before server managed to cleanup / delete ephemeral nodes of previous client session)
> Client A released the lock successfully (means it would delete its lock node N1) and attempts to acquire lock by creating lock node N3 and watches for previous sequence node (N2)
> Client B who was blocked on acquire() -> wait() would be notified with previous sequence node (N1) deletion -> getChildren, sorting them, and seeing if the lock's node is the first node in the sequence. So, Client B sees its lock node N2 still (which I call it as about to be deleted node by server) and thereby acquires the lock
>  
> [ AFTER FEW SECONDS ] :
> Server managed to delete the ephemeral node N2 as part of previous client session cleanup
> Client A who was blocked on acquire() -> wait() would be notified with previous sequence node (N2) deletion -> getChildren, sorting them, and seeing if the lock's node is the first node in the sequence and thereby acquires the lock
> Client B – its local lock thread data went stale (as its lock path N2 not has been deleted by Server)
>  
> >  SUSPENDED in Curator means only that the connect has been lost, not that the session has ended.
> LOST is the state that means the session has ended
> Be aware of how GCs can affect Curator. See the Tech Note here: https://cwiki.apache.org/confluence/display/CURATOR/TN10
> Also read this Tech Note on session handling: https://cwiki.apache.org/confluence/display/CURATOR/TN14
>  
> Very good information on the tech notes. Thanks for that. Agreed, it’s always recommended to clear the locks when it sees SUSPENDED event. And we are already following your recommendation. Our application would clear the lock when it sees SUSPENDED event.
>  
> To summarize my thoughts,
> Ephemeral node created by previous session was  still seen by client that reconnected with new session id until server cleans that up the previous session ephemeral node. This could happen if client manage to reconnect with server with new session id before server cleans up the previous session. How it affects the curator lock recipe ? Explanation : The above explained race condition would make acquire() to hold the lock ( as its own lock node still seen ), eventually leading to inconsistent state (i.e. curator local lock state stale) when that lock node is being cleaned up by the server as part of previous session cleanup activities.
> I am NOT seeing a Curator bug here, but looking out for suggestions / recommendations in handling this zookeeper race condition. Either can a feature be added to Curator to cover this case / any recommendations that clients should follow.
>  
> Many Thanks,
> Viswa.
>  
> From: Sean Gibbons <sean.gibbons@workday.com <ma...@workday.com>>
> Date: Tuesday, 20 July 2021 at 12:19
> To: Viswanathan Rajagopal <viswanathan.rajagopa@workday.com <ma...@workday.com>>
> Subject: FW: [External Sender] Re: Double Locking Issue
> 
> 
> 
> On 20/07/2021, 10:12, "Jordan Zimmerman" <jordan@jordanzimmerman.com <ma...@jordanzimmerman.com>> wrote:
> 
>     A few more things...
> 
>     > Based on our initial analysis and few test runs, we saw that Curator acquire() method acquires the lock based on “about to be deleted lock node of previous session”. Explanation : Ephemeral node created by previous session was  still seen by client that reconnected with new session id until server cleans that up. If this happens, Curator acquire() would hold the lock.
> 
>     This isn't correct. Curator watches the previous node but lock acquisition is always based on calling ZK's getChildren, sorting them, and seeing if the lock's node is the first node in the sequence. If Ephemeral nodes aren't cleaned up it wouldn't be a problem. Any phantom ephemeral nodes would sort first and prevent Curator from believing it holds the lock.
> 
>     >  *   On the above mentioned race condition, if client manage to reconnect to server with new session id before server cleans up the ephemeral nodes of client’s previous session,  Curator lock acquire() who is trying to acquire the lock will hold the lock as it still sees the lock node in zookeeper directory. Eventually server would be cleaning up the ephemeral nodes leaving the Curator local lock thread data stale giving the illusion that it still hold the lock while its ephemeral node is gone
> 
>     I don't see how this is the case. If the client has received a network partition it shouldn't not consider any locks as being currently held (see the Tech Notes I mentioned in my previous email). If there is a partition during a call to acquire(), acquire() would throw an exception (once the retry policy has expired). BTW, what retry policy are you using?
> 
>     So, to reiterate, I don't see how phantom/undeleted ephemeral nodes would cause a problem. The only problem it could case is that a given Curator client takes longer to acquire a lock as it waits for those ephemerals to finally get deleted.
> 
>     -JZ
> 
>     > On Jul 19, 2021, at 6:45 PM, Viswanathan Rajagopal <viswanathan.rajagopa@workday.com.INVALID <ma...@workday.com.INVALID>> wrote:
>     > 
>     > Hi Team,
>     > 
>     > Good day.
>     > 
>     > Recently came across “Double Locking Issue (i.e. two clients acquiring lock)” using Curator code ( InterProcessMutex lock APIs ) in our application
>     > 
>     > Our use case:
>     > 
>     >  *   Two clients attempts to acquire the zookeeper lock using Curator InterProcessMutex and whoever owns it would release it once sees the connection disconnect ( on receiving Connection.SUSPENDED / Connection.LOST Curator Connection Events from Connection listener)
>     > 
>     > Issue we noticed:
>     > 
>     >  *   After session expired & reconnected with new session, both client seems to have acquired the lock. Interesting thing that we found is that one of the clients still holds the lock while its lock node (ephemeral) was gone
>     > 
>     > Things we found:
>     > 
>     >  *   Based on our initial analysis and few test runs, we saw that Curator acquire() method acquires the lock based on “about to be deleted lock node of previous session”. Explanation : Ephemeral node created by previous session was  still seen by client that reconnected with new session id until server cleans that up. If this happens, Curator acquire() would hold the lock.
>     > 
>     > 
>     > 
>     >  *   Clearly we could see the race condition (in zookeeper code) between 1). Client reconnecting to server with new session id and 2). server deleting the ephemeral nodes of client’s previous session. We were able to reproduce this issue using the following approach,
>     >     *   Artificially break the socket connection between client and server for 30s
>     >     *   Artificially pausing the set of server codes for a min and unpause
>     > 
>     > 
>     >  *   On the above mentioned race condition, if client manage to reconnect to server with new session id before server cleans up the ephemeral nodes of client’s previous session,  Curator lock acquire() who is trying to acquire the lock will hold the lock as it still sees the lock node in zookeeper directory. Eventually server would be cleaning up the ephemeral nodes leaving the Curator local lock thread data stale giving the illusion that it still hold the lock while its ephemeral node is gone
>     > 
>     > 
>     >  *   Timeline events described below for better understanding,
>     >     *   At t1, Client A and Client B establishes zookeeper session with session id A1 and B1 respectively
>     >     *   At t2, Client A creates the lock node N1 & acquires the lock
>     >     *   At t3, Client B creates the lock node N2 & blocked on acquire() to acquire the lock
>     >     *   At t4, session timed out for both clients & server is about to clean up the old session • Client A trying to release the lock
>     >     *   At t5, Client A and Client B reconnects to server with new session id A2 and B2 respectively before server deletes the ephemeral node N1 & N2 of previous client session. Client A releases the lock, deleting N1 and trying to acquire it again by creating N3 node and Client B who is blocked on acquire() acquires the lock based on N2 (about to be deleted node created by previous session)
>     >     *   At  t6, server cleans up the ephemeral node N2 created by Client B’s previous session. Client A acquires the lock with node N3 as its previous sequence N2 gets deleted, whereas Client B who incorrectly acquired the lock at t5 timeline still holds the lock
>     > 
>     > 
>     > Note:
>     > 
>     >  *   We are not certain that this race condition that we noticed in zookeeper code is intentional design.
>     > 
>     > Questions:
>     > 
>     >  *   Given this race condition seen in zookeeper code, we would like to hear your recommendations / suggestions to avoid this issue while using Curator lock code?
>     > 
>     > 
>     > 
>     >  *   We also see that Interprocess Mutex has the makeRevocable() API that enables application to revoke lock, but that handles Node change event only but not for Node deleted event. I understand that this makes sense to handle Node change event alone, as to enable application user to revoke lock externally from application side. But would it be also to okay to have one for Node delete event, so as application can register the listener for Node delete event. I would like to hear your thoughts.
>     > 
>     > 
>     > Many Thanks,
>     > Viswa
> 


Re: [External Sender] Double Locking Issue

Posted by Viswanathan Rajagopal <vi...@workday.com.INVALID>.
Hi Jordan,

Thanks for the suggestions - very helpful.

Quick summary of my comments below:

  *   Added a test case simulator to reproduce the issue (though artificially)
  *   Added proposed code just to get your review and suggestions on whether it would work
Please find my detailed comments inline.

> I think there's generally a hole in Curator's lock recipe. The lock code does not watch the lock node it has created. So, another process or (as you found) a race with the server might cause the lock node to disappear underneath the lock instance after it thinks it has the lock. One thing we can do is to check the session ID before waiting in LockInternals.internalLockLoop(). If the session ID changes after wait exits it suggests the lock node should no longer be trusted and it should be deleted and re-created. That's 1 though anyway.

Yes, you are right. Currently the lock code doesn't watch the lock node it has created. That's why we would like your feedback on the proposal from our first mail thread:
"The Curator lock code has a makeRevocable() API that enables an application to revoke a lock at any time by triggering a node data change event through the Revoker.attemptRevoke() utility.
Proposal: Would it make sense to extend makeRevocable() to also handle the node delete event? That would allow the application to register a watcher for node deletion and react to it by revoking the lock."
We tried a code snippet (https://github.com/ViswaNXplore/curator/blob/curator-2.5.0/patch/double_locking_test_simulator/curator-recipes/src/main/java/org/apache/curator/framework/recipes/locks/LockInternals.java#L60) to see whether the approach of making the lock code watch the lock node it has created works, and would like your thoughts on it. Do you see any other impacts there? We may need your advice.
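
To make the discussion concrete, the application-side shape we have in mind is roughly the following (illustration only, not the actual patch; the real change is in the LockInternals snippet linked above, and getting hold of our own lock node path is exactly the part that needs a recipe change):

import java.util.concurrent.atomic.AtomicBoolean;

import org.apache.curator.framework.CuratorFramework;
import org.apache.zookeeper.WatchedEvent;
import org.apache.zookeeper.Watcher;

// Illustration only: watch our own lock node and treat the lock as invalid if
// the node is deleted underneath us (e.g. by the server's old-session cleanup).
public class OwnLockNodeWatcher implements Watcher
{
    private final CuratorFramework client;
    private final String ourLockNodePath;
    private final AtomicBoolean lockStillValid = new AtomicBoolean(true);

    public OwnLockNodeWatcher(CuratorFramework client, String ourLockNodePath)
    {
        this.client = client;
        this.ourLockNodePath = ourLockNodePath;
    }

    public void start() throws Exception
    {
        // set (or re-set) a watch on our own lock node; null means it is already gone
        if ( client.checkExists().usingWatcher(this).forPath(ourLockNodePath) == null )
        {
            lockStillValid.set(false);
        }
    }

    @Override
    public void process(WatchedEvent event)
    {
        if ( event.getType() == Event.EventType.NodeDeleted )
        {
            lockStillValid.set(false);  // application reacts here, e.g. by revoking/releasing the lock
        }
        else
        {
            try
            {
                start();    // watches are one-shot: re-register for other event types
            }
            catch ( Exception e )
            {
                lockStillValid.set(false);
            }
        }
    }

    public boolean isLockStillValid()
    {
        return lockStillValid.get();
    }
}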

I agree with you on that the session id approach, that would protect us from seeing this issue. The reason why I thought your first approach “making lock code watch it has created” could be more protective is this could protect us from any inconsistencies between original lock node and local lock state.

>  It would be nice to generate a test/simulation for this case so that it can be properly dealt with.

Created a test case simulator https://github.com/ViswaNXplore/curator/blob/curator-2.5.0/patch/double_locking_test_simulator/curator-recipes/src/test/java/org/apache/curator/framework/recipes/CuratorRecipeTest.java

Approach #1 (Artificial way of reproducing the issue)

  1.  Run the test case with SIMULATE_ISSUE_BY_EXPLICIT_DELETE set to true. This deletes the lock node after a pause to simulate the state where the zkClient has reconnected and still happens to see its ephemeral node just before the server deletes it (the session has expired, so the node is deleted shortly afterwards by the server). A rough sketch of this explicit delete follows this item.
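
The heart of that explicit delete is just an out-of-band deletion of the lock node, standing in for the server's late cleanup of the expired session's ephemerals. Something along these lines (the connect string and path are placeholders; the real code is in the linked CuratorRecipeTest):

import org.apache.curator.framework.CuratorFramework;
import org.apache.curator.framework.CuratorFrameworkFactory;
import org.apache.curator.retry.RetryOneTime;

// Stand-in for the ZooKeeper server deleting the ephemeral lock node of an
// expired session, after the client has already reconnected and "acquired" the lock.
public class ExplicitDeleteSimulation
{
    public static void deleteStaleLockNode(String connectString, String staleLockNodePath) throws Exception
    {
        CuratorFramework admin = CuratorFrameworkFactory.newClient(connectString, new RetryOneTime(100));
        admin.start();
        try
        {
            admin.delete().forPath(staleLockNodePath);
        }
        finally
        {
            admin.close();
        }
    }
}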
Approach #2 (A little manual interruption needed to reproduce the issue)

  1.  Run the test case in Debug mode (SIMULATE_ISSUE_BY_EXPLICIT_DELETE set to false).
  2.  Artificially delay / pause the ephemeral lock node deletion done as part of the session cleanup process in the server code (ZookeeperServer.close() method).
  3.  After a pause (say 5s) to let one of the instances acquire the lock, artificially break the socket connection between client and server for 30s (by keeping a breakpoint in the ClientCnxnSocketNIO.doIO() method). After 30s, we would see session-closing logs in the server code.
  4.  After 1 min, remove the breakpoint in ClientCnxnSocketNIO.doIO() and resume both Thread 2 and Thread 3.
  5.  After that, resume the server thread (thread name would be "SessionTracker").
Proposals discussed so far are listed below (the 3rd is newly added for your review):

  1.  Make the lock code watch the lock node it has created.
  2.  Check the session ID before/after wait(): if it has changed when wait exits, the lock node should no longer be trusted and should be deleted and re-created.
  3.  Leader Latch code is well protected against this zookeeper race condition, because it internally handles connection events (which it uses to interrupt the latch-acquire state and reset its latch every time the connection is re-established), meaning it will explicitly release the latch and recreate it if there is a connection disconnect (maybe this is the approach the lock recipe could use to protect itself? a usage sketch follows this list).
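
For comparison, this is roughly how the Leader Latch recipe surfaces those connection-driven resets to the application (standard LeaderLatch usage, shown only to illustrate the behaviour the lock recipe could borrow; the latch path is a placeholder):

import org.apache.curator.framework.CuratorFramework;
import org.apache.curator.framework.recipes.leader.LeaderLatch;
import org.apache.curator.framework.recipes.leader.LeaderLatchListener;

public class LatchExample
{
    public static LeaderLatch startLatch(CuratorFramework client) throws Exception
    {
        final LeaderLatch latch = new LeaderLatch(client, "/examples/leader-latch");
        latch.addListener(new LeaderLatchListener()
        {
            @Override
            public void isLeader()
            {
                // we hold leadership; safe to do leader-only work
            }

            @Override
            public void notLeader()
            {
                // called when leadership is lost, e.g. after a connection problem -
                // the latch resets itself internally, and we stop leader-only work here
            }
        });
        latch.start();
        return latch;
    }
}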
Many Thanks,
Viswa.

From: Jordan Zimmerman <jo...@jordanzimmerman.com>
Date: Tuesday, 20 July 2021 at 21:11
To: Viswanathan Rajagopal <vi...@workday.com>
Cc: Sean Gibbons <se...@workday.com>, dev@curator.apache.org <de...@curator.apache.org>, cammckenzie@apache.org <ca...@apache.org>, user@curator.apache.org <us...@curator.apache.org>, Donal Arundel <do...@workday.com>, Francisco Goncalves <fr...@workday.com>, Zak Szadowski <za...@workday.com>, Dandie Beloy <da...@workday.com>, Marcelo Cenerino <ma...@workday.com>
Subject: Re: [External Sender] Double Locking Issue
In our case, it wouldn’t throw any exception because it had gone past “creating lock nodes” and was blocked on wait(), which would only then be interrupted when curator watcher notified on previous sequence node delete event.

So, you're using the version of acquire() without a timeout? In any event, this is a problem. When you receive SUSPENDED you really should interrupt any threads that are waiting on Curator. The Curator docs imply this even though it might not be obvious. This is likely the source of your problems. A simple solution is to use the version of acquire that has a timeout and repeatedly call it until success (though your problem may still occur in this case). Maybe we could improve the lock recipe (or something similar) so that locks inside of acquire are interrupted on a network partition.

[snip of your remaining thoughtful analysis]

I think there's generally a hole in Curator's lock recipe. The lock code does not watch the lock node it has created. So, another process or (as you found) a race with the server might cause the lock node to disappear underneath the lock instance after it thinks it has the lock.

One thing we can do is to check the session ID before waiting in LockInternals.internalLockLoop(). If the session ID changes after wait exits it suggests the lock node should no longer be trusted and it should be deleted and re-created. That's 1 though anyway. It would be nice to generate a test/simulation for this case so that it can be properly dealt with.

-Jordan


On Jul 20, 2021, at 3:53 PM, Viswanathan Rajagopal <vi...@workday.com>> wrote:

Thanks Jordan for coming back on this.

Please find my inline comments. Also provided below few additional version info that we are using,

  *   Curator version – 2.5.0
  *   Zookeeper version – 3.5.3

>  I don't see how this is the case. If the client has received a network partition it shouldn't not consider any locks as being currently held (see the Tech Notes I mentioned in my previous email). If there is a partition during a call to acquire(), acquire() would throw an exception (once the retry policy has expired). BTW, what retry policy are you using?

In our case, it wouldn’t throw any exception because it had gone past “creating lock nodes” and was blocked on wait(), which would only then be interrupted when curator watcher notified on previous sequence node delete event.

> This isn't correct. Curator watches the previous node but lock acquisition is always based on calling ZK's getChildren, sorting them, and seeing if the lock's node is the first node in the sequence. If Ephemeral nodes aren't cleaned up it wouldn't be a problem. Any phantom ephemeral nodes would sort first and prevent Curator from believing it holds the lock.

Totally agreed on your point on lock acquisition. The problem that I was trying to explain is that the network partition occurs while the client is waiting for the lock (in the acquire()) but after the acquire code block has already written out the lock node successfully, so the client code will be blocking in the acquire and not get a connection exception. Once the previous lock node is deleted but not the lock node for this client then this client will leave the acquire thinking it has the lock but actually at this point the ZK server has expired the session and is in the process of deleting our clients lock node.

Trying to categorize the timeline events below to see if that would help us for better understanding,
[ CONNECTED ] :

  *   Client A & Client B calls acquire creating Node N1 & N2 respectively
  *   Client A acquire() -> holds the lock as its Node N1 is first node in sequence
  *   Client B acquire() -> created Node N2, watches for previous sequence node (blocked on wait())

[ SUSPENDED CONNECTION OCCURS ] :

  *   Network partition happens
  *   Curator fires SUSPENDED
  *   Client A, on receiving SUSPENDED event, attempts to release the lock
  *   Client B, on receiving SUSPENDED event, does nothing as it was not holding any lock (as blocked on acquire() -> wait() )

[ SESSION EXPIRED ] :

  *   Client A attempts to release the lock (with retries)
  *   Client B does nothing as it was not holding any lock (and is still blocked on acquire() -> wait() )
  *   Server prepared to cleanup previous client session and deleting lock nodes created as part of previous client session

[ RECONNECTED ] :

  *   Client A and Client B reconnected to another server with new session id (Note : This happens before server managed to cleanup / delete ephemeral nodes of previous client session)
  *   Client A released the lock successfully (means it would delete its lock node N1) and attempts to acquire lock by creating lock node N3 and watches for previous sequence node (N2)
  *   Client B who was blocked on acquire() -> wait() would be notified with previous sequence node (N1) deletion -> getChildren, sorting them, and seeing if the lock's node is the first node in the sequence. So, Client B sees its lock node N2 still (which I call it as about to be deleted node by server) and thereby acquires the lock

[ AFTER FEW SECONDS ] :

  *   Server managed to delete the ephemeral node N2 as part of previous client session cleanup
  *   Client A who was blocked on acquire() -> wait() would be notified with previous sequence node (N2) deletion -> getChildren, sorting them, and seeing if the lock's node is the first node in the sequence and thereby acquires the lock
  *   Client B – its local lock thread data went stale (as its lock path N2 not has been deleted by Server)


>  SUSPENDED in Curator means only that the connect has been lost, not that the session has ended.
LOST is the state that means the session has ended
Be aware of how GCs can affect Curator. See the Tech Note here: https://cwiki.apache.org/confluence/display/CURATOR/TN10
Also read this Tech Note on session handling: https://cwiki.apache.org/confluence/display/CURATOR/TN14

Very good information on the tech notes. Thanks for that. Agreed, it’s always recommended to clear the locks when it sees SUSPENDED event. And we are already following your recommendation. Our application would clear the lock when it sees SUSPENDED event.

To summarize my thoughts,

  *   Ephemeral node created by previous session was  still seen by client that reconnected with new session id until server cleans that up the previous session ephemeral node. This could happen if client manage to reconnect with server with new session id before server cleans up the previous session. How it affects the curator lock recipe ? Explanation : The above explained race condition would make acquire() to hold the lock ( as its own lock node still seen ), eventually leading to inconsistent state (i.e. curator local lock state stale) when that lock node is being cleaned up by the server as part of previous session cleanup activities.
  *   I am NOT seeing a Curator bug here, but looking out for suggestions / recommendations in handling this zookeeper race condition. Either can a feature be added to Curator to cover this case / any recommendations that clients should follow.

Many Thanks,
Viswa.

From: Sean Gibbons <se...@workday.com>>
Date: Tuesday, 20 July 2021 at 12:19
To: Viswanathan Rajagopal <vi...@workday.com>>
Subject: FW: [External Sender] Re: Double Locking Issue


On 20/07/2021, 10:12, "Jordan Zimmerman" <jo...@jordanzimmerman.com>> wrote:

    A few more things...

    > Based on our initial analysis and few test runs, we saw that Curator acquire() method acquires the lock based on “about to be deleted lock node of previous session”. Explanation : Ephemeral node created by previous session was  still seen by client that reconnected with new session id until server cleans that up. If this happens, Curator acquire() would hold the lock.

    This isn't correct. Curator watches the previous node but lock acquisition is always based on calling ZK's getChildren, sorting them, and seeing if the lock's node is the first node in the sequence. If Ephemeral nodes aren't cleaned up it wouldn't be a problem. Any phantom ephemeral nodes would sort first and prevent Curator from believing it holds the lock.

    >  *   On the above mentioned race condition, if client manage to reconnect to server with new session id before server cleans up the ephemeral nodes of client’s previous session,  Curator lock acquire() who is trying to acquire the lock will hold the lock as it still sees the lock node in zookeeper directory. Eventually server would be cleaning up the ephemeral nodes leaving the Curator local lock thread data stale giving the illusion that it still hold the lock while its ephemeral node is gone

    I don't see how this is the case. If the client has received a network partition it shouldn't not consider any locks as being currently held (see the Tech Notes I mentioned in my previous email). If there is a partition during a call to acquire(), acquire() would throw an exception (once the retry policy has expired). BTW, what retry policy are you using?

    So, to reiterate, I don't see how phantom/undeleted ephemeral nodes would cause a problem. The only problem it could case is that a given Curator client takes longer to acquire a lock as it waits for those ephemerals to finally get deleted.

    -JZ

    > On Jul 19, 2021, at 6:45 PM, Viswanathan Rajagopal <vi...@workday.com.INVALID>> wrote:
    >
    > Hi Team,
    >
    > Good day.
    >
    > Recently came across “Double Locking Issue (i.e. two clients acquiring lock)” using Curator code ( InterProcessMutex lock APIs ) in our application
    >
    > Our use case:
    >
    >  *   Two clients attempts to acquire the zookeeper lock using Curator InterProcessMutex and whoever owns it would release it once sees the connection disconnect ( on receiving Connection.SUSPENDED / Connection.LOST Curator Connection Events from Connection listener)
    >
    > Issue we noticed:
    >
    >  *   After session expired & reconnected with new session, both client seems to have acquired the lock. Interesting thing that we found is that one of the clients still holds the lock while its lock node (ephemeral) was gone
    >
    > Things we found:
    >
    >  *   Based on our initial analysis and few test runs, we saw that Curator acquire() method acquires the lock based on “about to be deleted lock node of previous session”. Explanation : Ephemeral node created by previous session was  still seen by client that reconnected with new session id until server cleans that up. If this happens, Curator acquire() would hold the lock.
    >
    >
    >
    >  *   Clearly we could see the race condition (in zookeeper code) between 1). Client reconnecting to server with new session id and 2). server deleting the ephemeral nodes of client’s previous session. We were able to reproduce this issue using the following approach,
    >     *   Artificially break the socket connection between client and server for 30s
    >     *   Artificially pausing the set of server codes for a min and unpause
    >
    >
    >  *   On the above mentioned race condition, if client manage to reconnect to server with new session id before server cleans up the ephemeral nodes of client’s previous session,  Curator lock acquire() who is trying to acquire the lock will hold the lock as it still sees the lock node in zookeeper directory. Eventually server would be cleaning up the ephemeral nodes leaving the Curator local lock thread data stale giving the illusion that it still hold the lock while its ephemeral node is gone
    >
    >
    >  *   Timeline events described below for better understanding,
    >     *   At t1, Client A and Client B establishes zookeeper session with session id A1 and B1 respectively
    >     *   At t2, Client A creates the lock node N1 & acquires the lock
    >     *   At t3, Client B creates the lock node N2 & blocked on acquire() to acquire the lock
    >     *   At t4, session timed out for both clients & server is about to clean up the old session • Client A trying to release the lock
    >     *   At t5, Client A and Client B reconnects to server with new session id A2 and B2 respectively before server deletes the ephemeral node N1 & N2 of previous client session. Client A releases the lock, deleting N1 and trying to acquire it again by creating N3 node and Client B who is blocked on acquire() acquires the lock based on N2 (about to be deleted node created by previous session)
    >     *   At  t6, server cleans up the ephemeral node N2 created by Client B’s previous session. Client A acquires the lock with node N3 as its previous sequence N2 gets deleted, whereas Client B who incorrectly acquired the lock at t5 timeline still holds the lock
    >
    >
    > Note:
    >
    >  *   We are not certain that this race condition that we noticed in zookeeper code is intentional design.
    >
    > Questions:
    >
    >  *   Given this race condition seen in zookeeper code, we would like to hear your recommendations / suggestions to avoid this issue while using Curator lock code?
    >
    >
    >
    >  *   We also see that Interprocess Mutex has the makeRevocable() API that enables application to revoke lock, but that handles Node change event only but not for Node deleted event. I understand that this makes sense to handle Node change event alone, as to enable application user to revoke lock externally from application side. But would it be also to okay to have one for Node delete event, so as application can register the listener for Node delete event. I would like to hear your thoughts.
    >
    >
    > Many Thanks,
    > Viswa