Posted to user@curator.apache.org by Robert Kamphuis <Ro...@supercell.com> on 2014/03/19 11:18:51 UTC

Confused about the LeaderLatch - what should happen on ConnectionState.SUSPENDED and ConnectionState.LOST ?

Hi,

I have been working on changing our application to work with Zookeeper and Curator for a while now, and am occasionally getting wrong behaviour out of my system.
The symptom I’m getting is that two servers conclude that they are the leader of a particular task/leader latch at the same time, breaking everything in my application.
This does not happen often - but often enough, and it is bad enough for my application. I can get it to occur pretty consistently by restarting the servers of our 5-server zookeeper ensemble in turns,
while having multiple servers queuing up for the same leader latch.

My key question is the following:
- WHAT should a user of a LeaderLatch do when the ConnectionState goes to SUSPENDED?

My assumption and desired behaviour is that the user should suspend operations - which implies to me that its leadership status is uncertain. (I am holding off all persistent operations, for example.)
But - I think - this also implies that no one else can become leader yet: either the old leader is still the leader and no one else is, or the old leader has disappeared and we are in effect leaderless for some time.
This will then be followed by either
a) a reconnect - in which case the old leader can continue its work (and optionally double-check its leadership status), or
b) a lost - in which case the old leader has lost its leadership, should release all of its power etc., and can try again or do something else. Someone else has likely become leader in my application by then.
Whether a) or b) happens is controlled by the session timeout negotiated between the Curator/ZooKeeper client and the ZooKeeper ensemble.
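
For illustration, a minimal sketch of the split handling I have in mind on the application side (the Work callbacks are hypothetical hooks into my own code - this is not what LeaderLatch itself does today):

import org.apache.curator.framework.CuratorFramework;
import org.apache.curator.framework.recipes.leader.LeaderLatch;
import org.apache.curator.framework.state.ConnectionState;
import org.apache.curator.framework.state.ConnectionStateListener;

public class PausingLeaderHandler implements ConnectionStateListener
{
    // hypothetical application hooks
    public interface Work { void pause(); void resume(); void relinquish(); }

    private final LeaderLatch latch;
    private final Work work;

    public PausingLeaderHandler(LeaderLatch latch, Work work)
    {
        this.latch = latch;
        this.work = work;
    }

    @Override
    public void stateChanged(CuratorFramework client, ConnectionState newState)
    {
        switch ( newState )
        {
            case SUSPENDED:
                work.pause();          // leadership uncertain: hold off persistent operations, but do not resign yet
                break;
            case RECONNECTED:
                if ( latch.hasLeadership() )   // session survived: double-check and carry on
                {
                    work.resume();
                }
                break;
            case LOST:
                work.relinquish();     // session expired: someone else may be leader by now
                break;
            default:
                break;
        }
    }
}

It would be registered with client.getConnectionStateListenable().addListener(...).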

Is my thinking correct here?
And if so, why does Curator’s LeaderLatch.handleStateChange(ConnectionState newState) handle both states in the same way: setLeadership(false)?

In my application, a leadership change is a pretty big event, due to the amount of work the code does, and I really want leadership to survive short connection breaks - e.g. one of the zookeeper servers crashing. Leadership should only be swapped on a session timeout - e.g. a broken application node, or a long network break between the server and the zookeeper servers. I am thinking of using 90 seconds as the session timeout (so as to survive e.g. longer GC pauses and similar without a leadership change) - maybe even longer.
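
(For reference, a minimal sketch of how such a session timeout would be set when building the client - the connect string and retry policy are placeholders, and the ensemble's maxSessionTimeout (20 * tickTime by default) also has to allow a 90s session:)

import org.apache.curator.framework.CuratorFramework;
import org.apache.curator.framework.CuratorFrameworkFactory;
import org.apache.curator.retry.ExponentialBackoffRetry;

public class LongSessionClient
{
    public static void main(String[] args)
    {
        // placeholder connect string for the 5-server ensemble
        CuratorFramework client = CuratorFrameworkFactory.builder()
                .connectString("zk1:2181,zk2:2181,zk3:2181,zk4:2181,zk5:2181")
                .sessionTimeoutMs(90000)        // 90s: survive short connection breaks and GC pauses
                .connectionTimeoutMs(15000)
                .retryPolicy(new ExponentialBackoffRetry(1000, 29))
                .build();
        client.start();
    }
}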

Is this a bug in LeaderLatch, or should I use something other than LeaderLatch, or implement my desired behaviour in a new recipe?

kind regards,
Robert Kamphuis

PS. Using ZooKeeper 3.4.5 and Curator 2.4.0.


Re: Confused about the LeaderLatch - what should happen on ConnectionState.SUSPENDED and ConnectionState.LOST ?

Posted by Jordan Zimmerman <jo...@jordanzimmerman.com>.
Maybe you should consider PersistentEphemeralNode instead. The long term “leader” would try to allocate a PersistentEphemeralNode. If successful, he’s the leader. If not, set a watch on the node and try again when it triggers.
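
Something like the following rough sketch of that fixed-node pattern, shown here with a plain ephemeral znode rather than the recipe class itself - the path and data are illustrative only:

import org.apache.curator.framework.CuratorFramework;
import org.apache.zookeeper.CreateMode;
import org.apache.zookeeper.KeeperException;
import org.apache.zookeeper.WatchedEvent;
import org.apache.zookeeper.Watcher;
import org.apache.zookeeper.data.Stat;

public class FixedNodeLeader
{
    private static final String PATH = "/myapp/leader";   // illustrative path

    // Try to become leader by creating a fixed ephemeral node. If it already
    // exists, watch it and try again when the watch fires (node deleted).
    public static boolean tryTakeLeadership(final CuratorFramework client) throws Exception
    {
        try
        {
            client.create().creatingParentsIfNeeded()
                  .withMode(CreateMode.EPHEMERAL)
                  .forPath(PATH, "leader-id".getBytes());
            return true;    // we own the node: we are the leader
        }
        catch ( KeeperException.NodeExistsException e )
        {
            // someone else is leader: watch the fixed node and retry when it goes away
            Stat stat = client.checkExists().usingWatcher(new Watcher()
            {
                @Override
                public void process(WatchedEvent event)
                {
                    if ( event.getType() == Event.EventType.NodeDeleted )
                    {
                        // re-run tryTakeLeadership(client) from here
                    }
                }
            }).forPath(PATH);
            if ( stat == null )
            {
                // the node vanished between create() and checkExists(): just try again
                return tryTakeLeadership(client);
            }
            return false;
        }
    }
}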

-JZ




Re: Confused about the LeaderLatch - what should happen on ConnectionState.SUSPENDED and ConnectionState.LOST ?

Posted by Robert Kamphuis <Ro...@supercell.com>.
On 20 Mar, 2014, at 02:49 am, Jordan Zimmerman <jo...@jordanzimmerman.com> wrote:

Curator’s approach is safety first. While it might be possible to time a network outage against the session timeout, I believe that this is not good distributed design. What if another client has negotiated a different session timeout than yours? What if there is clock drift? So, Curator uses the most conservative method when there’s a network event.

I am setting things up with identical configurations using the same AWS images for my client apps and for the zookeeper ensemble. Clock drift is kept in check with ntp.
By using a long session timeout - 90 secs or more - I hope to survive a crashing zookeeper server without some 20% of my client servers losing leadership and in effect shutting down and restarting their task election. I am going to have a couple of hundred servers - losing 20% is too big a hit for our application logic.
I guess I am using the leader latch in a slightly different manner - for selecting a worker that gets “elected for life”, instead of elected to parliament as in some countries where there is a re-election a couple of times a year.

Do you agree that there are use cases like mine where this election-for-life is the desired behaviour?
I will be experimenting by building a “dictatorLatch” and see how that works out.



That said, it might be possible to have a pluggable ConnectionStateListener strategy for Curator Recipes. Instead of each recipe assuming the worst when there is a network event, there could be something like a ConnectionStateListener wrapper that suppresses SUSPENDED until the session timeout elapses. I haven’t totally thought this through though.


From my experiments so far (copy-pasting the leader latch and modifying the behaviour on suspend), it looks like on reconnect the leader latch replaces its node in zookeeper, and when other servers were in the election during the leader's suspend period, the replacement node ends up lower in the election list - thus losing the leadership. I will continue staring at this - I need some more tracing to isolate what happens.

thanks for your time!

Robert





Re: Confused about the LeaderLatch - what should happen on ConnectionState.SUSPENDED and ConnectionState.LOST ?

Posted by Jordan Zimmerman <jo...@jordanzimmerman.com>.
Curator’s approach is safety first. While it might be possible to time a network outage against the session timeout, I believe that this is not good distributed design. What if another client has negotiated a different session timeout than yours? What if there is clock drift? So, Curator uses the most conservative method when there’s a network event.

That said, it might be possible to have a pluggable ConnectionStateListener strategy for Curator Recipes. Instead of each recipe assuming the worst when there is a network event, there could be something like a ConnectionStateListener wrapper that suppresses SUSPENDED until the session timeout elapses. I haven’t totally thought this through though.
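
A rough illustration of what such a wrapper could look like - purely a sketch, not an existing Curator class:

import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.ScheduledFuture;
import java.util.concurrent.TimeUnit;

import org.apache.curator.framework.CuratorFramework;
import org.apache.curator.framework.state.ConnectionState;
import org.apache.curator.framework.state.ConnectionStateListener;

// Hypothetical wrapper: delays delivery of SUSPENDED for roughly the session
// timeout, cancelling it if another state (e.g. RECONNECTED) arrives first.
// LOST and everything else are passed straight through.
public class SuspendedSuppressingListener implements ConnectionStateListener
{
    private final ConnectionStateListener delegate;
    private final long sessionTimeoutMs;
    private final ScheduledExecutorService scheduler = Executors.newSingleThreadScheduledExecutor();
    private ScheduledFuture<?> pending;

    public SuspendedSuppressingListener(ConnectionStateListener delegate, long sessionTimeoutMs)
    {
        this.delegate = delegate;
        this.sessionTimeoutMs = sessionTimeoutMs;
    }

    @Override
    public synchronized void stateChanged(final CuratorFramework client, ConnectionState newState)
    {
        if ( newState == ConnectionState.SUSPENDED )
        {
            // hold the SUSPENDED notification; deliver it only if we stay disconnected
            pending = scheduler.schedule(new Runnable()
            {
                @Override
                public void run()
                {
                    delegate.stateChanged(client, ConnectionState.SUSPENDED);
                }
            }, sessionTimeoutMs, TimeUnit.MILLISECONDS);
        }
        else
        {
            if ( pending != null )
            {
                pending.cancel(false);
                pending = null;
            }
            delegate.stateChanged(client, newState);
        }
    }
}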

-JZ




Re: Confused about the LeaderLatch - what should happen on ConnectionState.SUSPENDED and ConnectionState.LOST ?

Posted by Robert Kamphuis <Ro...@supercell.com>.
I can see that what you describe is what is happening, but it is not what I was expecting.
I had assumed that the session timeout is the key timeout value which prevents other clients - which indeed would be connected to another host of the zookeeper ensemble - from grabbing leadership.
As apparently happens today, the old leader is ready to give up leadership once it detects a suspend - but why would the zookeeper ensemble do the same before the session timeout? I think the zookeeper ensemble does not do this until the session times out, but Curator's leader latch apparently sets up the release of its leader latch node, which it will push to zookeeper once it reconnects.
This leads to the confusing behaviour that the old leader remains the leader during the 5-10 seconds it takes to reconnect to the zookeeper cluster (running across multiple AZs in AWS), but loses leadership soon after the reconnect.
(and of course I did not realise this until recently, while wondering what was going wrong :)
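
For the extra tracing, something as simple as dumping the latch's election children should make the node replacement visible (the latch path below is illustrative):

import java.util.List;

import org.apache.curator.framework.CuratorFramework;

public class LatchTracing
{
    // The trailing sequence number on each child decides the election order
    // (lowest wins), so dumping the children shows whether our node was
    // replaced by a higher-numbered one after a reconnect.
    public static void dumpLatchChildren(CuratorFramework client, String latchPath) throws Exception
    {
        List<String> children = client.getChildren().forPath(latchPath);   // e.g. "/myapp/leader-latch"
        System.out.println("latch children: " + children);
    }
}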

Do you think this is the behaviour people would expect? What are other users expecting from the leader latch? Could the documentation clarify this a bit better either way?

I will be experimenting a bit more with a more power-hungry “dictatorLatch” which hangs on to its leadership until the last moment - the session timeout.
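
Roughly the behaviour I have in mind for it - illustrative only, not an existing recipe: ignore SUSPENDED altogether and resign only on LOST, i.e. once the session timeout has actually elapsed:

import org.apache.curator.framework.CuratorFramework;
import org.apache.curator.framework.state.ConnectionState;
import org.apache.curator.framework.state.ConnectionStateListener;

public class DictatorStateListener implements ConnectionStateListener
{
    private final Runnable resign;   // hypothetical application callback

    public DictatorStateListener(Runnable resign)
    {
        this.resign = resign;
    }

    @Override
    public void stateChanged(CuratorFramework client, ConnectionState newState)
    {
        if ( newState == ConnectionState.LOST )
        {
            resign.run();    // session expired: give up leadership now
        }
        // SUSPENDED and RECONNECTED are deliberately ignored here
    }
}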


On 19 Mar, 2014, at 15:00, Matt Brown <Ma...@citrix.com> wrote:

> My assumption and desired behaviour is that the user should suspend operations - which implies to me that its leadership status is uncertain. (I am holding off all persistent operations for example).
> But -I think- this also implies that no-one else can become leader yet - we either have the old-leader still be leader, and no one else, or then the old-leader disappeared and we are in effect leaderless for some time.

I think the second part of this is incorrect – if client 1 has lost its zookeeper connection, it doesn't imply that other clients have also lost their zookeeper connection.

So it would be correct for the former leader who now has a suspended connection to cease its leader activities – but other clients who are still connected to the ensemble may have become the leader due to the suspension of client 1's connection.

If client 1 still acted as if it might be the leader when its connection becomes suspended, then you could have two leaders – client 1 and whichever client with a still-healthy ZK connection grabbed the latch.

From the perspective of the zookeeper ensemble, it can't know whether your client is suffering from a "short connection break" or has died altogether – so the client's leader role should be treated as lost in either case.




Re: Confused about the LeaderLatch - what should happen on ConnectionState.SUSPENDED and ConnectionState.LOST ?

Posted by Matt Brown <Ma...@citrix.com>.
> My assumption and desired behaviour is that the user should suspend operations - which implies to me that its leadership status is uncertain. (I am holding off all persistent operations for example).
> But -I think- this also implies that no-one else can become leader yet - we either have the old-leader still be leader, and no one else, or then the old-leader disappeared and we are in effect leaderless for some time.

I think the second part of this is incorrect – if client 1 has lost its zookeeper connection, it doesn't imply that other clients have also lost their zookeeper connection.

So it would be correct for the former leader who now has a suspended connection to cease its leader activities – but other clients who are still connected to the ensemble may have become the leader due to the suspension of client 1's connection.

If client 1 still acted as if it might be the leader when its connection becomes suspended, then you could have two leaders – client 1 and whichever client with a still-healthy ZK connection grabbed the latch.

From the perspective of the zookeeper ensemble, it can't know whether your client is suffering from a "short connection break" or has died altogether – so the client's leader role should be treated as lost in either case.
