Posted to dev@curator.apache.org by Jordan Zimmerman <jo...@jordanzimmerman.com> on 2015/09/14 00:38:13 UTC

DEVS PLEASE READ: What is the correct timeout for a session

Devs,

Given that Curator 3.0 will try to accurately track the session, I realize I’m a bit confused about when a session actually expires. In the implementation I pushed, a timer starts when Watcher.Event.KeeperState.Disconnected is seen. Then, if the negotiated session timeout elapses, Curator simulates a session expiration. However, the timeout is based on the saved time when Disconnected is seen. I’ve been searching in the ZK code and it’s hard to tell if that’s correct. I’d appreciate a few other eyes on this. The significant class in ZK is SessionTrackerImpl.java. touchSession() is called periodically and a kind of priority queue is used to pull out expiring sessions.
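For readers skimming the thread, here is a minimal Java sketch of the client-side timer described above. The clock starts when Disconnected is seen, and a session loss is simulated once the negotiated session timeout has elapsed. DisconnectTimer and its methods are hypothetical names, not Curator's actual implementation. Timestamps are passed in explicitly so the logic is easy to test.

```java
import java.util.OptionalLong;

// Hypothetical sketch, not Curator's implementation: a timer that starts on
// Disconnected and reports when a session expiration should be simulated.
final class DisconnectTimer {
    private final long negotiatedSessionTimeoutMs;
    // Time at which Disconnected was first observed; empty while connected.
    private OptionalLong disconnectedAtMs = OptionalLong.empty();

    DisconnectTimer(long negotiatedSessionTimeoutMs) {
        this.negotiatedSessionTimeoutMs = negotiatedSessionTimeoutMs;
    }

    // Called when Watcher.Event.KeeperState.Disconnected is seen.
    void onDisconnected(long nowMs) {
        // Repeated Disconnected events do not restart the timer.
        if (!disconnectedAtMs.isPresent()) {
            disconnectedAtMs = OptionalLong.of(nowMs);
        }
    }

    // Called when the connection is re-established.
    void onConnected() {
        disconnectedAtMs = OptionalLong.empty();
    }

    // True once the negotiated timeout has elapsed since Disconnected was seen.
    boolean shouldSimulateExpiration(long nowMs) {
        return disconnectedAtMs.isPresent()
                && nowMs - disconnectedAtMs.getAsLong() >= negotiatedSessionTimeoutMs;
    }
}
```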

Thanks!

-Jordan


Re: DEVS PLEASE READ: What is the correct timeout for a session

Posted by Jordan Zimmerman <jo...@jordanzimmerman.com>.
https://github.com/apache/curator/pull/105



On September 14, 2015 at 5:09:28 PM, Cameron McKenzie (mckenzie.cam@gmail.com) wrote:

Yeah, that's certainly an option. As long as it's documented why it's there, I think it's not a bad idea.




Re: DEVS PLEASE READ: What is the correct timeout for a session

Posted by Cameron McKenzie <mc...@gmail.com>.
Yeah, that's certainly an option. As long as it's documented why it's
there, I think it's not a bad idea.

On Mon, Sep 14, 2015 at 11:53 PM, Jordan Zimmerman <
jordan@jordanzimmerman.com> wrote:

> Maybe the ConnectionHandlingPolicy could have an optional fudge factor for
> checking session timeouts?
>
> -Jordan

Re: DEVS PLEASE READ: What is the correct timeout for a session

Posted by Jordan Zimmerman <jo...@jordanzimmerman.com>.
Maybe the ConnectionHandlingPolicy could have an optional fudge factor for checking session timeouts? 

-Jordan


On September 14, 2015 at 12:59:12 AM, Cameron McKenzie (mckenzie.cam@gmail.com) wrote:

I'd probably lean towards leaving it as is, unless we're going to put some more trickery in there to handle the case where we've reported a session loss event and subsequently found out the session was not lost on reconnection. Not sure how this would be done though.


Re: DEVS PLEASE READ: What is the correct timeout for a session

Posted by Cameron McKenzie <mc...@gmail.com>.
I'd probably lean towards leaving it as is, unless we're going to put some
more trickery in there to handle the case where we've reported a session
loss event and subsequently found out the session was not lost on
reconnection. Not sure how this would be done though.
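One possible way to at least detect that case, sketched under assumptions: if the client keeps the same ZooKeeper handle while it waits to reconnect, the session id (ZooKeeper#getSessionId()) recorded when the loss was reported can be compared with the id after reconnection. An unchanged id means the server never expired the session. LossReconciler and its outcomes are hypothetical, and the sketch only classifies the situation; what Curator should do about a premature loss event is still the open question.

```java
// Hypothetical sketch, not an existing Curator API: classify a reconnection
// that happens after a (simulated) session loss has already been reported.
// Assumes the same ZooKeeper handle, and thus the same session id, is reused.
final class LossReconciler {
    enum Outcome { NO_LOSS_REPORTED, LOSS_CONFIRMED, LOSS_WAS_PREMATURE }

    private long lostSessionId;   // session id when the loss was reported
    private boolean lossReported;

    // Called when the client-side timer declares the session lost.
    void onSimulatedLoss(long sessionId) {
        lostSessionId = sessionId;
        lossReported = true;
    }

    // Called on reconnection with the id of the session now in use.
    Outcome onReconnected(long currentSessionId) {
        if (!lossReported) {
            return Outcome.NO_LOSS_REPORTED;
        }
        lossReported = false;
        // Same id: the server kept the session, so its ephemeral nodes
        // should still be present and the loss event was premature.
        return currentSessionId == lostSessionId
                ? Outcome.LOSS_WAS_PREMATURE
                : Outcome.LOSS_CONFIRMED;
    }
}
```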

On Mon, Sep 14, 2015 at 3:24 PM, Jordan Zimmerman <
jordan@jordanzimmerman.com> wrote:

> > Good point, either way it probably needs to be documented, as it would
> > probably be confusing to get a session loss event from Curator and then
> > manage to reconnect to ZK and still find all your session's ephemeral nodes
> > present.
>
> True. What should we do? Leave it as is?
>
> > I presume it's not possible to get a hook into the acknowledgement of an
> > event from ZK? We could use that as the start of the session timeout
> > timer.
>
> Even if we could, the important stuff happens on the server so it’s moot.

Re: DEVS PLEASE READ: What is the correct timeout for a session

Posted by Jordan Zimmerman <jo...@jordanzimmerman.com>.
> Good point, either way it probably needs to be documented, as it would
> probably be confusing to get a session loss event from Curator and then
> manage to reconnect to ZK and still find all your session's ephemeral nodes
> present.

True. What should we do? Leave it as is?

> I presume it's not possible to get a hook into the acknowledgement of an
> event from ZK? We could use that as the start of the session timeout timer.

Even if we could, the important stuff happens on the server so it’s moot.





On September 13, 2015 at 11:42:16 PM, Cameron McKenzie (mckenzie.cam@gmail.com) wrote:

Good point, either way it probably needs to be documented, as it would
probably be confusing to get a session loss event from Curator and then
manage to reconnect to ZK and still find all your session's ephemeral nodes
present.

I presume it's not possible to get a hook into the acknowledgement of an
event from ZK? We could use that as the start of the session timeout timer.


Re: DEVS PLEASE READ: What is the correct timeout for a session

Posted by Cameron McKenzie <mc...@gmail.com>.
Good point, either way it probably needs to be documented, as it would
probably be confusing to get a session loss event from Curator and then
manage to reconnect to ZK and still find all your session's ephemeral nodes
present.

I presume it's not possible to get a hook into the acknowledgement of an
event from ZK? We could use that as the start of the session timeout timer.
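A hypothetical sketch of what such a hook would change: the timer would start at the last reply heard from the server rather than at the moment Disconnected is seen. TimerStart and onServerReply are made-up names. As the thread notes, this would only bring the client's estimate closer to the server's, since the server still makes the actual expiry decision.

```java
// Hypothetical sketch comparing two possible start points for the
// client-side session timer. Not a Curator or ZooKeeper API.
final class TimerStart {
    private long lastServerReplyMs = Long.MIN_VALUE; // MIN_VALUE = no reply seen

    // Would be called for every response/ack from the server, if such a
    // hook were available.
    void onServerReply(long nowMs) {
        lastServerReplyMs = nowMs;
    }

    // Deadline measured from the moment Disconnected was seen.
    static long deadlineFromDisconnect(long disconnectedAtMs, long sessionTimeoutMs) {
        return disconnectedAtMs + sessionTimeoutMs;
    }

    // Deadline measured from the last server reply; falls back to the
    // Disconnected time if no reply has been recorded.
    long deadlineFromLastReply(long disconnectedAtMs, long sessionTimeoutMs) {
        long start = (lastServerReplyMs == Long.MIN_VALUE) ? disconnectedAtMs : lastServerReplyMs;
        return start + sessionTimeoutMs;
    }
}
```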

On Mon, Sep 14, 2015 at 2:39 PM, Jordan Zimmerman <
jordan@jordanzimmerman.com> wrote:

> > Not sure if this is an issue or not. It's better that Curator declares a
> > session lost a bit later than a bit earlier than ZK.
>
> Actually, I was thinking it would be better if Curator declares the session
> lost before ZK does. The idea is to wait until the last moment to stop locks,
> etc. But users would still not want two processes thinking they
> own the same lock. I wonder if we need to add a “fudge factor” of some kind
> so that Curator fakes the session loss a bit before the negotiated session
> timeout elapses.
>
> -JZ

Re: DEVS PLEASE READ: What is the correct timeout for a session

Posted by Jordan Zimmerman <jo...@jordanzimmerman.com>.
> Not sure if this is an issue or not. It's better that Curator declares a
> session lost a bit later than a bit earlier than ZK.

Actually, I was thinking it would be better if Curator declares the session lost before ZK does. The idea is to wait until the last moment to stop locks, etc. But users would still not want two processes thinking they own the same lock. I wonder if we need to add a “fudge factor” of some kind so that Curator fakes the session loss a bit before the negotiated session timeout elapses.
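A minimal sketch of what an optional fudge factor might look like, assuming it is expressed as a percentage of the negotiated session timeout so that it scales with whatever the server negotiated. FudgedSessionTimeout and earlyPercent are hypothetical names, not an existing Curator setting.

```java
// Hypothetical sketch of an optional "fudge factor": declare the session
// lost somewhat before the negotiated timeout elapses. Names are
// illustrative, not an existing Curator setting.
final class FudgedSessionTimeout {
    private final long negotiatedSessionTimeoutMs;
    private final int earlyPercent; // 0 means no fudge factor

    FudgedSessionTimeout(long negotiatedSessionTimeoutMs, int earlyPercent) {
        if (earlyPercent < 0 || earlyPercent >= 100) {
            throw new IllegalArgumentException("earlyPercent must be in [0, 100)");
        }
        this.negotiatedSessionTimeoutMs = negotiatedSessionTimeoutMs;
        this.earlyPercent = earlyPercent;
    }

    // Shortened timeout actually used by the client-side check.
    long effectiveTimeoutMs() {
        return negotiatedSessionTimeoutMs * (100 - earlyPercent) / 100;
    }

    // True once the shortened timeout has elapsed since the timer started.
    boolean isLost(long timerStartMs, long nowMs) {
        return nowMs - timerStartMs >= effectiveTimeoutMs();
    }
}
```

With earlyPercent set to 0 the check behaves exactly like the plain negotiated timeout, which keeps the setting optional.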



-JZ


Re: DEVS PLEASE READ: What is the correct timeout for a session

Posted by Cameron McKenzie <mc...@gmail.com>.
My reading of it is similar to yours I think.

There's a session expiry queue which puts sessions into buckets based on
their expiry time. Whenever an event is received for a given session, the
expiry time gets recalculated, and the session moves to a new bucket in the
queue.

Then, the SessionTrackerImpl thread just polls this queue and times out any
session that hasn't had any activity for longer than the negotiated session
timeout.
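The bucketing described above can be modeled roughly as follows. This is a simplified, hypothetical Java sketch, not ZooKeeper's SessionTrackerImpl: ExpiryBuckets, touch and expire are made-up names. It assumes each activity pushes the session's expiry to now plus its timeout, rounded up to the next multiple of a fixed interval (the server's tickTime plays this role in ZooKeeper), and that the expirer drops every bucket whose time has passed.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;
import java.util.TreeSet;

// Hypothetical, simplified model of a bucketed session expiry queue.
final class ExpiryBuckets {
    private final long intervalMs;
    // Bucket expiration time -> sessions due to expire at that time.
    private final TreeMap<Long, Set<Long>> buckets = new TreeMap<>();
    // Session id -> the bucket it currently lives in.
    private final Map<Long, Long> bucketOf = new HashMap<>();

    ExpiryBuckets(long intervalMs) {
        this.intervalMs = intervalMs;
    }

    // Round up to the next interval boundary (strictly later than timeMs).
    private long roundUp(long timeMs) {
        return (timeMs / intervalMs + 1) * intervalMs;
    }

    // "Touch" a session: any activity moves it to a later bucket.
    void touch(long sessionId, long nowMs, long sessionTimeoutMs) {
        long expiry = roundUp(nowMs + sessionTimeoutMs);
        Long previous = bucketOf.put(sessionId, expiry);
        if (previous != null) {
            Set<Long> old = buckets.get(previous);
            old.remove(sessionId);
            if (old.isEmpty()) {
                buckets.remove(previous);
            }
        }
        buckets.computeIfAbsent(expiry, k -> new TreeSet<>()).add(sessionId);
    }

    // Called periodically: removes and returns sessions whose bucket has passed.
    List<Long> expire(long nowMs) {
        List<Long> expired = new ArrayList<>();
        while (!buckets.isEmpty() && buckets.firstKey() <= nowMs) {
            for (Long id : buckets.pollFirstEntry().getValue()) {
                bucketOf.remove(id);
                expired.add(id);
            }
        }
        return expired;
    }
}
```

For example, with a 2 s interval, a session touched at 0 ms with a 10 s timeout lands in the 12000 ms bucket; touching it again at 5000 ms moves it to the 16000 ms bucket.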

So, I guess that Curator could be slightly slower at determining a session
timeout than ZK would be, because ZK is going to begin its timeout check at
the time of the last event received for the session, whereas Curator can
only begin the session timeout check when it sees a disconnected event,
which is potentially going to be up to a heartbeat interval slower than ZK?
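To put rough numbers on the difference, here is a back-of-the-envelope Java model with all times measured from the last packet exchanged between client and server. ExpiryTimeline is hypothetical, and the disconnect-detection delay is left as a parameter. One assumption worth checking: the ZooKeeper Java client (ClientCnxn) treats the connection as lost after a read timeout of two thirds of the negotiated session timeout with nothing heard from the server, so during a silent network partition the delay can be larger than one heartbeat interval.

```java
// Hypothetical back-of-the-envelope model, not Curator or ZooKeeper code.
// All times are in milliseconds, measured from the last packet exchanged.
final class ExpiryTimeline {
    private ExpiryTimeline() {}

    // Approximate server-side expiry: the timeout after last contact,
    // rounded up to the next expiry-bucket boundary.
    static long serverExpiry(long sessionTimeoutMs, long bucketIntervalMs) {
        return (sessionTimeoutMs / bucketIntervalMs + 1) * bucketIntervalMs;
    }

    // Disconnected-based timer: the client first has to notice the
    // disconnect, then waits the full negotiated timeout.
    static long clientSimulatedExpiry(long sessionTimeoutMs, long detectionDelayMs) {
        return detectionDelayMs + sessionTimeoutMs;
    }

    // Positive result: the client declares the loss later than the server.
    static long lagMs(long sessionTimeoutMs, long bucketIntervalMs, long detectionDelayMs) {
        return clientSimulatedExpiry(sessionTimeoutMs, detectionDelayMs)
                - serverExpiry(sessionTimeoutMs, bucketIntervalMs);
    }
}
```

For example, lagMs(15_000, 2_000, 10_000) gives 9000: with a 15 s timeout, a 2 s bucket interval and a 10 s detection delay (two thirds of 15 s), the server would drop the session about 9 s before a Disconnected-based timer fires.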

Not sure if this is an issue or not. It's better that Curator declares a
session lost a bit later than a bit earlier than ZK.

On Mon, Sep 14, 2015 at 8:38 AM, Jordan Zimmerman <
jordan@jordanzimmerman.com> wrote:

> Devs,
>
> Given that Curator 3.0 will try to accurately track the session, I realize
> I’m a bit confused about when a session actually expires. In the
> implementation I pushed, a timer starts
> when Watcher.Event.KeeperState.Disconnected is seen. Then, if the
> negotiated session timeout elapses, Curator simulates a session expiration.
> However, the timeout is based on the saved time when Disconnected is seen.
> I’ve been searching in the ZK code and it’s hard to tell if that’s correct.
> I’d appreciate a few other eyes on this. The significant class in ZK is
> SessionTrackerImpl.java. touchSession() is called periodically and a kind
> of priority queue is used to pull out expiring sessions.
>
> Thanks!
>
> -Jordan
>
>