Posted to user@curator.apache.org by Benjamin Jaton <bj...@radiantlogic.com> on 2015/01/15 01:28:12 UTC

Curator connection states

Hello,

I am running some simple tests around the connection state listener
behavior.
I use a regular 3-node ensemble with 1 node already down, and I start/stop a
second one to trigger an outage of the ensemble.

I use:
- connection timeout : 18 seconds
- session timeout : 72 seconds
- retry interval : 5 seconds

Case 0: there is no retry:
- the switch SUSPENDED -> LOST takes less than a second
- the background retry goes on for 18 seconds

Case 1: there is 1 retry:
- the switch SUSPENDED -> LOST takes 7 seconds
- the background retry goes on for 41 seconds

Case 2: there are 2 retries:
- the switch SUSPENDED -> LOST takes 12 seconds
- the background retry goes on for 64 seconds

I expected to see the same numbers, i.e. I thought that we received a LOST
event when Curator gave up trying.

But apparently the duration of the background retries is this:
connectionTimeout * (nbRetries + 1) + retryInterval * nbRetries

Why is it linked to the connectionTimeout, since the connection fails before
that (cases 0, 1 and 2 all go into the LOST state in less than 18 seconds)?
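
For reference, the three measured durations (18, 41 and 64 seconds) are consistent with one connection attempt for the initial try plus one per retry, each lasting the full connection timeout, with the retry interval slept between attempts. Below is a minimal Java sketch of that arithmetic. The class and method names are hypothetical and nothing here is taken from Curator's code; it only assumes that every attempt blocks for the whole connection timeout.

```java
// Sketch: checks the measured background-retry durations against a simple model.
// Assumption: each attempt blocks for the full connection timeout, and the
// retry policy sleeps retryInterval between two consecutive attempts.
final class RetryDurationModel {

    /** Seconds spent in the retry loop before it gives up. */
    static int totalSeconds(int connectionTimeout, int retryInterval, int retries) {
        int attempts = retries + 1; // the initial try plus each retry
        return connectionTimeout * attempts + retryInterval * (attempts - 1);
    }

    public static void main(String[] args) {
        // Values from the tests: connection timeout 18 s, retry interval 5 s.
        for (int retries = 0; retries <= 2; retries++) {
            System.out.println(retries + " retries: " + totalSeconds(18, 5, retries) + " s");
        }
    }
}
```

Under this model the connection timeout dominates the total because it is paid once per attempt, whatever the moment at which the LOST state is reported.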

According to http://curator.apache.org/errors.html , LOST means that "the
connection is confirmed to be lost."
So a LOST state is when I lose my ephemeral nodes (for example).
Is that correct?

Then I am wondering why it would be different whether we have 0, 1 or 2
retries?

Thanks for your insights,
Benjamin

Re: Curator connection states

Posted by Benjamin Jaton <bj...@radiantlogic.com>.

Attaching the logs + code of an example.

I see this:

34347 [CuratorFramework-0] INFO org.apache.curator.framework.state.ConnectionStateManager  - State change: LOST
34348 [CuratorFramework-0] ERROR org.apache.curator.framework.imps.CuratorFrameworkImpl  - Background operation retry gave up

But it continues to retry, and my Curator call still hangs until:

81139 [main] DEBUG org.apache.curator.RetryLoop  - Retry policy not allowing retry
org.apache.zookeeper.KeeperException$ConnectionLossException: KeeperErrorCode = ConnectionLoss for /test

Re: Curator connection states

Posted by Benjamin Jaton <bj...@radiantlogic.com>.

Yes, I will open a JIRA for this.

Regarding my original email, is it normal that the background retry goes on
for longer than it takes to receive the LOST event?

Re: Curator connection states

Posted by Jordan Zimmerman <jo...@jordanzimmerman.com>.

Yeah. If Curator itself can be altered to support it, please send a PR. Even if not, send a PR as this might be useful to others.

-JZ



Re: Curator connection states

Posted by Benjamin Jaton <bj...@radiantlogic.com>.

Instead of that, I think I am going to implement a custom event for the
session loss that will be triggered either:
- sessionTimeoutMs after the last SUSPENDED event (if no RECONNECTED event
has been received), or
- on a RECONNECTED event, if the new session ID is different from the old
one.

It's a little hacky but that should probably do the trick to have a timely
notification that the session has been lost.

What do you think?
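
The two triggers described above could be sketched roughly as follows. This is only an illustration under assumptions: the State enum is a stand-in for Curator's ConnectionState, the class and method names are hypothetical, and the caller is expected to supply the current time and the session ID (for instance from ZooKeeper's getSessionId()). A client-side timer can only approximate expiry, since the server decides when the session actually expires.

```java
// Sketch of a session-loss detector driven by connection state changes.
// Hypothetical names throughout; not part of Curator.
final class SessionLossDetector {

    // Stand-in for the Curator connection states discussed in this thread.
    enum State { SUSPENDED, RECONNECTED, LOST }

    private final long sessionTimeoutMs;
    private long lastSessionId;
    private long suspendedAtMs = -1; // -1 while no SUSPENDED event is pending

    SessionLossDetector(long sessionTimeoutMs, long initialSessionId) {
        this.sessionTimeoutMs = sessionTimeoutMs;
        this.lastSessionId = initialSessionId;
    }

    /** Returns true when this event shows that the old session is gone. */
    boolean onStateChange(State state, long nowMs, long sessionId) {
        switch (state) {
            case SUSPENDED:
                suspendedAtMs = nowMs; // (re)start the clock at the last SUSPENDED
                return false;
            case RECONNECTED:
                suspendedAtMs = -1;    // connected again, stop the clock
                boolean changed = sessionId != lastSessionId;
                lastSessionId = sessionId;
                return changed;        // a new session ID means the old one was lost
            default:
                return false;          // LOST alone says nothing about the session
        }
    }

    /** Poll from a timer: true once sessionTimeoutMs has passed since the last SUSPENDED. */
    boolean timedOut(long nowMs) {
        return suspendedAtMs >= 0 && nowMs - suspendedAtMs >= sessionTimeoutMs;
    }

    public static void main(String[] args) {
        SessionLossDetector d = new SessionLossDetector(72_000L, 1L);
        d.onStateChange(State.SUSPENDED, 0L, 1L);
        System.out.println("timed out after 72 s: " + d.timedOut(72_000L));
    }
}
```

Passing the time in as a parameter keeps the logic free of threads and easy to test; a real listener would read the clock itself and call timedOut from a scheduled task.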

Re: Curator connection states

Posted by Benjamin Jaton <bj...@radiantlogic.com>.

Ah thanks for the tip, I'm definitely going to try that.

Re: Curator connection states

Posted by Jordan Zimmerman <jo...@jordanzimmerman.com>.

NOTE: You can also set a main watcher that watches for session expiration.

-JZ



Re: Curator connection states

Posted by Jordan Zimmerman <jo...@jordanzimmerman.com>.

LOST was never intended to match session loss. Session loss is only detected by ZooKeeper once the connection is re-established. 

-JZ

Re: Curator connection states

Posted by Benjamin Jaton <bj...@radiantlogic.com>.

Some of the comments in https://issues.apache.org/jira/browse/CURATOR-134
are interesting.

Apparently having a LOST event doesn't mean that the session has timed out.

The doc says (http://curator.apache.org/errors.html) :
"The connection is confirmed to be lost. Close any locks, leaders, etc. and
attempt to re-create them. NOTE: it is possible to get a RECONNECTED state
after this but you should still consider any locks, etc. as dirty/unstable."

But then, in some cases, we are going to recover our previous session after
we have received the LOST event.
If that's the case, then the LOST event isn't as useful as I thought it was.
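
With the numbers from the original tests, LOST arrived 7 seconds after SUSPENDED (case 1) while the session timeout was 72 seconds, so a reconnect anywhere in between would come after LOST and could still find the old session. The toy model below illustrates that gap. It assumes the server expires a session once it has not heard from the client for the full session timeout, and all names are hypothetical.

```java
// Toy model: a LOST event does not by itself mean the session has expired.
// Assumption: the server expires the session once it has not heard from the
// client for sessionTimeoutSec; a reconnect before that resumes the session.
final class LostVersusExpiry {

    enum Outcome { SAME_SESSION, NEW_SESSION }

    /** Outcome of a reconnect that happens outageSec after the connection dropped. */
    static Outcome afterReconnect(int outageSec, int sessionTimeoutSec) {
        return outageSec < sessionTimeoutSec ? Outcome.SAME_SESSION : Outcome.NEW_SESSION;
    }

    public static void main(String[] args) {
        int lostAfterSec = 7;       // SUSPENDED -> LOST delay measured with 1 retry
        int sessionTimeoutSec = 72; // session timeout used in the tests

        int[] outages = {30, 90};   // example outage lengths in seconds
        for (int outage : outages) {
            boolean lostSeen = outage > lostAfterSec;
            System.out.println(outage + " s outage: LOST seen=" + lostSeen
                    + ", outcome=" + afterReconnect(outage, sessionTimeoutSec));
        }
    }
}
```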

What I would like is an event on session loss. Is there any way
to do this?

Also, is there a way to be notified when Curator stops retrying for good?

Thanks,
Ben
