Posted to dev@curator.apache.org by "Shaun Senecal (JIRA)" <ji...@apache.org> on 2013/10/10 07:27:41 UTC

[jira] [Commented] (CURATOR-64) Retry logic appears to delay reconnect after session expiry

    [ https://issues.apache.org/jira/browse/CURATOR-64?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13791212#comment-13791212 ] 

Shaun Senecal commented on CURATOR-64:
--------------------------------------

I'm still confused.

The behaviour we are seeing is that Curator is hanging for several minutes, logging exceptions about failed retry attempts all along the way, before being able to reconnect.  Are you saying this is the expected behaviour?

I understand that Curator is managing the connection for me, which is why I assume that the retry logic should be able to run in parallel with the reconnect logic so that our service spends as little time as possible disconnected from the cluster.  Am I still missing something?



> Retry logic appears to delay reconnect after session expiry
> -----------------------------------------------------------
>
>                 Key: CURATOR-64
>                 URL: https://issues.apache.org/jira/browse/CURATOR-64
>             Project: Apache Curator
>          Issue Type: Bug
>          Components: Framework
>            Reporter: Shaun Senecal
>         Attachments: SessionExpiryTest.java
>
>
> If a watch is triggered immediately before a session expiry, and the watch attempts to fetch data from ZK (using Curator), it's possible that the reconnect is delayed until the retry gives up.
> It currently looks something like this:
> 1. watch A is triggered, begins processing
> 2. session is expired (watch A hasn't completed execution yet)
> 3. watch A attempts to fetch data from ZK (say: curator.getData()...)
> 4. the getData() will retry until the policy tells it to give up (could be several minutes)
> 5. finally curator will reconnect to ZK
> I would expect something more like this:
> 1. watch A is triggered, begins processing
> 2. session is expired (watch A hasn't completed execution yet)
> 3. watch A attempts to fetch data from ZK (say: curator.getData()...)
> 4. the first getData() fails because of session expiry (should be nearly instant)
> 5. curator reconnects to ZK
> 6. a second attempt to call getData() is made via the RetryPolicy
> 7. watch A completes processing
> We are using the BoundedExponentialBackoffRetry, so we end up waiting for quite a while after session expiry, leaving our services dead in the water for much longer than is necessary.
> This occurs with curator v1.3.3 and ZK 3.4.5
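
To give a rough sense of why step 4 of the current behaviour can last several minutes, here is a small sketch of the worst-case cumulative wait under a capped, doubling backoff. It is a simplified, deterministic model and not Curator's actual BoundedExponentialBackoffRetry code; the real policy may compute its delays differently (for example with randomization). The class name, method names, and the settings used (1 s base, 30 s cap, 10 retries) are hypothetical and are not taken from the issue.

```java
public class BackoffSketch {

    // Upper bound on the sleep before retry number retryCount (0-based),
    // assuming the delay doubles each time and is capped at maxSleepMs.
    // This is an assumed model, not Curator's implementation.
    public static long maxSleepMs(long baseSleepMs, long maxSleepMs, int retryCount) {
        // Cap the shift so the multiplication cannot overflow for large counts.
        int shift = Math.min(retryCount + 1, 30);
        long uncapped = baseSleepMs * (1L << shift);
        return Math.min(maxSleepMs, uncapped);
    }

    // Worst-case total time spent sleeping before the policy gives up.
    public static long worstCaseTotalMs(long baseSleepMs, long maxSleepMs, int maxRetries) {
        long total = 0;
        for (int i = 0; i < maxRetries; i++) {
            total += maxSleepMs(baseSleepMs, maxSleepMs, i);
        }
        return total;
    }

    public static void main(String[] args) {
        // Hypothetical settings: 1 s base sleep, 30 s cap, 10 retries.
        long total = worstCaseTotalMs(1000, 30000, 10);
        System.out.println(total + " ms"); // 210000 ms, i.e. 3.5 minutes
    }
}
```

With these assumed settings the retries alone can account for up to 3.5 minutes, which matches the order of magnitude described above if the reconnect only happens after the policy gives up.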


