Posted to user@curator.apache.org by Purshotam Shah <pu...@yahoo-inc.com> on 2016/03/24 18:30:43 UTC

ConnectionState.LOST without retry.

We use Apache Curator to connect to ZK.
We create the Curator client with the following settings:
1. session timeout = 5 min
2. connection timeout = 3 min
3. Retry = ExponentialBackoffRetry(1000, 10)

We have also set up a ConnectionStateListener. We use Curator mostly for distributed locking, and we shut down the system when the connection is lost.

We noticed that after a long GC pause we are notified with ConnectionState.LOST, and this causes our system to go down.

We are working to figure out why there is a long GC pause. My question: even if a GC pause is longer than the session timeout, doesn't Curator use the RetryPolicy to retry before notifying ConnectionState.LOST?

Thanks,

Re: ConnectionState.LOST without retry.

Posted by Jordan Zimmerman <jo...@jordanzimmerman.com>.
I’d consider that a bug then. Please open an issue in Jira.

-Jordan

> On Mar 25, 2016, at 11:50 AM, Purshotam Shah <pu...@yahoo-inc.com> wrote:
> 
> But in the case of a long GC pause, it doesn't.
> This is what we have figured out from our tests. If ZK is down, Curator does retry based on the retry policy. But in the case of a long GC pause, it doesn't: if the GC pause is longer than the session timeout, Curator notifies connection lost without retrying.
> 
> I was thinking that it would be better if it also retried after a GC pause.
> 
> Thanks,
> 
> 
> 
> 
> On Friday, March 25, 2016 9:44 AM, Jordan Zimmerman <jo...@jordanzimmerman.com> wrote:
> 
> 
> Curator does retry when the connection is lost, based on the retry policy. ConnectionState.LOST implies that the retry policy gave up.
> 
> -Jordan
> 
>> On Mar 25, 2016, at 11:33 AM, Purshotam Shah <purushah@yahoo-inc.com> wrote:
>> 
>> Thanks for the information. Doesn't it make sense to retry once Curator receives connection lost from the ZK client? We have seen it do so when ZK is down: Curator retries according to the retry policy before notifying connection lost.
>> 
>> Thanks,
>> 
>> 
>> 
>> On Thursday, March 24, 2016 1:52 PM, Jordan Zimmerman <jordan@jordanzimmerman.com> wrote:
>> 
>> 
>> The ZooKeeper client (which Curator uses) sends heartbeats to the connected server. The heartbeat is sent every 2/3 of a session. If the heartbeat fails, the connection drops. Please read Tech Note 10 for details: https://cwiki.apache.org/confluence/display/CURATOR/TN10
>> 
>> -Jordan
>> 


Re: ConnectionState.LOST without retry.

Posted by Jordan Zimmerman <jo...@jordanzimmerman.com>.
The ZooKeeper client (which Curator uses) sends heartbeats to the connected server. The heartbeat is sent every 2/3 of a session. If the heartbeat fails, the connection drops. Please read Tech Note 10 for details: https://cwiki.apache.org/confluence/display/CURATOR/TN10

-Jordan
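
As a rough illustration of the timing described in this reply, the sketch below classifies a client-side pause (such as a GC pause) against the 5 minute session timeout from the original question. It is a simplified model, not Curator or ZooKeeper code: the class GcPauseSketch, the Outcome enum, and the classify method are hypothetical names, and the 2/3 threshold is taken directly from the statement in this reply rather than from the ZooKeeper source.

```java
import java.time.Duration;

public class GcPauseSketch {
    // Hypothetical labels for this sketch; these are not Curator's ConnectionState values.
    enum Outcome { HEARTBEAT_OK, CONNECTION_DROPPED, SESSION_EXPIRED }

    // Classify a pause against the session timeout.
    // Assumption: a heartbeat is missed once the pause exceeds 2/3 of the session timeout.
    static Outcome classify(Duration sessionTimeout, Duration pause) {
        long sessionMs = sessionTimeout.toMillis();
        long heartbeatWindowMs = sessionMs * 2 / 3;
        long pauseMs = pause.toMillis();
        if (pauseMs > sessionMs) {
            // The server has not heard from the client for a full session timeout.
            return Outcome.SESSION_EXPIRED;
        }
        if (pauseMs > heartbeatWindowMs) {
            // The heartbeat was missed, but the session may still be alive on the server.
            return Outcome.CONNECTION_DROPPED;
        }
        return Outcome.HEARTBEAT_OK;
    }

    public static void main(String[] args) {
        Duration session = Duration.ofMinutes(5); // session timeout from the original question
        for (long seconds : new long[] {60, 240, 360}) {
            System.out.println(seconds + " s pause -> "
                    + classify(session, Duration.ofSeconds(seconds)));
        }
    }
}
```

Under this model, a pause longer than the session timeout lets the session expire on the server while the client is stopped, which is consistent with the report that ConnectionState.LOST arrives after a GC pause longer than the session timeout.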
