Posted to user@curator.apache.org by Göktürk Gezer <go...@apache.org> on 2019/05/13 06:45:11 UTC

Curator client getting suspended before zookeeper session timeout

Hello,

While using the *LeaderSelector* recipe, I'm seeing the Curator connection
get suspended after a ping response has been missing for a period that is
shorter than the configured ZooKeeper session timeout.

I'm using Curator 4.0.1 and ZooKeeper 3.4.9.
The configured ZooKeeper session timeout is 60 seconds.

In our test cluster, I'm observing a steady ping between client and server
which usually takes 13 seconds. However, under high CPU utilization, the
ping doesn't reach the server, which causes immediate session suspension
and, in turn, a change of leader. So the session is getting suspended
~30 seconds after the last successful ping.

My questions are:

   - Is it expected for Curator to suspend the connection before the
   ZooKeeper session timeout?
   - What role does the Curator framework retry policy play in a standard
   heartbeat? (I use ExponentialBackoff with 3 retries)


Regards,
Gokturk

Re: Curator client getting suspended before zookeeper session timeout

Posted by Jordan Zimmerman <jo...@jordanzimmerman.com>.
"Suspended" is sent as soon as the connection is lost. "Lost" is sent when the session timeout expires. This is documented here: http://curator.apache.org/errors.html (see Notifications). Also, this Tech Note has some details: https://cwiki.apache.org/confluence/display/CURATOR/TN12 as well as this one: https://cwiki.apache.org/confluence/display/CURATOR/TN14

-Jordan
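
The rule in the reply above can be sketched as a small timing model. This is only an illustration: ConnectionStateModel, State, and stateFor are hypothetical names invented here, not part of Curator. In a real application these notifications would arrive through Curator's ConnectionStateListener as ConnectionState.SUSPENDED and ConnectionState.LOST. The sketch also assumes that "Lost" is reported once the time since the connection loss reaches the session timeout.

```java
import java.time.Duration;

// Hypothetical model, NOT Curator code: it only mirrors the rule
// "Suspended as soon as the connection is lost, Lost once the
// session timeout has expired".
class ConnectionStateModel {
    enum State { CONNECTED, SUSPENDED, LOST }

    // sinceConnectionLoss == null means the connection is currently up.
    static State stateFor(Duration sinceConnectionLoss, Duration sessionTimeout) {
        if (sinceConnectionLoss == null) {
            return State.CONNECTED;
        }
        // Assumption: the session counts as expired when the elapsed time
        // since the connection loss reaches the session timeout.
        if (sinceConnectionLoss.compareTo(sessionTimeout) >= 0) {
            return State.LOST;
        }
        // Any connection loss shorter than the session timeout is only a suspension.
        return State.SUSPENDED;
    }

    public static void main(String[] args) {
        Duration sessionTimeout = Duration.ofSeconds(60); // value from the question
        System.out.println(stateFor(null, sessionTimeout));
        System.out.println(stateFor(Duration.ZERO, sessionTimeout));
        System.out.println(stateFor(Duration.ofSeconds(30), sessionTimeout));
        System.out.println(stateFor(Duration.ofSeconds(60), sessionTimeout));
    }
}
```

Under this model, a suspension well before the 60-second session timeout is the expected outcome of a dropped connection, which matches the answer to the first question.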
