Posted to user@zookeeper.apache.org by Sampath Perera <sa...@adroitlogic.com> on 2011/08/18 05:40:28 UTC

Fast leader election initial delay, is that possible?

Hi,

We have a deployment of a 3-node ZooKeeper quorum. When we start the 3
ZooKeeper nodes, the first node to come up prints the following connection
refused exception, which is accurate, as nodes 2 and 3 are yet to be started.
This seems to be because FastLeaderElection tries to connect to the other
nodes specified in the quorum.

So my question is whether it is possible to configure an initial delay for
the FastLeaderElection to be kicked off?

The rationale is that it is highly unlikely that all 3 nodes start at exactly
the same time, even when we try to command the startups together. With such a
delay we could get rid of this stack trace from the logs. It triggers warnings
in the tools that monitor the logs, yet it is not really a WARN condition but
rather an expected error.

2011-08-18 08:53:15,530 [-] [WorkerSender Thread]  WARN QuorumCnxManager
Cannot open channel to 2 at election address localhost/127.0.0.1:3888
java.net.ConnectException: Connection refused
    at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method)
    at sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:567)
    at sun.nio.ch.SocketAdaptor.connect(SocketAdaptor.java:100)
    at org.apache.zookeeper.server.quorum.QuorumCnxManager.connectOne(QuorumCnxManager.java:371)
    at org.apache.zookeeper.server.quorum.QuorumCnxManager.toSend(QuorumCnxManager.java:340)
    at org.apache.zookeeper.server.quorum.FastLeaderElection$Messenger$WorkerSender.process(FastLeaderElection.java:360)
    at org.apache.zookeeper.server.quorum.FastLeaderElection$Messenger$WorkerSender.run(FastLeaderElection.java:333)
    at java.lang.Thread.run(Thread.java:662)
2011-08-18 08:53:15,532 [-] [WorkerSender Thread]  WARN QuorumCnxManager
Cannot open channel to 3 at election address localhost/127.0.0.1:3889
java.net.ConnectException: Connection refused
    at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method)
    at sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:567)
    at sun.nio.ch.SocketAdaptor.connect(SocketAdaptor.java:100)
    at org.apache.zookeeper.server.quorum.QuorumCnxManager.connectOne(QuorumCnxManager.java:371)
    at org.apache.zookeeper.server.quorum.QuorumCnxManager.toSend(QuorumCnxManager.java:340)
    at org.apache.zookeeper.server.quorum.FastLeaderElection$Messenger$WorkerSender.process(FastLeaderElection.java:360)
    at org.apache.zookeeper.server.quorum.FastLeaderElection$Messenger$WorkerSender.run(FastLeaderElection.java:333)
    at java.lang.Thread.run(Thread.java:662)

-- 
Thanks,
Sampath
http://adroitlogic.org

Re: Fast leader election initial delay, is that possible?

Posted by Sampath Perera <sa...@adroitlogic.com>.

Hi Vishal,

On Sat, Aug 20, 2011 at 1:43 AM, Vishal Kher <vi...@gmail.com> wrote:

> My few cents..
> I am not sure if we can distinguish between spurious/non-spurious warnings
> and I don't think we can time it well. The delay is applicable only in
> certain cases. If the user knows that there will be a start up delay, then
> the user can ignore those errors or modify their scripts to start the server
> after a delay.


I guess you misinterpreted it :-( Starting the server after a delay is not a
solution for the original problem that I was referring to. I also do not see
how my original problem could be fixed through a script. At least I do not
know how to do it. Maybe by changing the log level to something like FATAL
and reverting it back to INFO after the delay? I do not think that is a good
idea, as it would cut off some of the stuff that I want to see.


> Does this have to be implemented in the server? It sounds to me that this is
> something that user scripts should handle.
>

As I said, I do not see how a user script can handle this. If there is any
option, please do let me know.

Sampath


>
>
> On Fri, Aug 19, 2011 at 7:00 AM, Flavio Junqueira <fp...@yahoo-inc.com>wrote:
>
>> Sampath, Do you think something along the lines of what Ted describes
>> would work for you?
>>
>> -Flavio
>>
>> On Aug 18, 2011, at 7:13 PM, Ted Dunning wrote:
>>
>> The thought is that a server would not complain about connection refused
>> or inability to form a quorum during the first (say) twenty seconds of
>> operation.
>>
>> The thesis is that warnings from these causes during that time are
>> spurious.
>>
>> As I mentioned, I don't see this as urgent or even necessarily a good
>> idea.  I completely reboot a ZK cluster once every year or three.  When I am
>> doing a rolling upgrade, I *want* to see alerts when I bounce a machine.  If
>> I don't want to see those alerts, my monitoring system allows me to put a
>> machine into maintenance mode for a short period of time to temporarily
>> suppress the warnings.
>>
>> All I was doing was translating and elaborating the original poster's
>> suggestion, not so much endorsing it.
>>
>> On Thu, Aug 18, 2011 at 8:54 AM, Flavio Junqueira <fp...@yahoo-inc.com>wrote:
>>
>>> Hi Ted, I don't see how one can automate the distinction between a
>>> machine that is down because it crashed and a machine that is down because
>>> it hasn't started yet. Assuming that we are logging the machine
>>> unavailability as we are doing currently, one can always look at the
>>> timestamp of the warning and remember that this is the time the machines
>>> were bootstrapping. Consequently, I don't really see the point of reducing
>>> the number of warnings, unless the warnings are really polluting the logs. I
>>> typically don't see so many that prevents me from reading the rest, but you
>>> may have a different perception. Also, recall that we back off, so the
>>> warnings become less frequent over time.
>>>
>>> I'm open to ideas, though. If you see anything wrong in my rationale or
>>> if you have an idea of how to do it differently, then I'd be happy to hear.
>>> However, if the idea is simply to add a parameter that configures the time
>>> for leader election to start, then I'm currently not in favor.
>>>
>>> -Flavio
>>>
>>> On Aug 18, 2011, at 5:39 PM, Ted Dunning wrote:
>>>
>>> Flavio,
>>>
>>> What you say is correct, but the original poster does have a point that
>>> many
>>> of these warnings are to be expected and there is a heuristic that might
>>> assist in distinguishing some of these cases so that false alarms in the
>>> logs could be decreased.
>>>
>>> That doesn't seem like a big deal to me, but different people have
>>> different
>>> itches.  In my experience, restarting a ZK cluster from zero almost never
>>> happens.
>>>
>>> On Thu, Aug 18, 2011 at 8:36 AM, Ted Dunning <te...@gmail.com> wrote:
>>>
>>>> On Thu, Aug 18, 2011 at 12:15 AM, Sampath Perera
>>>> <sampath@adroitlogic.com> wrote:
>>>>
>>>>> Hhmmm, I think this is a bit different isn't it? Here we know that the
>>>>> first server to come will be failing to connect to the other as they
>>>>> are not yet up. Anyway our real issue is the warning.
>>>>
>>>> We know that.
>>>>
>>>> But how does the server know that it is the first server?  That is the
>>>> whole point of the leader election.  You might just have a server
>>>> rejoining a cluster.  Or you might have a cluster that has been turned
>>>> off.  Or a cluster with 2 out of 5 machines off and we tried to touch
>>>> the other down machine before the others.
>>>>
>>>>>> Would you like to suggest a patch?
>>>>>
>>>>> Of course I do.. will prepare a patch and attach.
>>>>
>>>> Great!
>>>
>>>
>>>
>>>
>>>   *flavio*
>>> *junqueira*
>>>
>>> research scientist
>>>
>>> fpj@yahoo-inc.com
>>> direct +34 93-183-8828
>>>
>>> avinguda diagonal 177, 8th floor, barcelona, 08018, es
>>> phone (408) 349 3300    fax (408) 349 3301
>>>
>>>
>>>
>>
>>
>


-- 
Thanks,
Sampath
http://adroitlogic.org

Re: Fast leader election initial delay, is that possible?

Posted by Vishal Kher <vi...@gmail.com>.

My few cents..
I am not sure if we can distinguish between spurious/non-spurious warnings
and I don't think we can time it well. The delay is applicable only in
certain cases. If the user knows that there will be a start up delay, then
the user can ignore those errors or modify their scripts to start the server
after a delay. Does this have to be implemented in the server? It sounds to me
that this is something that user scripts should handle.


On Fri, Aug 19, 2011 at 7:00 AM, Flavio Junqueira <fp...@yahoo-inc.com> wrote:

> Sampath, Do you think something along the lines of what Ted describes would
> work for you?
>
> -Flavio
>
> On Aug 18, 2011, at 7:13 PM, Ted Dunning wrote:
>
> The thought is that a server would not complain about connection refused or
> inability to form a quorum during the first (say) twenty seconds of
> operation.
>
> The thesis is that warnings from these causes during that time are
> spurious.
>
> As I mentioned, I don't see this as urgent or even necessarily a good idea.
>  I completely reboot a ZK cluster once every year or three.  When I am doing
> a rolling upgrade, I *want* to see alerts when I bounce a machine.  If I
> don't want to see those alerts, my monitoring system allows me to put a
> machine into maintenance mode for a short period of time to temporarily
> suppress the warnings.
>
> All I was doing was translating and elaborating the original poster's
> suggestion, not so much endorsing it.
>
> On Thu, Aug 18, 2011 at 8:54 AM, Flavio Junqueira <fp...@yahoo-inc.com>wrote:
>
>> Hi Ted, I don't see how one can automate the distinction between a machine
>> that is down because it crashed and a machine that is down because it hasn't
>> started yet. Assuming that we are logging the machine unavailability as we
>> are doing currently, one can always look at the timestamp of the warning and
>> remember that this is the time the machines were bootstrapping.
>> Consequently, I don't really see the point of reducing the number of
>> warnings, unless the warnings are really polluting the logs. I typically
>> don't see so many that prevents me from reading the rest, but you may have a
>> different perception. Also, recall that we back off, so the warnings become
>> less frequent over time.
>>
>> I'm open to ideas, though. If you see anything wrong in my rationale or if
>> you have an idea of how to do it differently, then I'd be happy to hear.
>> However, if the idea is simply to add a parameter that configures the time
>> for leader election to start, then I'm currently not in favor.
>>
>> -Flavio
>>
>> On Aug 18, 2011, at 5:39 PM, Ted Dunning wrote:
>>
>> Flavio,
>>
>> What you say is correct, but the original poster does have a point that
>> many
>> of these warnings are to be expected and there is a heuristic that might
>> assist in distinguishing some of these cases so that false alarms in the
>> logs could be decreased.
>>
>> That doesn't seem like a big deal to me, but different people have
>> different
>> itches.  In my experience, restarting a ZK cluster from zero almost never
>> happens.
>>
>> On Thu, Aug 18, 2011 at 8:36 AM, Ted Dunning <te...@gmail.com> wrote:
>>
>>> On Thu, Aug 18, 2011 at 12:15 AM, Sampath Perera
>>> <sampath@adroitlogic.com> wrote:
>>>
>>>> Hhmmm, I think this is a bit different isn't it? Here we know that the
>>>> first server to come will be failing to connect to the other as they
>>>> are not yet up. Anyway our real issue is the warning.
>>>
>>> We know that.
>>>
>>> But how does the server know that it is the first server?  That is the
>>> whole point of the leader election.  You might just have a server
>>> rejoining a cluster.  Or you might have a cluster that has been turned
>>> off.  Or a cluster with 2 out of 5 machines off and we tried to touch
>>> the other down machine before the others.
>>>
>>>>> Would you like to suggest a patch?
>>>>
>>>> Of course I do.. will prepare a patch and attach.
>>>
>>> Great!
>>
>>
>>
>>
>>
>
>

Re: Fast leader election initial delay, is that possible?

Posted by Sampath Perera <sa...@adroitlogic.com>.

Yeah, that will work for me.

Also, it is just going to be a configuration option, and the overhead
introduced will only apply in the case where this error occurs, as it is just
an if statement before printing out the error.

The default behavior will not be changed and I do not expect any overhead to
be introduced with this to the default case.
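
As a rough illustration of the "if statement before printing out the error"
idea, here is a minimal sketch. The class name StartupGraceCheck and the
graceMillis setting are hypothetical and are not existing ZooKeeper APIs or
configuration options; the 20-second value is only the figure mentioned in
this thread. A grace period of 0 keeps the current behaviour (always warn).

```java
import java.util.function.LongSupplier;

// Hypothetical helper, not part of ZooKeeper: decides whether a connection
// failure should be reported at WARN or held back during a startup grace period.
class StartupGraceCheck {
    private final long startMillis;
    private final long graceMillis;   // assumed setting; 0 means "always warn"
    private final LongSupplier clock; // injected so the check can be tested

    StartupGraceCheck(long graceMillis, LongSupplier clock) {
        this.graceMillis = graceMillis;
        this.clock = clock;
        this.startMillis = clock.getAsLong();
    }

    /** True when a connection failure observed now should be logged at WARN. */
    boolean shouldWarn() {
        return clock.getAsLong() - startMillis >= graceMillis;
    }

    public static void main(String[] args) {
        long[] now = {0L}; // fake clock so the demo is deterministic
        StartupGraceCheck check = new StartupGraceCheck(20_000L, () -> now[0]);

        now[0] = 5_000L;
        System.out.println("t=5s  warn? " + check.shouldWarn());
        now[0] = 25_000L;
        System.out.println("t=25s warn? " + check.shouldWarn());
    }
}
```

The caller would wrap the existing warning in `if (check.shouldWarn())` and
could log at a lower level otherwise, so nothing is lost from the logs.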

OTOH, I am OK to leave it as it is and let our customer know that this is
how it is :-) Actually, my original intention was to find out whether there is
any such configuration, as I was unable to find it in the docs.

So, if the majority of devs are not in favour of this change, I will not do
this.

Thanks for all your feedback!

Sampath

On Fri, Aug 19, 2011 at 4:30 PM, Flavio Junqueira <fp...@yahoo-inc.com> wrote:

> Sampath, Do you think something along the lines of what Ted describes would
> work for you?
>
> -Flavio
>
> On Aug 18, 2011, at 7:13 PM, Ted Dunning wrote:
>
> The thought is that a server would not complain about connection refused or
> inability to form a quorum during the first (say) twenty seconds of
> operation.
>
> The thesis is that warnings from these causes during that time are
> spurious.
>
> As I mentioned, I don't see this as urgent or even necessarily a good idea.
>  I completely reboot a ZK cluster once every year or three.  When I am doing
> a rolling upgrade, I *want* to see alerts when I bounce a machine.  If I
> don't want to see those alerts, my monitoring system allows me to put a
> machine into maintenance mode for a short period of time to temporarily
> suppress the warnings.
>
> All I was doing was translating and elaborating the original poster's
> suggestion, not so much endorsing it.
>
> On Thu, Aug 18, 2011 at 8:54 AM, Flavio Junqueira <fp...@yahoo-inc.com>wrote:
>
>> Hi Ted, I don't see how one can automate the distinction between a machine
>> that is down because it crashed and a machine that is down because it hasn't
>> started yet. Assuming that we are logging the machine unavailability as we
>> are doing currently, one can always look at the timestamp of the warning and
>> remember that this is the time the machines were bootstrapping.
>> Consequently, I don't really see the point of reducing the number of
>> warnings, unless the warnings are really polluting the logs. I typically
>> don't see so many that prevents me from reading the rest, but you may have a
>> different perception. Also, recall that we back off, so the warnings become
>> less frequent over time.
>>
>> I'm open to ideas, though. If you see anything wrong in my rationale or if
>> you have an idea of how to do it differently, then I'd be happy to hear.
>> However, if the idea is simply to add a parameter that configures the time
>> for leader election to start, then I'm currently not in favor.
>>
>> -Flavio
>>
>> On Aug 18, 2011, at 5:39 PM, Ted Dunning wrote:
>>
>> Flavio,
>>
>> What you say is correct, but the original poster does have a point that
>> many
>> of these warnings are to be expected and there is a heuristic that might
>> assist in distinguishing some of these cases so that false alarms in the
>> logs could be decreased.
>>
>> That doesn't seem like a big deal to me, but different people have
>> different
>> itches.  In my experience, restarting a ZK cluster from zero almost never
>> happens.
>>
>> On Thu, Aug 18, 2011 at 8:36 AM, Ted Dunning <te...@gmail.com> wrote:
>>
>>> On Thu, Aug 18, 2011 at 12:15 AM, Sampath Perera
>>> <sampath@adroitlogic.com> wrote:
>>>
>>>> Hhmmm, I think this is a bit different isn't it? Here we know that the
>>>> first server to come will be failing to connect to the other as they
>>>> are not yet up. Anyway our real issue is the warning.
>>>
>>> We know that.
>>>
>>> But how does the server know that it is the first server?  That is the
>>> whole point of the leader election.  You might just have a server
>>> rejoining a cluster.  Or you might have a cluster that has been turned
>>> off.  Or a cluster with 2 out of 5 machines off and we tried to touch
>>> the other down machine before the others.
>>>
>>>>> Would you like to suggest a patch?
>>>>
>>>> Of course I do.. will prepare a patch and attach.
>>>
>>> Great!
>>
>>
>>
>>
>>
>
>


-- 
Thanks,
Sampath
http://adroitlogic.org

Re: Fast leader election initial delay, is that possible?

Posted by Flavio Junqueira <fp...@yahoo-inc.com>.

Sampath, Do you think something along the lines of what Ted describes  
would work for you?

-Flavio

On Aug 18, 2011, at 7:13 PM, Ted Dunning wrote:

> The thought is that a server would not complain about connection  
> refused or inability to form a quorum during the first (say) twenty  
> seconds of operation.
>
> The thesis is that warnings from these causes during that time are  
> spurious.
>
> As I mentioned, I don't see this as urgent or even necessarily a  
> good idea.  I completely reboot a ZK cluster once every year or  
> three.  When I am doing a rolling upgrade, I *want* to see alerts  
> when I bounce a machine.  If I don't want to see those alerts, my  
> monitoring system allows me to put a machine into maintenance mode  
> for a short period of time to temporarily suppress the warnings.
>
> All I was doing was translating and elaborating the original  
> poster's suggestion, not so much endorsing it.
>
> On Thu, Aug 18, 2011 at 8:54 AM, Flavio Junqueira <fpj@yahoo- 
> inc.com> wrote:
> Hi Ted, I don't see how one can automate the distinction between a  
> machine that is down because it crashed and a machine that is down  
> because it hasn't started yet. Assuming that we are logging the  
> machine unavailability as we are doing currently, one can always  
> look at the timestamp of the warning and remember that this is the  
> time the machines were bootstrapping. Consequently, I don't really  
> see the point of reducing the number of warnings, unless the  
> warnings are really polluting the logs. I typically don't see so  
> many that prevents me from reading the rest, but you may have a  
> different perception. Also, recall that we back off, so the warnings  
> become less frequent over time.
>
> I'm open to ideas, though. If you see anything wrong in my rationale  
> or if you have an idea of how to do it differently, then I'd be  
> happy to hear. However, if the idea is simply to add a parameter  
> that configures the time for leader election to start, then I'm  
> currently not in favor.
>
> -Flavio
>
> On Aug 18, 2011, at 5:39 PM, Ted Dunning wrote:
>
>> Flavio,
>>
>> What you say is correct, but the original poster does have a point  
>> that many
>> of these warnings are to be expected and there is a heuristic that  
>> might
>> assist in distinguishing some of these cases so that false alarms  
>> in the
>> logs could be decreased.
>>
>> That doesn't seem like a big deal to me, but different people have  
>> different
>> itches.  In my experience, restarting a ZK cluster from zero almost  
>> never
>> happens.
>>
>> On Thu, Aug 18, 2011 at 8:36 AM, Ted Dunning  
>> <te...@gmail.com> wrote:
>>
>>>
>>>
>>> On Thu, Aug 18, 2011 at 12:15 AM, Sampath Perera <sampath@adroitlogic.com 
>>> >wrote:
>>>
>>>>
>>>> Hhmmm, I think this is a bit different isn't it? Here we know  
>>>> that the
>>>> first
>>>> server to come will be failing to connect to the other as they  
>>>> are not yet
>>>> up. Anyway our real issue is the warning.
>>>>
>>>
>>> We know that.
>>>
>>> But how does the server know that it is the first server?  That is  
>>> the
>>> whole point of the leader election.  You might just have a server  
>>> rejoining
>>> a cluster.  Or you might have a cluster that has been turned off.   
>>> Or a
>>> cluster with 2 out of 5 machines off and we tried to touch the  
>>> other down
>>> machine before the others.
>>>
>>>
>>>>>
>>>>> Would you like to suggest a patch?
>>>>>
>>>>
>>>> Of course I do.. will prepare a patch and attach.
>>>>
>>>
>>> Great!
>>>
>>>
>
>

flavio
junqueira

research scientist

fpj@yahoo-inc.com
direct +34 93-183-8828

avinguda diagonal 177, 8th floor, barcelona, 08018, es
phone (408) 349 3300    fax (408) 349 3301

Re: Fast leader election initial delay, is that possible?

Posted by Ted Dunning <te...@gmail.com>.

The thought is that a server would not complain about connection refused or
inability to form a quorum during the first (say) twenty seconds of
operation.

The thesis is that warnings from these causes during that time are spurious.

As I mentioned, I don't see this as urgent or even necessarily a good idea.
 I completely reboot a ZK cluster once every year or three.  When I am doing
a rolling upgrade, I *want* to see alerts when I bounce a machine.  If I
don't want to see those alerts, my monitoring system allows me to put a
machine into maintenance mode for a short period of time to temporarily
suppress the warnings.
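
To make the maintenance-mode idea concrete, here is a minimal sketch of a
filter on the monitoring side. StartupAlertFilter is a hypothetical class, not
a feature of any particular monitoring product, and it assumes log lines start
with a timestamp in the "yyyy-MM-dd HH:mm:ss,SSS" form seen in the log excerpt
earlier in this thread. It only suppresses QuorumCnxManager warnings inside
the declared window; everything else still raises an alert.

```java
import java.time.Duration;
import java.time.LocalDateTime;
import java.time.format.DateTimeFormatter;
import java.time.format.DateTimeParseException;

// Hypothetical monitoring-side filter: suppresses expected election-connection
// warnings during a declared maintenance/startup window.
class StartupAlertFilter {
    private static final DateTimeFormatter TS =
            DateTimeFormatter.ofPattern("yyyy-MM-dd HH:mm:ss,SSS");
    private static final int TS_LENGTH = 23; // e.g. "2011-08-18 08:53:15,530"

    private final LocalDateTime windowStart;
    private final Duration window;

    StartupAlertFilter(LocalDateTime windowStart, Duration window) {
        this.windowStart = windowStart;
        this.window = window;
    }

    /** True when the given log line should still raise an alert. */
    boolean shouldAlert(String logLine) {
        if (!logLine.contains(" WARN ")) {
            return false; // this sketch only looks at WARN lines
        }
        if (logLine.length() < TS_LENGTH) {
            return true;  // no timestamp: do not suppress
        }
        LocalDateTime ts;
        try {
            ts = LocalDateTime.parse(logLine.substring(0, TS_LENGTH), TS);
        } catch (DateTimeParseException e) {
            return true;  // unparseable timestamp: do not suppress
        }
        boolean inWindow = !ts.isBefore(windowStart)
                && ts.isBefore(windowStart.plus(window));
        boolean expected = logLine.contains("QuorumCnxManager");
        return !(inWindow && expected);
    }

    public static void main(String[] args) {
        StartupAlertFilter filter = new StartupAlertFilter(
                LocalDateTime.of(2011, 8, 18, 8, 53, 0), Duration.ofSeconds(20));
        String line =
                "2011-08-18 08:53:15,530 [-] [WorkerSender Thread]  WARN QuorumCnxManager";
        System.out.println("alert? " + filter.shouldAlert(line));
    }
}
```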

All I was doing was translating and elaborating the original poster's
suggestion, not so much endorsing it.

On Thu, Aug 18, 2011 at 8:54 AM, Flavio Junqueira <fp...@yahoo-inc.com> wrote:

> Hi Ted, I don't see how one can automate the distinction between a machine
> that is down because it crashed and a machine that is down because it hasn't
> started yet. Assuming that we are logging the machine unavailability as we
> are doing currently, one can always look at the timestamp of the warning and
> remember that this is the time the machines were bootstrapping.
> Consequently, I don't really see the point of reducing the number of
> warnings, unless the warnings are really polluting the logs. I typically
> don't see so many that prevents me from reading the rest, but you may have a
> different perception. Also, recall that we back off, so the warnings become
> less frequent over time.
>
> I'm open to ideas, though. If you see anything wrong in my rationale or if
> you have an idea of how to do it differently, then I'd be happy to hear.
> However, if the idea is simply to add a parameter that configures the time
> for leader election to start, then I'm currently not in favor.
>
> -Flavio
>
> On Aug 18, 2011, at 5:39 PM, Ted Dunning wrote:
>
> Flavio,
>
> What you say is correct, but the original poster does have a point that
> many
> of these warnings are to be expected and there is a heuristic that might
> assist in distinguishing some of these cases so that false alarms in the
> logs could be decreased.
>
> That doesn't seem like a big deal to me, but different people have
> different
> itches.  In my experience, restarting a ZK cluster from zero almost never
> happens.
>
> On Thu, Aug 18, 2011 at 8:36 AM, Ted Dunning <te...@gmail.com> wrote:
>
>> On Thu, Aug 18, 2011 at 12:15 AM, Sampath Perera
>> <sampath@adroitlogic.com> wrote:
>>
>>> Hhmmm, I think this is a bit different isn't it? Here we know that the
>>> first server to come will be failing to connect to the other as they
>>> are not yet up. Anyway our real issue is the warning.
>>
>> We know that.
>>
>> But how does the server know that it is the first server?  That is the
>> whole point of the leader election.  You might just have a server
>> rejoining a cluster.  Or you might have a cluster that has been turned
>> off.  Or a cluster with 2 out of 5 machines off and we tried to touch
>> the other down machine before the others.
>>
>>>> Would you like to suggest a patch?
>>>
>>> Of course I do.. will prepare a patch and attach.
>>
>> Great!
>
>
>
>
>

Re: Fast leader election initial delay, is that possible?

Posted by Sampath Perera <sa...@adroitlogic.com>.

s/one of customer/one of our customer

sorry for the typo.

On Thu, Aug 18, 2011 at 10:24 PM, Sampath Perera <sa...@adroitlogic.com>wrote:

> Hi Flavio,
>
> On Thu, Aug 18, 2011 at 9:24 PM, Flavio Junqueira <fp...@yahoo-inc.com>wrote:
>
>> Hi Ted, I don't see how one can automate the distinction between a machine
>> that is down because it crashed and a machine that is down because it hasn't
>> started yet. Assuming that we are logging the machine unavailability as we
>> are doing currently, one can always look at the timestamp of the warning and
>> remember that this is the time the machines were bootstrapping.
>> Consequently, I don't really see the point of reducing the number of
>> warnings, unless the warnings are really polluting the logs. I typically
>> don't see so many that prevents me from reading the rest, but you may have a
>> different perception. Also, recall that we back off, so the warnings become
>> less frequent over time.
>>
>
> True, but one of customer deployments have a log analyzing tool and sends
> notifications for the errors on the log, as you previously said we cannot
> get an optimal value for this timeout, but we can come up with a sub optimal
> value to get rid of this warning.
>
>
>>
>> I'm open to ideas, though. If you see anything wrong in my rationale or if
>> you have an idea of how to do it differently, then I'd be happy to hear.
>> However, if the idea is simply to add a parameter that configures the time
>> for leader election to start, then I'm currently not in favor.
>>
>
> Well, what I was originally looking for was to delay the leader election,
> but as pointed out by Ted, I was going to provide a path on printing this
> warning. (If you carefully look at Ted's comment, and my response,  was
> thinking of a timeout for the warning to be considered as a warning to be
> printed on the log... at least that is what I got from Ted's first comment).
> What do you think about that?
>
>
>>
>> -Flavio
>>
>> On Aug 18, 2011, at 5:39 PM, Ted Dunning wrote:
>>
>> Flavio,
>>
>> What you say is correct, but the original poster does have a point that
>> many
>> of these warnings are to be expected and there is a heuristic that might
>> assist in distinguishing some of these cases so that false alarms in the
>> logs could be decreased.
>>
>> That doesn't seem like a big deal to me, but different people have
>> different
>> itches.  In my experience, restarting a ZK cluster from zero almost never
>> happens.
>>
>> On Thu, Aug 18, 2011 at 8:36 AM, Ted Dunning <te...@gmail.com> wrote:
>>
>>> On Thu, Aug 18, 2011 at 12:15 AM, Sampath Perera
>>> <sampath@adroitlogic.com> wrote:
>>>
>>>> Hhmmm, I think this is a bit different isn't it? Here we know that the
>>>> first server to come will be failing to connect to the other as they
>>>> are not yet up. Anyway our real issue is the warning.
>>>
>>> We know that.
>>>
>>> But how does the server know that it is the first server?  That is the
>>> whole point of the leader election.  You might just have a server
>>> rejoining a cluster.  Or you might have a cluster that has been turned
>>> off.  Or a cluster with 2 out of 5 machines off and we tried to touch
>>> the other down machine before the others.
>>>
>>>>> Would you like to suggest a patch?
>>>>
>>>> Of course I do.. will prepare a patch and attach.
>>>
>>> Great!
>>
>>
>>
>>
>>
>
>
> --
> Thanks,
> Sampath
> http://adroitlogic.org
>
>


-- 
Thanks,
Sampath
http://adroitlogic.org

Re: Fast leader election initial delay, is that possible?

Posted by Sampath Perera <sa...@adroitlogic.com>.

Hi Flavio,

On Thu, Aug 18, 2011 at 9:24 PM, Flavio Junqueira <fp...@yahoo-inc.com> wrote:

> Hi Ted, I don't see how one can automate the distinction between a machine
> that is down because it crashed and a machine that is down because it hasn't
> started yet. Assuming that we are logging the machine unavailability as we
> are doing currently, one can always look at the timestamp of the warning and
> remember that this is the time the machines were bootstrapping.
> Consequently, I don't really see the point of reducing the number of
> warnings, unless the warnings are really polluting the logs. I typically
> don't see so many that prevents me from reading the rest, but you may have a
> different perception. Also, recall that we back off, so the warnings become
> less frequent over time.
>

True, but one of our customer deployments has a log-analyzing tool that sends
notifications for the errors in the log. As you previously said, we cannot
get an optimal value for this timeout, but we can come up with a suboptimal
value to get rid of this warning.


>
> I'm open to ideas, though. If you see anything wrong in my rationale or if
> you have an idea of how to do it differently, then I'd be happy to hear.
> However, if the idea is simply to add a parameter that configures the time
> for leader election to start, then I'm currently not in favor.
>

Well, what I was originally looking for was a way to delay the leader election,
but as pointed out by Ted, I was going to provide a patch for printing this
warning. (If you look carefully at Ted's comment and my response, I was
thinking of a timeout before this is treated as a warning to be printed in
the log... at least that is what I got from Ted's first comment.)
What do you think about that?


>
> -Flavio
>
> On Aug 18, 2011, at 5:39 PM, Ted Dunning wrote:
>
> Flavio,
>
> What you say is correct, but the original poster does have a point that
> many
> of these warnings are to be expected and there is a heuristic that might
> assist in distinguishing some of these cases so that false alarms in the
> logs could be decreased.
>
> That doesn't seem like a big deal to me, but different people have
> different
> itches.  In my experience, restarting a ZK cluster from zero almost never
> happens.
>
> On Thu, Aug 18, 2011 at 8:36 AM, Ted Dunning <te...@gmail.com> wrote:
>
>> On Thu, Aug 18, 2011 at 12:15 AM, Sampath Perera
>> <sampath@adroitlogic.com> wrote:
>>
>>> Hhmmm, I think this is a bit different isn't it? Here we know that the
>>> first server to come will be failing to connect to the other as they
>>> are not yet up. Anyway our real issue is the warning.
>>
>> We know that.
>>
>> But how does the server know that it is the first server?  That is the
>> whole point of the leader election.  You might just have a server
>> rejoining a cluster.  Or you might have a cluster that has been turned
>> off.  Or a cluster with 2 out of 5 machines off and we tried to touch
>> the other down machine before the others.
>>
>>>> Would you like to suggest a patch?
>>>
>>> Of course I do.. will prepare a patch and attach.
>>
>> Great!
>
>
>
>
> *flavio*
> *junqueira*
>
> research scientist
>
> fpj@yahoo-inc.com
> direct +34 93-183-8828
>
> avinguda diagonal 177, 8th floor, barcelona, 08018, es
> phone (408) 349 3300    fax (408) 349 3301
>
>
>


-- 
Thanks,
Sampath
http://adroitlogic.org

Re: Fast leader election initial delay, is that possible?

Posted by Flavio Junqueira <fp...@yahoo-inc.com>.

Hi Ted, I don't see how one can automate the distinction between a  
machine that is down because it crashed and a machine that is down  
because it hasn't started yet. Assuming that we are logging the  
machine unavailability as we are doing currently, one can always look  
at the timestamp of the warning and remember that this is the time the  
machines were bootstrapping. Consequently, I don't really see the  
point of reducing the number of warnings, unless the warnings are  
really polluting the logs. I typically don't see so many that they prevent
me from reading the rest, but you may have a different perception.
Also, recall that we back off, so the warnings become less frequent  
over time.

I'm open to ideas, though. If you see anything wrong in my rationale  
or if you have an idea of how to do it differently, then I'd be happy  
to hear. However, if the idea is simply to add a parameter that  
configures the time for leader election to start, then I'm currently  
not in favor.

-Flavio

On Aug 18, 2011, at 5:39 PM, Ted Dunning wrote:

> Flavio,
>
> What you say is correct, but the original poster does have a point that many
> of these warnings are to be expected and there is a heuristic that might
> assist in distinguishing some of these cases so that false alarms in the
> logs could be decreased.
>
> That doesn't seem like a big deal to me, but different people have different
> itches.  In my experience, restarting a ZK cluster from zero almost never
> happens.
>
> On Thu, Aug 18, 2011 at 8:36 AM, Ted Dunning <te...@gmail.com> wrote:
>
>>
>>
>> On Thu, Aug 18, 2011 at 12:15 AM, Sampath Perera <sampath@adroitlogic.com> wrote:
>>
>>>
>>> Hhmmm, I think this is a bit different isn't it? Here we know that the
>>> first server to come will be failing to connect to the other as they are
>>> not yet up. Anyway our real issue is the warning.
>>>
>>
>> We know that.
>>
>> But how does the server know that it is the first server?  That is the
>> whole point of the leader election.  You might just have a server rejoining
>> a cluster.  Or you might have a cluster that has been turned off.  Or a
>> cluster with 2 out of 5 machines off and we tried to touch the other down
>> machine before the others.
>>
>>
>>>>
>>>> Would you like to suggest a patch?
>>>>
>>>
>>> Of course I do.. will prepare a patch and attach.
>>>
>>
>> Great!
>>
>>

flavio
junqueira

research scientist

fpj@yahoo-inc.com
direct +34 93-183-8828

avinguda diagonal 177, 8th floor, barcelona, 08018, es
phone (408) 349 3300    fax (408) 349 3301

Re: Fast leader election initial delay, is that possible?

Posted by Ted Dunning <te...@gmail.com>.

Flavio,

What you say is correct, but the original poster does have a point that many
of these warnings are to be expected and there is a heuristic that might
assist in distinguishing some of these cases so that false alarms in the
logs could be decreased.

That doesn't seem like a big deal to me, but different people have different
itches.  In my experience, restarting a ZK cluster from zero almost never
happens.

On Thu, Aug 18, 2011 at 8:36 AM, Ted Dunning <te...@gmail.com> wrote:

>
>
> On Thu, Aug 18, 2011 at 12:15 AM, Sampath Perera <sa...@adroitlogic.com>wrote:
>
>>
>> Hhmmm, I think this is a bit different isn't it? Here we know that the
>> first server to come will be failing to connect to the other as they are
>> not yet up. Anyway our real issue is the warning.
>>
>
> We know that.
>
> But how does the server know that it is the first server?  That is the
> whole point of the leader election.  You might just have a server rejoining
> a cluster.  Or you might have a cluster that has been turned off.  Or a
> cluster with 2 out of 5 machines off and we tried to touch the other down
> machine before the others.
>
>
>> >
>> > Would you like to suggest a patch?
>> >
>>
>> Of course I do.. will prepare a patch and attach.
>>
>
> Great!
>
>

Re: Fast leader election initial delay, is that possible?

Posted by Ted Dunning <te...@gmail.com>.

On Thu, Aug 18, 2011 at 12:15 AM, Sampath Perera <sa...@adroitlogic.com>wrote:

>
> Hhmmm, I think this is a bit different isn't it? Here we know that the
> first server to come will be failing to connect to the other as they are
> not yet up. Anyway our real issue is the warning.
>

We know that.

But how does the server know that it is the first server?  That is the whole
point of the leader election.  You might just have a server rejoining a
cluster.  Or you might have a cluster that has been turned off.  Or a
cluster with 2 out of 5 machines off and we tried to touch the other down
machine before the others.

> >
> > Would you like to suggest a patch?
> >
>
> Of course I do.. will prepare a patch and attach.
>

Great!

Re: Fast leader election initial delay, is that possible?

Posted by Sampath Perera <sa...@adroitlogic.com>.

Hi Ted,

Thanks for the quick response.

On Thu, Aug 18, 2011 at 10:49 AM, Ted Dunning <te...@gmail.com> wrote:

> Well, it is exactly the same situation as any other situation where
> connection to another server fails.

Hmm, I think this is a bit different, isn't it? Here we know that the first
server to come up will fail to connect to the others, as they are not yet
up. Anyway, our real issue is the warning.

>  There is no need to insert a delay here
> unless you think that there should be a delay before this particular
> warning
> should be considered a warning.
>

That would actually give a fix to the problem that we are seeing.

>
> Would you like to suggest a patch?
>

Of course; I will prepare a patch and attach it.

>
> On Wed, Aug 17, 2011 at 8:40 PM, Sampath Perera <sampath@adroitlogic.com> wrote:
>
> > So my question is whether it is possible to configure an initial delay for
> > the FastLeaderElection to be kicked off?
> >
> > The rationale being that it is highly unlikely that all 3 nodes started at
> > the same time, even in the case where we try to command the startups at the
> > same time, and we could get rid of this stacktrace from the logs, as this
> > will trigger warning on the tools that are monitoring the logs, yet is not
> > actually a WARN rather an expected error.
> >
>

-- 
Thanks,
Sampath
http://adroitlogic.org

Re: Fast leader election initial delay, is that possible?

Posted by Ted Dunning <te...@gmail.com>.

Well, it is exactly the same situation as any other situation where
connection to another server fails.  There is no need to insert a delay here
unless you think that there should be a delay before this particular warning
should be considered a warning.
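
To make the idea of "a delay before this particular warning should be considered a warning" concrete, here is a minimal Java sketch. It is not ZooKeeper code and not the patch discussed in this thread: the class `StartupGraceLevel`, its `Level` enum, and the injected clock are all hypothetical names chosen for illustration. The assumption is that connection failures seen within a grace window after server start are logged at a lower level, and only later ones are raised to WARN.

```java
import java.util.function.LongSupplier;

/**
 * Hypothetical helper (not part of ZooKeeper): chooses a log level for a
 * failed connection attempt based on how long the server has been running.
 */
class StartupGraceLevel {
    enum Level { DEBUG, WARN }

    private final long startMillis;
    private final long graceMillis;
    private final LongSupplier clock;

    /**
     * @param graceMillis length of the startup window, in milliseconds
     * @param clock       time source in milliseconds, injected so the
     *                    behaviour can be checked without sleeping
     */
    StartupGraceLevel(long graceMillis, LongSupplier clock) {
        this.graceMillis = graceMillis;
        this.clock = clock;
        // The grace window starts when this object is created.
        this.startMillis = clock.getAsLong();
    }

    /** Failures inside the grace window are treated as expected. */
    Level levelForConnectFailure() {
        long elapsed = clock.getAsLong() - startMillis;
        return elapsed < graceMillis ? Level.DEBUG : Level.WARN;
    }
}
```

In a real server the clock would be something like `System::currentTimeMillis`. The grace period remains a guess, as Flavio points out elsewhere in this thread, but a wrong guess here only changes the log level of a message and does not delay the election itself.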

Would you like to suggest a patch?

On Wed, Aug 17, 2011 at 8:40 PM, Sampath Perera <sa...@adroitlogic.com>wrote:

> So my question is whether it is possible to configure an initial delay for
> the FastLeaderElection to be kicked off?
>
> The rationale being that it is highly unlikely that all 3 nodes started at
> the same time, even in the case where we try to command the startups at the
> same time, and we could get rid of this stacktrace from the logs, as this
> will trigger warning on the tools that are monitoring the logs, yet is not
> actually a WARN rather an expected error.
>

Re: Fast leader election initial delay, is that possible?

Posted by Flavio Junqueira <fp...@yahoo-inc.com>.

Hi Sampath, When a server starts it tries to contact the others  
immediately; it backs off if it gets no response.
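
As a rough illustration of "backs off if it gets no response", the Java sketch below shows a capped, doubling retry delay. This is an assumption made for illustration only: the thread does not state the actual delays or policy that ZooKeeper uses, and `RetryBackoff` is a hypothetical class, not something taken from the ZooKeeper code base.

```java
/**
 * Illustrative capped exponential backoff (hypothetical, not ZooKeeper's
 * implementation). Each unanswered attempt doubles the wait, up to a cap.
 */
final class RetryBackoff {
    private final long initialDelayMillis;
    private final long maxDelayMillis;
    private long nextDelayMillis;

    RetryBackoff(long initialDelayMillis, long maxDelayMillis) {
        if (initialDelayMillis <= 0 || maxDelayMillis < initialDelayMillis) {
            throw new IllegalArgumentException("need 0 < initial <= max");
        }
        this.initialDelayMillis = initialDelayMillis;
        this.maxDelayMillis = maxDelayMillis;
        this.nextDelayMillis = initialDelayMillis;
    }

    /** Returns the delay to wait now, then doubles it for the next call. */
    long nextDelay() {
        long current = nextDelayMillis;
        // Compare against max/2 so the doubling can never overflow.
        nextDelayMillis = (current > maxDelayMillis / 2)
                ? maxDelayMillis
                : current * 2;
        return current;
    }

    /** Call after a successful contact to start again from the initial delay. */
    void reset() {
        nextDelayMillis = initialDelayMillis;
    }
}
```

With a scheme like this, the retries (and any warning logged per failed attempt) become less frequent the longer the peers stay unreachable, and the interval never grows beyond the cap.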

It is true that it is unlikely that servers will start at the same  
time and you'll get such warnings. However, I don't really see the  
point of setting such a configuration parameter. It is really  
difficult to estimate how much time is sufficient, so most likely  
you'll end up getting the warning anyway if you make an aggressive  
estimate or will wait more than necessary if you make a conservative  
estimate.

-Flavio

On Aug 18, 2011, at 5:40 AM, Sampath Perera wrote:

> Hi,
>
> We have a deployment of a 3 node ZooKeeper quorum. When we get to starting
> the 3 ZooKeeper nodes the first node getting started prints the following
> connection refused exception, which is true as the node 2 and 3 are yet to
> be started. This seems to be because of the FastLeaderElection trying to
> connect to the other nodes specified in the quorum.
>
> So my question is whether it is possible to configure an initial delay for
> the FastLeaderElection to be kicked off?
>
> The rationale being that it is highly unlikely that all 3 nodes started at
> the same time, even in the case where we try to command the startups at the
> same time, and we could get rid of this stacktrace from the logs, as this
> will trigger warning on the tools that are monitoring the logs, yet is not
> actually a WARN rather an expected error.
>
> 2011-08-18 08:53:15,530 [-] [WorkerSender Thread]  WARN QuorumCnxManager
> Cannot open channel to 2 at election address localhost/127.0.0.1:3888
> java.net.ConnectException: Connection refused
>    at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method)
>    at sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:567)
>    at sun.nio.ch.SocketAdaptor.connect(SocketAdaptor.java:100)
>    at org.apache.zookeeper.server.quorum.QuorumCnxManager.connectOne(QuorumCnxManager.java:371)
>    at org.apache.zookeeper.server.quorum.QuorumCnxManager.toSend(QuorumCnxManager.java:340)
>    at org.apache.zookeeper.server.quorum.FastLeaderElection$Messenger$WorkerSender.process(FastLeaderElection.java:360)
>    at org.apache.zookeeper.server.quorum.FastLeaderElection$Messenger$WorkerSender.run(FastLeaderElection.java:333)
>    at java.lang.Thread.run(Thread.java:662)
> 2011-08-18 08:53:15,532 [-] [WorkerSender Thread]  WARN QuorumCnxManager
> Cannot open channel to 3 at election address localhost/127.0.0.1:3889
> java.net.ConnectException: Connection refused
>    at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method)
>    at sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:567)
>    at sun.nio.ch.SocketAdaptor.connect(SocketAdaptor.java:100)
>    at org.apache.zookeeper.server.quorum.QuorumCnxManager.connectOne(QuorumCnxManager.java:371)
>    at org.apache.zookeeper.server.quorum.QuorumCnxManager.toSend(QuorumCnxManager.java:340)
>    at org.apache.zookeeper.server.quorum.FastLeaderElection$Messenger$WorkerSender.process(FastLeaderElection.java:360)
>    at org.apache.zookeeper.server.quorum.FastLeaderElection$Messenger$WorkerSender.run(FastLeaderElection.java:333)
>    at java.lang.Thread.run(Thread.java:662)
>
> -- 
> Thanks,
> Sampath
> http://adroitlogic.org

flavio
junqueira
