Posted to user@zookeeper.apache.org by Anand Parthasarathy <an...@avinetworks.com> on 2016/10/08 01:11:36 UTC

Zookeeper leader election takes a long time.

Hi,

We are currently using ZooKeeper version 3.4.6 in a 3-node ensemble in
our system. We see that occasionally, when a node is powered off (in this
instance, it was actually the leader node), the remaining two nodes do not
form a quorum for a really long time. Looking at the logs, the sequence
appears to be as follows:
- Node 2 is the zookeeper leader
- Node 2 is powered off
- Node 1 and Node 3 recognize and start the election
- Node 3 times out after initLimit * tickTime with "Timeout while waiting
for quorum" for Round N
- Node 1 times out after initLimit * tickTime with "Exception while trying
to follow leader" for Round N+1 at the same time.
- And the process continues, with N incrementing each round.
- This happens for a long time.
- In one instance, we used tickTime=5000 and initLimit=20 (so each round
lasts 20 * 5000 ms = 100 s), and it took around 3.5 hours to converge.
- In a given round, Node 1 tries to connect to Node 2, gets Connection
Refused, and waits for the notification timeout, which doubles every
iteration until it hits the initLimit (see the sketch after this list).
Connection Refused occurs because Node 2 comes back up after the reboot,
but the ZooKeeper process is not started (due to an unrelated failure).
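
For context, here is a minimal sketch of how that doubling backoff eats a
round's budget (this is our reading of the 3.4.6 FastLeaderElection loop,
not the actual source; the 200 ms starting wait and the 60 s cap are
assumptions):

    public class BackoffSketch {
        public static void main(String[] args) {
            final long tickTime = 5000;   // ms, from our zoo.cfg
            final long initLimit = 20;    // ticks, from our zoo.cfg
            final long roundBudget = initLimit * tickTime; // 100,000 ms/round
            long waited = 0;
            long notTimeout = 200;        // initial notification wait (assumed)
            while (waited < roundBudget) {
                System.out.printf("wait %d ms (elapsed %d ms)%n",
                        notTimeout, waited);
                waited += notTimeout;
                // Each empty poll triggers reconnect attempts to all peers
                // (the Connection Refused from Node 2 happens here), then
                // the wait doubles, capped at an upper bound.
                notTimeout = Math.min(notTimeout * 2, 60000);
            }
            // After roundBudget the round is abandoned ("Timeout while
            // waiting for quorum" / "Exception while trying to follow
            // leader") and election starts over at Round N+1.
        }
    }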

It looks similar to ZOOKEEPER-2164 but there it is a connection timeout
where Node 2 is not reachable.

Could you please share whether you have seen this issue and, if so, what
workaround can be employed in 3.4.6?

Thanks,
Anand.

Re: Zookeeper leader election takes a long time.

Posted by Michael Han <ha...@cloudera.com>.
>> It looks similar to ZOOKEEPER-2164 but there it is a connection timeout where
Node 2 is not reachable.

It sounds to me that issue described in this email is the same as
ZOOKEEPER-2164 after checking the attached log of node 1 and node 3. Let's
check a specific time window for both logs, for example starting at
2016-10-03 13:58:40, what happened:

- n1 and n3 exchanged notifications and n3 was elected leader.
- n1 tried to connect to n2, failed, and retried a couple of times.
- During this period, n1 couldn't talk to n3 because of the n1->n2
connection attempts.
- n3 aborted and went back to the LOOKING state. And this repeats.

There is a patch for this,
https://issues.apache.org/jira/browse/ZOOKEEPER-900 - it might be worth
trying the patch.
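
For anyone following along, the livelock hinges on those connection
attempts being synchronous. A minimal sketch of the pattern (the shape
approximates QuorumCnxManager.connectOne in 3.4.6, which the logs below
mention; the signature and the 5000 ms timeout constant are assumptions):

    import java.io.IOException;
    import java.net.InetSocketAddress;
    import java.net.Socket;

    public class BlockingConnectSketch {
        static final int CNX_TIMEOUT_MS = 5000; // assumed connect timeout

        static boolean connectOne(InetSocketAddress electionAddr) {
            try (Socket sock = new Socket()) {
                // Blocks the calling (sender) thread for up to
                // CNX_TIMEOUT_MS. While n1 sits here retrying n2, its
                // notifications to n3 are not flowing, and n3 times out
                // waiting for quorum.
                sock.connect(electionAddr, CNX_TIMEOUT_MS);
                return true; // real code hands the socket to worker threads
            } catch (IOException e) {
                // Connection Refused returns quickly when n2's host is up
                // but the ZooKeeper process isn't; an unreachable host
                // instead burns the full timeout (the ZOOKEEPER-2164
                // flavor). Either way, the sender thread was tied up.
                return false;
            }
        }
    }

As I understand it, ZOOKEEPER-900 makes this connection handling
non-blocking so that one dead peer cannot starve notifications to live
peers.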


-- 
Cheers
Michael.

Re: Zookeeper leader election takes a long time.

Posted by Anand Parthasarathy <an...@avinetworks.com>.
Just wanted to let you know that at this time, one of the nodes is powered
off and the other two nodes took more than 10 minutes to converge. Our
script exited, so we don't know exactly when it converged. Normally, it
takes < 100 seconds to converge.

Thanks,
Anand.


Re: Zookeeper leader election takes a long time.

Posted by Anand Parthasarathy <an...@avinetworks.com>.
Hi Michael,

We have reproduced this issue on a private AWS setup that has public IP
access. I will send you the details of the instance IP and the credentials
separately. If it needs to be shared with more people, I am happy to share
with them as well.

Thanks
Anand.


Re: Zookeeper leader election takes a long time.

Posted by Michael Han <ha...@cloudera.com>.
Hi Anand,

>> We have isolated it to a test setup, where we are able
to reproduce this somewhat consistently if we keep a node powered off.

Do you mind share your setup / steps to reproduce if the setup only
involves ZooKeeper without other dependencies?





-- 
Cheers
Michael.

Re: Zookeeper leader election takes a long time.

Posted by Anand Parthasarathy <an...@avinetworks.com>.
Folks,

Sending a quick note again to find out whether there is any insight the
community can offer in terms of a solution or workaround. We use ZooKeeper
for service discovery in our product, and this issue has surfaced at a
large customer site a couple of times, so we need to figure out a solution
soon.

Thanks,
Anand.


Re: Zookeeper leader election takes a long time.

Posted by Anand Parthasarathy <an...@avinetworks.com>.
Hi Ben,

We have only tried with 3.4.6. We can potentially try with 3.4.8 or 3.4.9.

Thanks,
Anand.


Re: Zookeeper leader election takes a long time.

Posted by Ben Sherman <be...@gmail.com>.
Anand, in your lab are you able to replicate this with 3.4.8 or 3.4.9?


Re: Zookeeper leader election takes a long time.

Posted by Anand Parthasarathy <an...@avinetworks.com>.
Folks,

Any insight into this, or any workarounds you can think of to mitigate
this issue? We have isolated it to a test setup, where we are able
to reproduce this somewhat consistently if we keep a node powered off.
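
For anyone trying to reproduce: a minimal zoo.cfg consistent with the
timings and log excerpts in this thread (tickTime/initLimit are the values
we actually run; the hostnames, ports, syncLimit, and data dir are inferred
from the logs or assumed, so adjust to taste):

    tickTime=5000
    initLimit=20
    syncLimit=5
    dataDir=/var/lib/zookeeper
    clientPort=5002
    server.1=node1.controller.local:5003:5004
    server.2=node2.controller.local:5003:5004
    server.3=node3.controller.local:5003:5004

Start all three, power one node off ungracefully, and watch the remaining
two loop through election rounds.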

Thanks,
Anand.

On Sat, Oct 8, 2016 at 10:05 AM, Anand Parthasarathy <
anpartha@avinetworks.com> wrote:

> Hi Flavio,
>
> I have attached the logs from node 1 and node 3. Node 2 was powered off
> around 10-03 12:36. Leader election kept going until 10-03 15:57:16 when it
> finally converged.
>
> Thanks,
> Anand.
>
> On Sat, Oct 8, 2016 at 7:55 AM, Flavio Junqueira <fp...@apache.org> wrote:
>
>> Hi Anand,
>>
>> I don't understand whether 1 and 3 were able or even trying to connect to
>> each other. They should be able to elect a leader between them and make
>> progress. You might want to upload logs and let us know.
>>
>> -Flavio
>>
>> > On 08 Oct 2016, at 02:11, Anand Parthasarathy <an...@avinetworks.com>
>> wrote:
>> >
>> > Hi,
>> >
>> > We are currently using zookeeper 3.4.6 version and use a 3 node
>> solution in
>> > our system. We see that occasionally, when a node is powered off (in
>> this
>> > instance, it was actually a leader node), the remaining two nodes do not
>> > form a quorum for a really long time. Looking at the logs, it appears
>> the
>> > sequence is as follows:
>> > - Node 2 is the zookeeper leader
>> > - Node 2 is powered off
>> > - Node 1 and Node 3 recognize and start the election
>> > - Node 3 times out after initLimit * tickTime with "Timeout while
>> waiting
>> > for quorum" for Round N
>> > - Node 1 times out after initLimit * tickTime with "Exception while
>> trying
>> > to follow leader" for Round N+1 at the same time.
>> > - And the process continues where N is sequentially incrementing.
>> > - This happens for a long time.
>> > - In one instance, we used tickTime=5000 and initLimit=20 and it took
>> > around 3.5 hours to converge.
>> > - In a given round, Node 1 will try connecting to Node 2, gets
>> connection
>> > refused waits for notification timeout which increases by 2 every
>> iteration
>> > until it hits the initLimit. Connection Refused is because the node 2
>> comes
>> > up after reboot, but zookeeper process is not started (due to a
>> different
>> > failure).
>> >
>> > It looks similar to ZOOKEEPER-2164 but there it is a connection timeout
>> > where Node 2 is not reachable.
>> >
>> > Could you pls. share if you have seen this issue and if so, what is the
>> > workaround that can be employed in 3.4.6.
>> >
>> > Thanks,
>> > Anand.
>>
>>
>

Re: Zookeeper leader election takes a long time.

Posted by gnethercutt <gn...@gmail.com>.
Those symptoms and logs are also suspiciously similar to the issue noted in
ZOOKEEPER-1748. I have an open PR
(https://github.com/apache/zookeeper/pull/83) for it and added a detailed
breakdown to the conversation on GitHub.

TL;DR: Connection tracking timeouts for the quorum/leader election port may
be vexing you.
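
As a rough illustration of the idea (the property name and where it is
applied are my paraphrase of the proposed change; see the PR for the real
diff): enabling OS-level TCP keepalive on the long-lived election sockets
keeps firewall/conntrack entries from silently expiring, so a dead
connection gets torn down instead of lingering half-open.

    import java.net.Socket;
    import java.net.SocketException;

    public class KeepAliveSketch {
        // Hypothetical knob; the real property and default live in the PR.
        static final boolean TCP_KEEP_ALIVE =
                Boolean.getBoolean("zookeeper.tcpKeepAlive");

        // Apply to each quorum/election socket as it is opened, so the OS
        // probes idle connections and surfaces dead peers to the JVM.
        static void configureElectionSocket(Socket sock)
                throws SocketException {
            sock.setKeepAlive(TCP_KEEP_ALIVE);
        }
    }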

Cheers,

Glenn




Re: Zookeeper leader election takes a long time.

Posted by Ryan Zhang <ya...@hotmail.com>.
From that log, it looks like Node 3 and Node 2 formed a quorum (Node 1
can't talk to Node 2 for some reason). Even if Node 3 didn't hear from
Node 2 after 12:46, it is still a special case, as we had something like a
network partition for about 10 minutes. I wonder if you can repro this in
an environment where Node 2 is definitely down? I will take a look at the
log after 12:46 when I get a chance.

> On Oct 14, 2016, at 1:40 PM, Anand Parthasarathy <an...@avinetworks.com> wrote:
> 
> Hi Michael and Ryan,
> 
> Thanks for looking into the logs.
> 
> Actually, in this case, node 2 was brought down ungracefully and was down
> permanently. In this customer deployment, our product runs as a docker
> container in a bare metal environment. It is possible that the docker proxy
> that provides external connectivity to the zookeeper ports is down but the
> container itself is up and running for several more minutes. If you grep
> for "2 (n.sid)" you will see that there is no packets after 12:46.
> Convergence after that did not happen until 15:57 or so.
> 
> Thanks,
> Anand.
> 
> On Thu, Oct 13, 2016 at 5:10 PM, Michael Han <ha...@cloudera.com> wrote:
> 
>>>> it started a new round but then I seem to see the election messages
>> from Node 2 again. Any idea why?
>> 
>> My guess is node 2 is back online and ZK service was started. In this use
>> case node 2 does not get offline permanently, IIUC.
>> 
>> 
>> On Thu, Oct 13, 2016 at 3:41 PM, Ryan Zhang <ya...@hotmail.com>
>> wrote:
>> 
>>> Hi, Anand, I took a look and I wonder how do you explain the following
>>> 
>>> The N1.log starts at around
>>> 2016-10-03 12:37:38,073 [myid:1] - INFO  [QuorumPeer[myid=1]/127.0.0.1:
>>> 5002:QuorumPeer@714] - LOOKING
>>> 
>>> and it failed to talk to Node 2 (id: 2)
>>> 2016-10-03 12:37:38,136 [myid:1] - WARN  [WorkerSender[myid=1]:QuorumCn
>>> xManager@382] - Cannot open channel to 2 at election address
>>> node2.controller.local/127.0.0.8:5004
>>> java.net.ConnectException: Connection refused
>>>        at java.net.PlainSocketImpl.socketConnect(Native Method)
>>>        at java.net.AbstractPlainSocketImpl.doConnect(AbstractPlainSock
>>> etImpl.java:339)
>>>        at java.net.AbstractPlainSocketImpl.connectToAddress(AbstractPl
>>> ainSocketImpl.java:200)
>>>        at java.net.AbstractPlainSocketImpl.connect(AbstractPlainSocket
>>> Impl.java:182)
>>>        at java.net.SocksSocketImpl.connect(SocksSocketImpl.java:392)
>>>        at java.net.Socket.connect(Socket.java:579)
>>> 
>>> However, after failed to get any election messages from Node 3, it
>>> started a new round but then I seem to see the election messages from Node
>>> 2 again. Any idea why?
>>> 
>>> 2016-10-03 12:38:32,815 [myid:1] - INFO  [QuorumPeer[myid=1]/127.0.0.1:
>>> 5002:QuorumPeer@714] - LOOKING
>>> 2016-10-03 12:38:32,816 [myid:1] - INFO  [QuorumPeer[myid=1]/127.0.0.1:
>>> 5002:FileSnap@83] - Reading snapshot /var/lib/zookeeper/version-2/s
>>> napshot.20002aa0b
>>> 2016-10-03 12:38:32,875 [myid:1] - INFO  [QuorumPeer[myid=1]/127.0.0.1:
>>> 5002:FastLeaderElection@815] - New election. My id =  1, proposed
>>> zxid=0x20002bfb5
>>> 2016-10-03 12:38:32,876 [myid:1] - INFO  [WorkerReceiver[myid=1]:FastLe
>>> aderElection@597] - Notification: 1 (message format version), 1
>>> (n.leader), 0x20002bfb5 (n.zxid), 0x6 (n.round), LOOKING (n.state), 1
>>> (n.sid), 0x2 (n.peerEpoch) LOOKING (my state)
>>> 2016-10-03 12:38:32,878 [myid:1] - INFO  [WorkerReceiver[myid=1]:FastLe
>>> aderElection@597] - Notification: 1 (message format version), 2
>>> (n.leader), 0x20002bfb5 (n.zxid), 0x6 (n.round), LOOKING (n.state), 2
>>> (n.sid), 0x2 (n.peerEpoch) LOOKING (my state)
>>> 2016-10-03 12:38:32,878 [myid:1] - INFO  [WorkerReceiver[myid=1]:FastLe
>>> aderElection@597] - Notification: 1 (message format version), 2
>>> (n.leader), 0x20002bfb5 (n.zxid), 0x6 (n.round), LOOKING (n.state), 1
>>> (n.sid), 0x2 (n.peerEpoch) LOOKING (my state)
>>> 2016-10-03 12:38:32,917 [myid:1] - INFO  [WorkerReceiver[myid=1]:FastLe
>>> aderElection@597] - Notification: 1 (message format version), 3
>>> (n.leader), 0x20002bfb5 (n.zxid), 0x4 (n.round), LEADING (n.state), 3
>>> (n.sid), 0x2 (n.peerEpoch) LOOKING (my state)
>>> 2016-10-03 12:38:33,117 [myid:1] - INFO  [QuorumPeer[myid=1]/127.0.0.1:
>>> 5002:FastLeaderElection@849] - Notification time out: 400
>>> 2016-10-03 12:38:33,118 [myid:1] - INFO  [WorkerReceiver[myid=1]:FastLe
>>> aderElection@597] - Notification: 1 (message format version), 2
>>> (n.leader), 0x20002bfb5 (n.zxid), 0x6 (n.round), LOOKING (n.state), 1
>>> (n.sid), 0x2 (n.peerEpoch) LOOKING (my state)
>>> 2016-10-03 12:38:33,159 [myid:1] - INFO  [WorkerReceiver[myid=1]:FastLe
>>> aderElection@597] - Notification: 1 (message format version), 3
>>> (n.leader), 0x20002bfb5 (n.zxid), 0x4 (n.round), LEADING (n.state), 3
>>> (n.sid), 0x2 (n.peerEpoch) LOOKING (my state)
>>> 2016-10-03 12:38:33,559 [myid:1] - INFO  [QuorumPeer[myid=1]/127.0.0.1:
>>> 5002:FastLeaderElection@849] - Notification time out: 800
>>> 2016-10-03 12:38:33,560 [myid:1] - INFO  [WorkerReceiver[myid=1]:FastLe
>>> aderElection@597] - Notification: 1 (message format version), 2
>>> (n.leader), 0x20002bfb5 (n.zxid), 0x6 (n.round), LOOKING (n.state), 1
>>> (n.sid), 0x2 (n.peerEpoch) LOOKING (my state)
>>> 2016-10-03 12:38:33,561 [myid:1] - INFO  [WorkerReceiver[myid=1]:FastLe
>>> aderElection@597] - Notification: 1 (message format version), 2
>>> (n.leader), 0x20002bfb5 (n.zxid), 0x6 (n.round), LEADING (n.state), 2
>>> (n.sid), 0x2 (n.peerEpoch) LOOKING (my state)
>>> 2016-10-03 12:38:33,601 [myid:1] - INFO  [WorkerReceiver[myid=1]:FastLe
>>> aderElection@597] - Notification: 1 (message format version), 3
>>> (n.leader), 0x20002bfb5 (n.zxid), 0x4 (n.round), LEADING (n.state), 3
>>> (n.sid), 0x2 (n.peerEpoch) LOOKING (my state)
>>> 2016-10-03 12:38:34,402 [myid:1] - INFO  [QuorumPeer[myid=1]/127.0.0.1:
>>> 5002:FastLeaderElection@849] - Notification time out: 1600
>>> 
>>> 
>>> 
>>> On Oct 8, 2016, at 10:05 AM, Anand Parthasarathy <
>>> anpartha@avinetworks.com<ma...@avinetworks.com>> wrote:
>>> 
>>> Hi Flavio,
>>> 
>>> I have attached the logs from node 1 and node 3. Node 2 was powered off
>>> around 10-03 12:36. Leader election kept going until 10-03 15:57:16 when it
>>> finally converged.
>>> 
>>> Thanks,
>>> Anand.
>>> 
>>> On Sat, Oct 8, 2016 at 7:55 AM, Flavio Junqueira <fpj@apache.org<mailto:
>>> fpj@apache.org>> wrote:
>>> Hi Anand,
>>> 
>>> I don't understand whether 1 and 3 were able or even trying to connect to
>>> each other. They should be able to elect a leader between them and make
>>> progress. You might want to upload logs and let us know.
>>> 
>>> -Flavio
>>> 
>>>> On 08 Oct 2016, at 02:11, Anand Parthasarathy <anpartha@avinetworks.com
>>> <ma...@avinetworks.com>> wrote:
>>>> 
>>>> Hi,
>>>> 
>>>> We are currently using zookeeper 3.4.6 version and use a 3 node
>>> solution in
>>>> our system. We see that occasionally, when a node is powered off (in
>>> this
>>>> instance, it was actually a leader node), the remaining two nodes do not
>>>> form a quorum for a really long time. Looking at the logs, it appears
>>> the
>>>> sequence is as follows:
>>>> - Node 2 is the zookeeper leader
>>>> - Node 2 is powered off
>>>> - Node 1 and Node 3 recognize and start the election
>>>> - Node 3 times out after initLimit * tickTime with "Timeout while
>>> waiting
>>>> for quorum" for Round N
>>>> - Node 1 times out after initLimit * tickTime with "Exception while
>>> trying
>>>> to follow leader" for Round N+1 at the same time.
>>>> - And the process continues where N is sequentially incrementing.
>>>> - This happens for a long time.
>>>> - In one instance, we used tickTime=5000 and initLimit=20 and it took
>>>> around 3.5 hours to converge.
>>>> - In a given round, Node 1 will try connecting to Node 2, gets
>>> connection
>>>> refused waits for notification timeout which increases by 2 every
>>> iteration
>>>> until it hits the initLimit. Connection Refused is because the node 2
>>> comes
>>>> up after reboot, but zookeeper process is not started (due to a
>>> different
>>>> failure).
>>>> 
>>>> It looks similar to ZOOKEEPER-2164 but there it is a connection timeout
>>>> where Node 2 is not reachable.
>>>> 
>>>> Could you pls. share if you have seen this issue and if so, what is the
>>>> workaround that can be employed in 3.4.6.
>>>> 
>>>> Thanks,
>>>> Anand.
>>> 
>>> 
>>> <log.tgz>
>>> 
>>> 
>> 
>> 
>> --
>> Cheers
>> Michael.
>> 


Re: Zookeeper leader election takes a long time.

Posted by Anand Parthasarathy <an...@avinetworks.com>.
Hi Michael and Ryan,

Thanks for looking into the logs.

Actually, in this case, node 2 was brought down ungracefully and was down
permanently. In this customer deployment, our product runs as a docker
container in a bare-metal environment. It is possible that the docker proxy
that provides external connectivity to the zookeeper ports was down while
the container itself remained up and running for several more minutes. If
you grep for "2 (n.sid)" you will see that there are no packets after
12:46. Convergence after that did not happen until 15:57 or so.

Thanks,
Anand.


Re: Zookeeper leader election takes a long time.

Posted by Michael Han <ha...@cloudera.com>.
>> it started a new round but then I seem to see the election messages from
Node 2 again. Any idea why?

My guess is node 2 is back online and ZK service was started. In this use
case node 2 does not get offline permanently, IIUC.


On Thu, Oct 13, 2016 at 3:41 PM, Ryan Zhang <ya...@hotmail.com>
wrote:

> Hi, Anand, I took a look and I wonder how do you explain the following
>
> The N1.log starts at around
> 2016-10-03 12:37:38,073 [myid:1] - INFO  [QuorumPeer[myid=1]/127.0.0.1:
> 5002:QuorumPeer@714] - LOOKING
>
> and it failed to talk to Node 2 (id: 2)
> 2016-10-03 12:37:38,136 [myid:1] - WARN  [WorkerSender[myid=1]:
> QuorumCnxManager@382] - Cannot open channel to 2 at election address
> node2.controller.local/127.0.0.8:5004
> java.net.ConnectException: Connection refused
>         at java.net.PlainSocketImpl.socketConnect(Native Method)
>         at java.net.AbstractPlainSocketImpl.doConnect(
> AbstractPlainSocketImpl.java:339)
>         at java.net.AbstractPlainSocketImpl.connectToAddress(
> AbstractPlainSocketImpl.java:200)
>         at java.net.AbstractPlainSocketImpl.connect(
> AbstractPlainSocketImpl.java:182)
>         at java.net.SocksSocketImpl.connect(SocksSocketImpl.java:392)
>         at java.net.Socket.connect(Socket.java:579)
>
> However, after failed to get any election messages from Node 3, it started
> a new round but then I seem to see the election messages from Node 2 again.
> Any idea why?
>
> 2016-10-03 12:38:32,815 [myid:1] - INFO  [QuorumPeer[myid=1]/127.0.0.1:
> 5002:QuorumPeer@714] - LOOKING
> 2016-10-03 12:38:32,816 [myid:1] - INFO  [QuorumPeer[myid=1]/127.0.0.1:
> 5002:FileSnap@83] - Reading snapshot /var/lib/zookeeper/version-2/
> snapshot.20002aa0b
> 2016-10-03 12:38:32,875 [myid:1] - INFO  [QuorumPeer[myid=1]/127.0.0.1:
> 5002:FastLeaderElection@815] - New election. My id =  1, proposed
> zxid=0x20002bfb5
> 2016-10-03 12:38:32,876 [myid:1] - INFO  [WorkerReceiver[myid=1]:
> FastLeaderElection@597] - Notification: 1 (message format version), 1
> (n.leader), 0x20002bfb5 (n.zxid), 0x6 (n.round), LOOKING (n.state), 1
> (n.sid), 0x2 (n.peerEpoch) LOOKING (my state)
> 2016-10-03 12:38:32,878 [myid:1] - INFO  [WorkerReceiver[myid=1]:
> FastLeaderElection@597] - Notification: 1 (message format version), 2
> (n.leader), 0x20002bfb5 (n.zxid), 0x6 (n.round), LOOKING (n.state), 2
> (n.sid), 0x2 (n.peerEpoch) LOOKING (my state)
> 2016-10-03 12:38:32,878 [myid:1] - INFO  [WorkerReceiver[myid=1]:
> FastLeaderElection@597] - Notification: 1 (message format version), 2
> (n.leader), 0x20002bfb5 (n.zxid), 0x6 (n.round), LOOKING (n.state), 1
> (n.sid), 0x2 (n.peerEpoch) LOOKING (my state)
> 2016-10-03 12:38:32,917 [myid:1] - INFO  [WorkerReceiver[myid=1]:
> FastLeaderElection@597] - Notification: 1 (message format version), 3
> (n.leader), 0x20002bfb5 (n.zxid), 0x4 (n.round), LEADING (n.state), 3
> (n.sid), 0x2 (n.peerEpoch) LOOKING (my state)
> 2016-10-03 12:38:33,117 [myid:1] - INFO  [QuorumPeer[myid=1]/127.0.0.1:
> 5002:FastLeaderElection@849] - Notification time out: 400
> 2016-10-03 12:38:33,118 [myid:1] - INFO  [WorkerReceiver[myid=1]:
> FastLeaderElection@597] - Notification: 1 (message format version), 2
> (n.leader), 0x20002bfb5 (n.zxid), 0x6 (n.round), LOOKING (n.state), 1
> (n.sid), 0x2 (n.peerEpoch) LOOKING (my state)
> 2016-10-03 12:38:33,159 [myid:1] - INFO  [WorkerReceiver[myid=1]:
> FastLeaderElection@597] - Notification: 1 (message format version), 3
> (n.leader), 0x20002bfb5 (n.zxid), 0x4 (n.round), LEADING (n.state), 3
> (n.sid), 0x2 (n.peerEpoch) LOOKING (my state)
> 2016-10-03 12:38:33,559 [myid:1] - INFO  [QuorumPeer[myid=1]/127.0.0.1:
> 5002:FastLeaderElection@849] - Notification time out: 800
> 2016-10-03 12:38:33,560 [myid:1] - INFO  [WorkerReceiver[myid=1]:
> FastLeaderElection@597] - Notification: 1 (message format version), 2
> (n.leader), 0x20002bfb5 (n.zxid), 0x6 (n.round), LOOKING (n.state), 1
> (n.sid), 0x2 (n.peerEpoch) LOOKING (my state)
> 2016-10-03 12:38:33,561 [myid:1] - INFO  [WorkerReceiver[myid=1]:
> FastLeaderElection@597] - Notification: 1 (message format version), 2
> (n.leader), 0x20002bfb5 (n.zxid), 0x6 (n.round), LEADING (n.state), 2
> (n.sid), 0x2 (n.peerEpoch) LOOKING (my state)
> 2016-10-03 12:38:33,601 [myid:1] - INFO  [WorkerReceiver[myid=1]:
> FastLeaderElection@597] - Notification: 1 (message format version), 3
> (n.leader), 0x20002bfb5 (n.zxid), 0x4 (n.round), LEADING (n.state), 3
> (n.sid), 0x2 (n.peerEpoch) LOOKING (my state)
> 2016-10-03 12:38:34,402 [myid:1] - INFO  [QuorumPeer[myid=1]/127.0.0.1:
> 5002:FastLeaderElection@849] - Notification time out: 1600
>
>
>
> On Oct 8, 2016, at 10:05 AM, Anand Parthasarathy <anpartha@avinetworks.com
> <ma...@avinetworks.com>> wrote:
>
> Hi Flavio,
>
> I have attached the logs from node 1 and node 3. Node 2 was powered off
> around 10-03 12:36. Leader election kept going until 10-03 15:57:16 when it
> finally converged.
>
> Thanks,
> Anand.
>
> On Sat, Oct 8, 2016 at 7:55 AM, Flavio Junqueira <fpj@apache.org<mailto:
> fpj@apache.org>> wrote:
> Hi Anand,
>
> I don't understand whether 1 and 3 were able or even trying to connect to
> each other. They should be able to elect a leader between them and make
> progress. You might want to upload logs and let us know.
>
> -Flavio
>
> > On 08 Oct 2016, at 02:11, Anand Parthasarathy <anpartha@avinetworks.com<
> mailto:anpartha@avinetworks.com>> wrote:
> >
> > Hi,
> >
> > We are currently using zookeeper 3.4.6 version and use a 3 node solution
> in
> > our system. We see that occasionally, when a node is powered off (in this
> > instance, it was actually a leader node), the remaining two nodes do not
> > form a quorum for a really long time. Looking at the logs, it appears the
> > sequence is as follows:
> > - Node 2 is the zookeeper leader
> > - Node 2 is powered off
> > - Node 1 and Node 3 recognize and start the election
> > - Node 3 times out after initLimit * tickTime with "Timeout while waiting
> > for quorum" for Round N
> > - Node 1 times out after initLimit * tickTime with "Exception while
> trying
> > to follow leader" for Round N+1 at the same time.
> > - And the process continues where N is sequentially incrementing.
> > - This happens for a long time.
> > - In one instance, we used tickTime=5000 and initLimit=20 and it took
> > around 3.5 hours to converge.
> > - In a given round, Node 1 will try connecting to Node 2, gets connection
> > refused waits for notification timeout which increases by 2 every
> iteration
> > until it hits the initLimit. Connection Refused is because the node 2
> comes
> > up after reboot, but zookeeper process is not started (due to a different
> > failure).
> >
> > It looks similar to ZOOKEEPER-2164 but there it is a connection timeout
> > where Node 2 is not reachable.
> >
> > Could you pls. share if you have seen this issue and if so, what is the
> > workaround that can be employed in 3.4.6.
> >
> > Thanks,
> > Anand.
>
>
> <log.tgz>
>
>


-- 
Cheers
Michael.

Re: Zookeeper leader election takes a long time.

Posted by Ryan Zhang <ya...@hotmail.com>.
Btw, from n3.log it also looks like id:3 actually started to follow id:2:

2016-10-03 12:39:18,682 [myid:3] - INFO  [WorkerReceiver[myid=3]:FastLeaderElection@597] - Notification: 1 (message format version), 2 (n.leader), 0x20002bfb5 (n.zxid), 0x7 (n.round), LEADING (n.state), 2 (n.sid), 0x3 (n.peerEpoch) LOOKING (my state)
2016-10-03 12:39:18,722 [myid:3] - INFO  [WorkerReceiver[myid=3]:FastLeaderElection@597] - Notification: 1 (message format version), 2 (n.leader), 0x20002bfb5 (n.zxid), 0x7 (n.round), FOLLOWING (n.state), 1 (n.sid), 0x3 (n.peerEpoch) LOOKING (my state)
2016-10-03 12:39:18,722 [myid:3] - INFO  [QuorumPeer[myid=3]/127.0.0.1:5002:QuorumPeer@784] - FOLLOWING
2016-10-03 12:39:18,722 [myid:3] - INFO  [QuorumPeer[myid=3]/127.0.0.1:5002:ZooKeeperServer@162] - Created server with tickTime 5000 minSessionTimeout 10000 maxSessionTimeout 100000 datadir /var/lib/zookeeper/version-2 snapdir /var/lib/zookeeper/version-2
2016-10-03 12:39:18,722 [myid:3] - INFO  [QuorumPeer[myid=3]/127.0.0.1:5002:Follower@63] - FOLLOWING - LEADER ELECTION TOOK - 182
2016-10-03 12:39:18,756 [myid:3] - INFO  [QuorumPeer[myid=3]/127.0.0.1:5002:Learner@323] - Getting a diff from the leader 0x300000130
2016-10-03 12:39:18,760 [myid:3] - WARN  [QuorumPeer[myid=3]/127.0.0.1:5002:Learner@374] - Got zxid 0x300000001 expected 0x1
2016-10-03 12:39:18,763 [myid:3] - INFO  [QuorumPeer[myid=3]/127.0.0.1:5002:FileTxnSnapLog@240] - Snapshotting: 0x300000130 to /var/lib/zookeeper/version-2/snapshot.300000130
2016-10-03 12:39:18,801 [myid:3] - WARN  [QuorumPeer[myid=3]/127.0.0.1:5002:Follower@118] - Got zxid 0x300000133 expected 0x1




Re: Zookeeper leader election takes a long time.

Posted by Ryan Zhang <ya...@hotmail.com>.
Hi Anand, I took a look, and I wonder how you would explain the following:

The N1.log starts at around
2016-10-03 12:37:38,073 [myid:1] - INFO  [QuorumPeer[myid=1]/127.0.0.1:5002:QuorumPeer@714] - LOOKING

and it failed to talk to Node 2 (id: 2)
2016-10-03 12:37:38,136 [myid:1] - WARN  [WorkerSender[myid=1]:QuorumCnxManager@382] - Cannot open channel to 2 at election address node2.controller.local/127.0.0.8:5004
java.net.ConnectException: Connection refused
        at java.net.PlainSocketImpl.socketConnect(Native Method)
        at java.net.AbstractPlainSocketImpl.doConnect(AbstractPlainSocketImpl.java:339)
        at java.net.AbstractPlainSocketImpl.connectToAddress(AbstractPlainSocketImpl.java:200)
        at java.net.AbstractPlainSocketImpl.connect(AbstractPlainSocketImpl.java:182)
        at java.net.SocksSocketImpl.connect(SocksSocketImpl.java:392)
        at java.net.Socket.connect(Socket.java:579)
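
(Side note: "Connection refused" is the fast failure mode - the host is up and answers with a TCP reset because nothing listens on the election port, so the attempt returns immediately. An unreachable host, the ZOOKEEPER-2164 case, would instead block for the full connect timeout. A minimal, self-contained sketch of the distinction; illustrative only, not the actual QuorumCnxManager code, with the address and 5 s timeout simply mirroring the log above:)

import java.io.IOException;
import java.net.InetSocketAddress;
import java.net.Socket;

// Illustrative sketch only, not the actual 3.4.6 QuorumCnxManager code.
public class ElectionConnectSketch {
    public static void main(String[] args) {
        // Mirrors the election address from the log above (hypothetical here).
        InetSocketAddress electionAddr = new InetSocketAddress("127.0.0.8", 5004);
        try (Socket sock = new Socket()) {
            // Host up but no listener (node 2 rebooted, zookeeper not started):
            // fails immediately with "Connection refused".
            // Host unreachable (the ZOOKEEPER-2164 case): blocks ~5 s first.
            sock.connect(electionAddr, 5000);
        } catch (IOException e) {
            System.out.println("Cannot open channel: " + e.getMessage());
        }
    }
}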

However, after failing to get any election messages from Node 3, it started a new round, but then I seem to see election messages from Node 2 again. Any idea why?

2016-10-03 12:38:32,815 [myid:1] - INFO  [QuorumPeer[myid=1]/127.0.0.1:5002:QuorumPeer@714] - LOOKING
2016-10-03 12:38:32,816 [myid:1] - INFO  [QuorumPeer[myid=1]/127.0.0.1:5002:FileSnap@83] - Reading snapshot /var/lib/zookeeper/version-2/snapshot.20002aa0b
2016-10-03 12:38:32,875 [myid:1] - INFO  [QuorumPeer[myid=1]/127.0.0.1:5002:FastLeaderElection@815] - New election. My id =  1, proposed zxid=0x20002bfb5
2016-10-03 12:38:32,876 [myid:1] - INFO  [WorkerReceiver[myid=1]:FastLeaderElection@597] - Notification: 1 (message format version), 1 (n.leader), 0x20002bfb5 (n.zxid), 0x6 (n.round), LOOKING (n.state), 1 (n.sid), 0x2 (n.peerEpoch) LOOKING (my state)
2016-10-03 12:38:32,878 [myid:1] - INFO  [WorkerReceiver[myid=1]:FastLeaderElection@597] - Notification: 1 (message format version), 2 (n.leader), 0x20002bfb5 (n.zxid), 0x6 (n.round), LOOKING (n.state), 2 (n.sid), 0x2 (n.peerEpoch) LOOKING (my state)
2016-10-03 12:38:32,878 [myid:1] - INFO  [WorkerReceiver[myid=1]:FastLeaderElection@597] - Notification: 1 (message format version), 2 (n.leader), 0x20002bfb5 (n.zxid), 0x6 (n.round), LOOKING (n.state), 1 (n.sid), 0x2 (n.peerEpoch) LOOKING (my state)
2016-10-03 12:38:32,917 [myid:1] - INFO  [WorkerReceiver[myid=1]:FastLeaderElection@597] - Notification: 1 (message format version), 3 (n.leader), 0x20002bfb5 (n.zxid), 0x4 (n.round), LEADING (n.state), 3 (n.sid), 0x2 (n.peerEpoch) LOOKING (my state)
2016-10-03 12:38:33,117 [myid:1] - INFO  [QuorumPeer[myid=1]/127.0.0.1:5002:FastLeaderElection@849] - Notification time out: 400
2016-10-03 12:38:33,118 [myid:1] - INFO  [WorkerReceiver[myid=1]:FastLeaderElection@597] - Notification: 1 (message format version), 2 (n.leader), 0x20002bfb5 (n.zxid), 0x6 (n.round), LOOKING (n.state), 1 (n.sid), 0x2 (n.peerEpoch) LOOKING (my state)
2016-10-03 12:38:33,159 [myid:1] - INFO  [WorkerReceiver[myid=1]:FastLeaderElection@597] - Notification: 1 (message format version), 3 (n.leader), 0x20002bfb5 (n.zxid), 0x4 (n.round), LEADING (n.state), 3 (n.sid), 0x2 (n.peerEpoch) LOOKING (my state)
2016-10-03 12:38:33,559 [myid:1] - INFO  [QuorumPeer[myid=1]/127.0.0.1:5002:FastLeaderElection@849] - Notification time out: 800
2016-10-03 12:38:33,560 [myid:1] - INFO  [WorkerReceiver[myid=1]:FastLeaderElection@597] - Notification: 1 (message format version), 2 (n.leader), 0x20002bfb5 (n.zxid), 0x6 (n.round), LOOKING (n.state), 1 (n.sid), 0x2 (n.peerEpoch) LOOKING (my state)
2016-10-03 12:38:33,561 [myid:1] - INFO  [WorkerReceiver[myid=1]:FastLeaderElection@597] - Notification: 1 (message format version), 2 (n.leader), 0x20002bfb5 (n.zxid), 0x6 (n.round), LEADING (n.state), 2 (n.sid), 0x2 (n.peerEpoch) LOOKING (my state)
2016-10-03 12:38:33,601 [myid:1] - INFO  [WorkerReceiver[myid=1]:FastLeaderElection@597] - Notification: 1 (message format version), 3 (n.leader), 0x20002bfb5 (n.zxid), 0x4 (n.round), LEADING (n.state), 3 (n.sid), 0x2 (n.peerEpoch) LOOKING (my state)
2016-10-03 12:38:34,402 [myid:1] - INFO  [QuorumPeer[myid=1]/127.0.0.1:5002:FastLeaderElection@849] - Notification time out: 1600
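
The growing "Notification time out" values come from the election loop doubling its queue-poll timeout each time nothing new arrives, capped (at 60 s in 3.4.x, as far as I recall). A self-contained sketch of that back-off - a paraphrase for illustration, not the real FastLeaderElection source:

import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.TimeUnit;

// Paraphrase of the election loop's poll/back-off, for illustration only.
public class NotificationBackoffSketch {
    public static void main(String[] args) throws InterruptedException {
        LinkedBlockingQueue<Object> recvqueue = new LinkedBlockingQueue<>();
        int notTimeout = 200;                     // initial poll timeout (finalizeWait)
        final int maxNotificationInterval = 60_000;
        for (int i = 0; i < 4; i++) {
            Object n = recvqueue.poll(notTimeout, TimeUnit.MILLISECONDS);
            if (n == null) {                      // nothing heard: double and log
                notTimeout = Math.min(notTimeout * 2, maxNotificationInterval);
                System.out.println("Notification time out: " + notTimeout);
            }
        }
    }
}

Starting from 200 ms this prints 400, 800, 1600, ... - the same progression as in the log above.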





Re: Zookeeper leader election takes a long time.

Posted by Anand Parthasarathy <an...@avinetworks.com>.
Hi Flavio,

I have attached the logs from node 1 and node 3. Node 2 was powered off
around 10-03 12:36. Leader election kept going until 10-03 15:57:16 when it
finally converged.

Thanks,
Anand.


Re: Zookeeper leader election takes a long time.

Posted by Anand Parthasarathy <an...@avinetworks.com>.
Yes, we had noted in the past that if node 2 comes back up, the election
converges. In this instance, zookeeper could not be started on node 2 (due
to a separate failure), and it took about 3 hours to converge.
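
To put rough numbers on it - this is just arithmetic on the tickTime=5000 / initLimit=20 figures from my first mail, with a made-up class name:

// Just arithmetic on the settings from the first mail; nothing zookeeper-specific.
public class RoundCostSketch {
    public static void main(String[] args) {
        long tickTime = 5_000;                    // ms per tick
        long initLimit = 20;                      // ticks a follower may take to sync
        long roundMs = initLimit * tickTime;      // 100,000 ms = 100 s per aborted round
        long observedMs = 3_600_000L * 7 / 2;     // ~3.5 hours until we converged
        System.out.println("one aborted round  ~ " + roundMs / 1000 + " s");
        System.out.println("rounds in 3.5 h    ~ " + observedMs / roundMs);  // ~126
    }
}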

Thanks,
Anand.


Re: Zookeeper leader election takes a long time.

Posted by Ben Sherman <be...@gmail.com>.
I had this exact problem but couldn't wait for the nodes to reestablish a
quorum; when node 2 came back up, everything recovered. This was with
3.4.8 - I posted logs in a previous message.


Re: Zookeeper leader election takes a long time.

Posted by Flavio Junqueira <fp...@apache.org>.
Hi Anand,

I can't tell whether 1 and 3 were able to connect to each other, or were even trying to. They should be able to elect a leader between themselves and make progress. You might want to upload logs and let us know.

-Flavio
 