Posted to user@zookeeper.apache.org by Ben Sherman <be...@gmail.com> on 2017/06/02 21:46:59 UTC

Tickling the election ports

Hi all,

Regarding my recent outages, I have a suspicion that there is some stateful
connection tracking happening between my servers that is invisible to me.
(In this case, it's across availability zones in AWS VPCs).

This has come up in both a JIRA ticket at
https://issues.apache.org/jira/browse/ZOOKEEPER-1748 and a PR in the git
repo at https://github.com/apache/zookeeper/pull/83

I believe that when an ensemble is started, connections are set up
between the servers on port 3888 (among others). While the servers are
healthy, there is no traffic across those connections beyond the initial
election. At some point, with no traffic, the black-box NAT device removes
the connection from its state table but does not send a FIN or RST down the
pipe, so the service thinks the connection still exists. During a failure,
ZK will attempt to send traffic down said pipe during a new election, but it
won't work, and ZK will have to wait for the system timeouts to kill the
connection.
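
To make the idea concrete, here is a minimal Java sketch of turning on TCP
keep-alive for an otherwise idle connection. It is not the ZooKeeper patch:
the class name, the helper method, and the loopback listener standing in for
a peer's election port are all hypothetical.

```java
import java.io.IOException;
import java.net.InetAddress;
import java.net.InetSocketAddress;
import java.net.ServerSocket;
import java.net.Socket;

public class KeepAliveSketch {

    // Hypothetical helper (not ZooKeeper code): connect to a peer and ask the
    // OS to send TCP keep-alive probes while the connection sits idle.
    public static Socket connectWithKeepAlive(InetSocketAddress peer, int connectTimeoutMs)
            throws IOException {
        Socket sock = new Socket();
        sock.setKeepAlive(true); // sets SO_KEEPALIVE on the socket
        sock.connect(peer, connectTimeoutMs);
        return sock;
    }

    public static void main(String[] args) throws IOException {
        // Stand-in for a peer's election port; port 0 picks any free local port.
        try (ServerSocket listener =
                     new ServerSocket(0, 50, InetAddress.getLoopbackAddress());
             Socket sock = connectWithKeepAlive(
                     new InetSocketAddress(listener.getInetAddress(),
                                           listener.getLocalPort()),
                     5000)) {
            System.out.println("SO_KEEPALIVE enabled: " + sock.getKeepAlive());
        }
    }
}
```

Note that Java only switches the option on; the probe timing comes from the
operating system. On Linux the first probe is sent after
net.ipv4.tcp_keepalive_time, which defaults to 7200 seconds, so that value
would likely need to be lowered below the NAT device's idle timeout for this
to help.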

Am I correct in the following assumptions:

1. When an ensemble is healthy, no traffic goes across the election ports.
2. There is no way to trigger traffic across those ports (four letter
command or otherwise) without causing a failure in the ensemble.
3. I can cause traffic on those ports across the entire ensemble should I
restart any node in the ensemble.

Finally, is there any way to shine any light on the above issues that
highlight this? I have considered forking 3.4.10 to do this, but the
overhead required is more than I can afford right now going down the line.

Re: Tickling the election ports

Posted by Ben Sherman <be...@gmail.com>.

Aaaaand, just like that, I have a working 3.4 patch!  Sent you (and others)
a PR, please take a look when you can!

On Mon, Jun 5, 2017 at 9:48 AM, Ben Sherman <be...@gmail.com> wrote:

>> > Finally, is there any way to shine any light on the above issues that
>> > highlight this? I have considered forking 3.4.10 to do this, but the
>> > overhead required is more than I can afford right now going down the
>> > line.
>
>> I'm not sure I understand the question, why do you want to fork?
>
> We want to get TCP keep-alives turned on as in:
>
> https://github.com/apache/zookeeper/pull/83
> and
> https://issues.apache.org/jira/browse/ZOOKEEPER-1748
>
> I am working on getting the nits fixed in the patch attached to that, but
> I haven't sent a PR to ZK before, and I'd like to get this done quickly -
> any assistance in getting this patched would be super helpful. That patch
> got stalled long ago, and I'm worried the same might happen to mine.

Re: Tickling the election ports

Posted by Ben Sherman <be...@gmail.com>.

> > Finally, is there any way to shine any light on the above issues that
> > highlight this? I have considered forking 3.4.10 to do this, but the
> > overhead required is more than I can afford right now going down the
> > line.


> I'm not sure I understand the question, why do you want to fork?


We want to get TCP keep-alives turned on as in:

https://github.com/apache/zookeeper/pull/83
and
https://issues.apache.org/jira/browse/ZOOKEEPER-1748

I am working on getting the nits fixed in the patch attached to that, but I
haven't sent a PR to ZK before, and I'd like to get this done quickly - any
assistance in getting this patched would be super helpful. That patch got
stalled long ago, and I'm worried the same might happen to mine.

Re: Tickling the election ports

Posted by Flavio Junqueira <fp...@apache.org>.

Hi Ben,

To your points:

> On 02 Jun 2017, at 23:46, Ben Sherman <be...@gmail.com> wrote:
> 
> Hi all,
> 
> Regarding my recent outages, I have a suspicion that there is some stateful
> connection tracking happening between my servers that is invisible to me.
> (In this case, it's across availability zones in AWS VPCs).
> 
> This has come up in both a JIRA ticket at
> https://issues.apache.org/jira/browse/ZOOKEEPER-1748 and a PR in the git
> repo at https://github.com/apache/zookeeper/pull/83
> 
> I believe that when an ensemble is started that there are connections setup
> between each server on port 3888 (among others). As the server is normally
> healthy, there is no traffic across that connection beyond the initial
> election. At some point with no traffic, the black box NAT device removes
> it from the state table but does not send a FIN or RST down the pipe, but
> the service thinks the connection still exists. During a failure, ZK will
> attempt to send traffic down said pipe during a new election, but it won't
> work, and will have to wait for the system timeouts to kill the connection.
> 
> Am I correct in the following assumptions:
> 
> 1. When an ensemble is healthy, no traffic goes across the election ports.

Yes, no election notifications are sent.

> 2. There is no way to trigger traffic across those ports (four letter
> command or otherwise) without causing a failure in the ensemble.

I'm afraid not. In fact, a single failure doesn't necessarily induce traffic on all connections unless you hit the leader.

> 3. I can cause traffic on those ports across the entire ensemble should I
> restart any node in the ensemble.

Not really, the only way to induce traffic on all connections is to hit the leader. If you crash a follower and the leader
still has a quorum of followers, then you won't have any notifications sent. If you bring that server back up, there will be
some notifications, but they won't be all-to-all, only from that server to the rest of the ensemble.

> 
> Finally, is there any way to shine any light on the above issues that
> highlight this? I have considered forking 3.4.10 to do this, but the
> overhead required is more than I can afford right now going down the line.

I'm not sure I understand the question, why do you want to fork?

-Flavio