Posted to dev@zookeeper.apache.org by Marshall McMullen <ma...@gmail.com> on 2013/05/13 19:48:05 UTC

Quorum failure with concurrent client connections?

I'm debugging a problem we're seeing where, after quorum loss, quorum does
not recover as I expect it should. I seem to have isolated the problem to
quorum not being re-established when clients are trying to connect to the
ensemble at the same time that the nodes are coming up and trying to form
quorum. Is there any known issue with this? I've searched for open Jiras
without any luck.

Re: Quorum failure with concurrent client connections?

Posted by Thawan Kooburat <th...@fb.com>.
What I have seen so far is mostly related to init/sync limit together with
snapshot size. (ZOOKEEPER-1697, ZOOKEEPER-1521)
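
For reference, initLimit and syncLimit are counted in ticks, so the actual
time budget is the limit multiplied by tickTime. A minimal sketch of that
arithmetic follows; the function name limit_budgets_ms and the sample values
are made up for illustration and are not taken from your configuration.

```python
def limit_budgets_ms(cfg_text):
    """Return (init_budget_ms, sync_budget_ms) from zoo.cfg-style text."""
    settings = {}
    for line in cfg_text.splitlines():
        line = line.strip()
        # Skip blanks, comments, and anything that is not key=value.
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, value = line.split("=", 1)
        settings[key.strip()] = value.strip()
    tick_ms = int(settings["tickTime"])
    # Both limits are expressed in ticks; multiply to get milliseconds.
    return tick_ms * int(settings["initLimit"]), tick_ms * int(settings["syncLimit"])


# Hypothetical values, for illustration only.
SAMPLE_CFG = """
tickTime=2000
initLimit=10
syncLimit=5
"""

init_ms, sync_ms = limit_budgets_ms(SAMPLE_CFG)
print(init_ms, sync_ms)
```

With these sample values a follower would have 20 seconds to connect to the
leader and finish its initial sync, which is the budget a large snapshot can
exceed.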

It might be possible that clients trying to reconnect cause a load spike
on the server and push it over the limit, but you would need to have lots
of clients for that to happen.

I think it will be easier to narrow down the problem by checking in which
phase (e.g. leader election or synchronization) quorum formation fails.
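
As a rough first pass, the "srvr" four-letter command can be sent to each
node to see which ones have made it into the quorum. The sketch below assumes
the command is enabled (newer releases may need it whitelisted), and the host
names are hypothetical. A node that is still electing or syncing only reports
that it is not serving requests, so the server logs are still needed to tell
those two phases apart.

```python
import socket

NOT_SERVING = "not currently serving requests"


def four_letter_word(host, port, cmd="srvr", timeout=5.0):
    """Send a four-letter command and return the server's full reply."""
    with socket.create_connection((host, port), timeout=timeout) as sock:
        sock.sendall(cmd.encode("ascii"))
        chunks = []
        while True:
            data = sock.recv(4096)
            if not data:  # the server closes the connection after replying
                break
            chunks.append(data)
    return b"".join(chunks).decode("utf-8", errors="replace")


def classify(reply):
    """Map a 'srvr' reply to the reported mode, 'not serving', or 'unknown'."""
    if NOT_SERVING in reply:
        return "not serving"
    for line in reply.splitlines():
        if line.startswith("Mode:"):
            return line.split(":", 1)[1].strip()
    return "unknown"


if __name__ == "__main__":
    # Hypothetical ensemble members; replace with the real hosts and ports.
    for host, port in [("zk1.example.com", 2181), ("zk2.example.com", 2181)]:
        try:
            print(host, classify(four_letter_word(host, port)))
        except OSError as exc:
            print(host, "unreachable:", exc)
```

Polling this while the nodes restart, with and without the clients
reconnecting, should show whether any node ever reports leader or follower.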


-- 
Thawan Kooburat




