
Posted to user@zookeeper.apache.org by "Bae, Jae Hyeon" <me...@gmail.com> on 2014/04/09 20:03:13 UTC

Restarting leader zookeeper instance made quorum lost

Hi zookeeper users

While doing a rolling restart of a ZooKeeper cluster of 5 instances with a
quorum of 3, restarting the leader instance caused the quorum to be lost. Is
this expected? If not, how can I restart the leader instance without
interrupting the whole cluster? Or is this fixed in 3.4.6?

Thank you
Best, Jae

Re: Restarting leader zookeeper instance made quorum lost

Posted by Ted Dunning <te...@gmail.com>.

On Wed, Apr 9, 2014 at 12:56 PM, Bae, Jae Hyeon <me...@gmail.com> wrote:

> Let me clarify: a) is correct. There were 5 instances running normally, with
> a quorum of 3. I restarted the leader instance, and while a new leader was
> being elected, the ZooKeeper cluster lost quorum for a minute and a few
> ZooKeeper clients lost their connections. So this is a form of losing
> quorum, correct?
>

Yes.


> Is there any way to avoid losing quorum during a rolling restart of the
> ZooKeeper cluster, specifically when restarting the leader instance?
>

No.

You must always have at least 3 ZK nodes live in order to maintain
continuous operation.

A rolling restart implies that you wait long enough after restarting each
node for it to rejoin the quorum.  If you do that, restarting the leader will
cause a brief window during which writes are not accepted, and some ZK
clients may need to transparently reconnect to a different ZK node, but any
outage should be hard to detect.
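
A rolling restart that waits for each node to rejoin, as described above,
could be scripted roughly as follows. This is a minimal Python sketch under
several assumptions, not an official tool: each server is assumed to answer
the srvr four-letter-word command on client port 2181 (on ZooKeeper 3.5 and
later, four-letter words must be allowed via 4lw.commands.whitelist, which
permits srvr by default); all servers are voting members (no observers); and
restart_node is a hypothetical callback standing in for however you actually
restart a server. Restarting the followers first and the current leader last
is a common choice that limits the whole procedure to a single leader
election; the thread itself does not prescribe an order.

```python
import socket
import time
from typing import Callable, Dict, List, Optional


def parse_mode(srvr_output: str) -> Optional[str]:
    """Return the value of the 'Mode:' line (e.g. 'leader', 'follower'), if any."""
    for line in srvr_output.splitlines():
        if line.startswith("Mode:"):
            return line.split(":", 1)[1].strip()
    # A server that is not serving requests (e.g. during leader election)
    # replies without a Mode line.
    return None


def query_mode(host: str, port: int = 2181, timeout: float = 2.0) -> Optional[str]:
    """Send the 'srvr' four-letter word and return the server's mode, or None."""
    try:
        with socket.create_connection((host, port), timeout=timeout) as sock:
            sock.sendall(b"srvr")
            chunks = []
            while True:
                data = sock.recv(4096)
                if not data:  # the server closes the connection after replying
                    break
                chunks.append(data)
        return parse_mode(b"".join(chunks).decode("utf-8", "replace"))
    except OSError:  # refused, timed out, host down, ...
        return None


def restart_order(modes: Dict[str, Optional[str]]) -> List[str]:
    """Keep the given order for non-leaders and move the current leader to the end."""
    return sorted(modes, key=lambda host: modes[host] == "leader")


def wait_until_serving(host: str,
                       query: Callable[[str], Optional[str]] = query_mode,
                       timeout: float = 300.0, poll: float = 2.0) -> bool:
    """Poll until the node reports follower or leader again, or give up."""
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        if query(host) in ("follower", "leader"):
            return True
        time.sleep(poll)
    return False


def rolling_restart(hosts: List[str],
                    restart_node: Callable[[str], None],
                    query: Callable[[str], Optional[str]] = query_mode,
                    timeout: float = 300.0, poll: float = 2.0) -> None:
    """Restart one node at a time, waiting for each to rejoin before moving on."""
    modes = {host: query(host) for host in hosts}
    for host in restart_order(modes):
        # Hypothetical hook: assumed to block until the old process has stopped
        # and the new one has been started (init script, systemd, ...).
        restart_node(host)
        if not wait_until_serving(host, query, timeout, poll):
            raise RuntimeError(f"{host} did not rejoin the quorum; stopping here")
```

With hypothetical host names, rolling_restart(["zk1", "zk2", "zk3", "zk4",
"zk5"], restart_node=my_restart) would restart the four followers one at a
time and the leader last. The leader step still triggers an election, so the
short pause in write availability does not go away; the script only makes
sure each node is back before the next one is taken down.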

Re: Restarting leader zookeeper instance made quorum lost

Posted by "Bae, Jae Hyeon" <me...@gmail.com>.

Hi Ted

Let me clarify: a) is correct. There were 5 instances running normally, with
a quorum of 3. I restarted the leader instance, and while a new leader was
being elected, the ZooKeeper cluster lost quorum for a minute and a few
ZooKeeper clients lost their connections. So this is a form of losing
quorum, correct?

Is there any way to avoid losing quorum during a rolling restart of the
ZooKeeper cluster, specifically when restarting the leader instance?

Thank you
Best, Jae


On Wed, Apr 9, 2014 at 12:06 PM, Ted Dunning <te...@gmail.com> wrote:

> Your email is a little ambiguous.
>
> a) "5 instances with 3 as quorum" could mean 5 instances configured and
> running normally.
>
> Or
>
> b) it could mean 5 instances with 2 instances that are down.
>
> In (a), restarting the leader instance *should* cause the cluster to hold a
> leader election again and form a new quorum.  That is a form of losing
> quorum.  If that is what you mean, this is normal.  A new quorum should be
> formed and things should continue fairly soon.
>
> In (b), restarting the leader will leave only 2 instances running, which is
> not enough to maintain a quorum; until you have at least 3 nodes running
> again, you can't proceed.

Re: Restarting leader zookeeper instance made quorum lost

Posted by Ted Dunning <te...@gmail.com>.

Your email is a little ambiguous.

a) "5 instances with 3 as quorum" could mean 5 instances configured and
running normally.

Or

b) it could mean 5 instances with 2 instances that are down.

In (a), restarting the leader instance *should* cause the cluster to hold a
leader election again and form a new quorum.  That is a form of losing
quorum.  If that is what you mean, this is normal.  A new quorum should be
formed and things should continue fairly soon.

In (b), restarting the leader will leave only 2 instances running, which is
not enough to maintain a quorum; until you have at least 3 nodes running
again, you can't proceed.
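
The arithmetic behind cases (a) and (b) is plain majority voting. The minimal
Python sketch below assumes ZooKeeper's default majority quorum (no weighted
or hierarchical quorum configuration); the function names are hypothetical
helpers, not part of any ZooKeeper API.

```python
def quorum_size(ensemble_size: int) -> int:
    """Smallest strict majority of an ensemble with this many voting members."""
    return ensemble_size // 2 + 1


def tolerated_failures(ensemble_size: int) -> int:
    """How many voting members can be down while a majority is still live."""
    return ensemble_size - quorum_size(ensemble_size)


def safe_to_restart_one(live: int, ensemble_size: int) -> bool:
    """True if taking one more live node down still leaves a quorum running."""
    return live - 1 >= quorum_size(ensemble_size)


if __name__ == "__main__":
    n = 5
    print(quorum_size(n))             # 3
    print(tolerated_failures(n))      # 2
    # Case (a): all 5 running; restarting the leader leaves 4 >= 3.
    print(safe_to_restart_one(5, n))  # True
    # Case (b): only 3 running; restarting the leader leaves 2 < 3.
    print(safe_to_restart_one(3, n))  # False
```

In case (a) the four remaining nodes still form a majority, but they must
elect a new leader before writes are accepted again, so a short pause is
still expected.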

Re: Restarting leader zookeeper instance made quorum lost

Posted by Michi Mutsuzaki <mi...@cs.stanford.edu>.

Hi Jae,

No, it's not expected. Do you have the server log files?

On Wed, Apr 9, 2014 at 11:03 AM, Bae, Jae Hyeon <me...@gmail.com> wrote:
> Hi zookeeper users
>
> While doing a rolling restart of a ZooKeeper cluster of 5 instances with a
> quorum of 3, restarting the leader instance caused the quorum to be lost. Is
> this expected? If not, how can I restart the leader instance without
> interrupting the whole cluster? Or is this fixed in 3.4.6?
>
> Thank you
> Best, Jae