Posted to users@kafka.apache.org by Omid Aladini <om...@gmail.com> on 2015/02/03 13:23:44 UTC

When ZooKeeper quorum is down

Hi,

Reading the official FAQ, I bumped into this paragraph:

> Once the Zookeeper quorum is down, brokers could result in a bad state and
> could not normally serve client requests, etc. Although when Zookeeper
> quorum recovers, the Kafka brokers should be able to resume to normal state
> automatically, there are still a few corner cases the they cannot and a
> hard kill-and-recovery is required to bring it back to normal. Hence it is
> recommended to closely monitor your zookeeper cluster and provision it so
> that it is performant.
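
As an illustration of the "closely monitor" advice in the quoted paragraph, here is a minimal Python sketch that polls a ZooKeeper server with the `mntr` four-letter command and checks whether it reports a serving role. It is only a sketch under stated assumptions: the helper names (`parse_mntr`, `four_letter_word`, `is_serving`) are made up for this example, the set of serving states is an assumption, and on ZooKeeper 3.5 and later the command may need to be enabled through `4lw.commands.whitelist`.

```python
import socket


def parse_mntr(text):
    """Parse tab-separated `mntr` output into a dict of strings."""
    metrics = {}
    for line in text.splitlines():
        key, sep, value = line.partition("\t")
        if sep:  # skip lines without a tab, e.g. a "not serving" notice
            metrics[key.strip()] = value.strip()
    return metrics


def four_letter_word(host, port, command=b"mntr", timeout=2.0):
    """Send a four-letter command to one server and return its reply as text."""
    with socket.create_connection((host, port), timeout=timeout) as sock:
        sock.sendall(command)
        chunks = []
        while True:
            data = sock.recv(4096)
            if not data:  # the server closes the connection after replying
                break
            chunks.append(data)
    return b"".join(chunks).decode("utf-8", errors="replace")


def is_serving(metrics):
    """True if the server reports a serving role (assumed set of states)."""
    serving_states = ("leader", "follower", "observer", "standalone")
    return metrics.get("zk_server_state") in serving_states
```

A monitoring job could call `is_serving(parse_mntr(four_letter_word(host, 2181)))` for every ensemble member (the host and port here are placeholders) and alert when fewer than a majority report a serving state.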


What are the corner cases exactly? Any JIRA tickets to explore? How do the
corner cases relate to the ZooKeeper cluster being "performant" and
"closely monitored"? I'm specifically interested in the inevitable scenario
in which the ZK leader exits or dies and the quorum goes down momentarily
(due to hardware failure, a rolling restart, etc.).

Thanks,
Omid

Re: When ZooKeeper quorum is down

Posted by Omid Aladini <om...@gmail.com>.
Sure: https://issues.apache.org/jira/browse/KAFKA-1918

Thanks!
Omid

On Tue, Feb 3, 2015 at 5:32 PM, Guozhang Wang <wa...@gmail.com> wrote:

> Hi Omid,
>
> That is an interesting question. This paragraph was written some time ago,
> and we have not tested ZK failure / resume since, so it is hard to tell
> whether these cases still exist.
>
> One thing we can do is to add different ZK quorum failure scenarios to the
> system test to have it covered over time. Could you file a JIRA?
>
> Guozhang

Re: When ZooKeeper quorum is down

Posted by Guozhang Wang <wa...@gmail.com>.
Hi Omid,

That is an interesting question. This paragraph was written some time ago,
and we have not tested ZK failure / resume since, so it is hard to tell
whether these cases still exist.

One thing we can do is to add different ZK quorum failure scenarios to the
system test to have it covered over time. Could you file a JIRA?
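
As a starting point for listing such quorum failure scenarios, the following Python sketch enumerates, for a given ensemble size, how many servers can be down before the quorum is lost. It relies only on the rule that ZooKeeper needs a strict majority of the ensemble; the function names are hypothetical, and it does not model leader election time or anything about Kafka's own system tests.

```python
def has_quorum(ensemble_size, alive):
    """A quorum exists when a strict majority of the ensemble is alive."""
    return alive > ensemble_size // 2


def failure_scenarios(ensemble_size):
    """List (servers_down, quorum_still_up) for every possible failure count."""
    return [
        (down, has_quorum(ensemble_size, ensemble_size - down))
        for down in range(ensemble_size + 1)
    ]


if __name__ == "__main__":
    for down, up in failure_scenarios(3):
        print(f"{down} down -> quorum {'up' if up else 'DOWN'}")
```

For a three-node ensemble this shows that one failure is tolerated and two are not, so a test matrix would need at least the "one follower down", "leader down", and "majority down" cases.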

Guozhang




-- 
-- Guozhang