Posted to users@kafka.apache.org by Zach Cox <zc...@gmail.com> on 2020/04/01 19:50:58 UTC

Re: Broker always out of ISRs

Hi Liam,


> Any issues with partitions broker 2 is leader of?
>

Earlier today, broker 2 was not leader of any partitions. At that time, 2
appeared to be in ISRs of all partitions where 1 was leader, but 2 was not
in any ISRs of partitions where 0 was leader.

Currently, broker 2 is leader of 55 partitions, but does not appear to be
in ISRs of any other partitions, whether 0 or 1 is leader.
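A rough way to get the per-leader picture described above is to parse the
output of `kafka-topics.sh --describe`. The sketch below is only an
illustration: it assumes the usual "Topic: ... Partition: ... Leader: ...
Replicas: ... Isr: ..." line layout, which may differ between Kafka versions,
and the function name `missing_from_isr` and the sample text are made up for
this example.

```python
import re
from collections import defaultdict

# Assumed per-partition line layout from `kafka-topics.sh --describe`, e.g.
#   Topic: events  Partition: 0  Leader: 1  Replicas: 1,2,0  Isr: 1,0
LINE = re.compile(
    r"Topic:\s*(?P<topic>\S+)\s+Partition:\s*(?P<partition>\d+)\s+"
    r"Leader:\s*(?P<leader>-?\d+)\s+Replicas:\s*(?P<replicas>[\d,]+)\s+"
    r"Isr:\s*(?P<isr>[\d,]*)"
)


def missing_from_isr(describe_output, broker_id):
    """Group partitions where broker_id is a replica but not in the ISR, keyed by leader."""
    by_leader = defaultdict(list)
    for line in describe_output.splitlines():
        m = LINE.search(line)
        if not m:
            continue  # summary lines and anything unrecognised are skipped
        replicas = {int(x) for x in m["replicas"].split(",") if x}
        isr = {int(x) for x in m["isr"].split(",") if x}
        if broker_id in replicas and broker_id not in isr:
            by_leader[int(m["leader"])].append((m["topic"], int(m["partition"])))
    return dict(by_leader)


# Hypothetical sample output, not taken from the cluster in this thread.
SAMPLE = "\n".join([
    "Topic: events\tPartitionCount: 3\tReplicationFactor: 3\tConfigs:",
    "\tTopic: events\tPartition: 0\tLeader: 0\tReplicas: 0,2,1\tIsr: 0,1",
    "\tTopic: events\tPartition: 1\tLeader: 1\tReplicas: 1,2,0\tIsr: 1,0",
    "\tTopic: events\tPartition: 2\tLeader: 2\tReplicas: 2,0,1\tIsr: 2,0,1",
])

if __name__ == "__main__":
    for leader, partitions in sorted(missing_from_isr(SAMPLE, 2).items()):
        print(f"leader {leader}: broker 2 missing from ISR of {partitions}")
```

Grouping by leader makes it easy to see whether the broker is out of sync
with every leader or only with some of them.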


> Also, have you checked b2's server.log?
>

We don't see any logs that obviously indicate the problem, although we're
also not sure what we should be looking for. There are a few ZooKeeper
client timeouts, but we haven't correlated them with anything yet.

Thanks,
Zach

Re: Broker always out of ISRs

Posted by Ben Weintraub <be...@gmail.com>.
I’ve had the same experience as Liam with this symptom (all followers on a
single broker of a given leader getting stuck). It sounds likely that
either the replica fetcher thread is getting stuck or dying with an
unhandled exception.

In the former case, jstack output can be helpful to understand why the
fetcher is stuck. There may or may not be a message in the broker logs on
the broker that’s failing to get in sync.

In the latter case, there should be evidence in the broker logs on the
broker that’s failing to get in sync (and the thread will be notably absent
in jstack output).
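To tell the two cases apart from a saved thread dump, something like the
following sketch could help. It assumes the fetcher threads have names
starting with "ReplicaFetcherThread" and that the dump uses the usual jstack
layout (a quoted thread name followed by a "java.lang.Thread.State:" line);
the function name `fetcher_states` and the sample dump are invented for
illustration.

```python
import re

# A thread entry in jstack output starts with the quoted thread name.
HEADER = re.compile(r'^"(?P<name>[^"]+)"')
STATE = re.compile(r"java\.lang\.Thread\.State:\s*(?P<state>\w+)")


def fetcher_states(jstack_text, prefix="ReplicaFetcherThread"):
    """Map each thread whose name starts with prefix to its reported state."""
    states = {}
    current = None
    for line in jstack_text.splitlines():
        header = HEADER.match(line)
        if header:
            name = header["name"]
            current = name if name.startswith(prefix) else None
            if current:
                states[current] = "UNKNOWN"  # replaced if a state line follows
            continue
        state = STATE.search(line)
        if state and current:
            states[current] = state["state"]
    return states


# Hypothetical dump fragment, not captured from a real broker.
SAMPLE_DUMP = "\n".join([
    '"ReplicaFetcherThread-0-1" #71 prio=5 os_prio=0 tid=0x1 nid=0x2 runnable [0x3]',
    "   java.lang.Thread.State: RUNNABLE",
    "\tat java.lang.Object.wait(Native Method)",
    "",
    '"kafka-request-handler-0" #50 daemon prio=5 os_prio=0 tid=0x4 nid=0x5 waiting on condition [0x6]',
    "   java.lang.Thread.State: WAITING (parking)",
])

if __name__ == "__main__":
    found = fetcher_states(SAMPLE_DUMP)
    if not found:
        print("no fetcher threads in dump: the thread may have died")
    for name, state in found.items():
        print(f"{name}: {state}")
```

An empty result would point towards the thread having died, while a thread
that stays BLOCKED or WAITING across several dumps would point towards a
stuck fetcher.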

Ben

Re: Broker always out of ISRs

Posted by Liam Clarke <li...@adscale.co.nz>.
Hi Zach,

If you check the cluster's controller's controller.log, do you see broker
2 bouncing in and out of ISRs? There'll be logs to that effect. Or is it
just never getting in-sync in the first place?

Whenever I've had this issue in the past, it's been because the replica
fetcher has died. Hate to say this, but have you tried turning broker 2 on
and off again? It's usually how I've resolved this issue when a broker won't
stay in ISR. Also make sure that there's enough CPU/network on the machine
it's running on - we've usually had this issue where CPU was very high or
the network saturated.
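
One way to distinguish "bouncing in and out" from "never in sync" is to count
ISR change messages per minute in a log file. This is only a sketch under
assumptions: the search phrases "Shrinking ISR" and "Expanding ISR", and the
"[yyyy-MM-dd HH:mm:ss,SSS]" timestamp prefix, may not match your Kafka
version or the log file you are reading, so pass your own phrases if they
differ. The function name `isr_changes_per_minute` and the sample lines are
invented for illustration.

```python
import re
from collections import Counter

# Assumed log line prefix: [2020-04-01 19:50:58,123]; only the minute is kept.
TIMESTAMP = re.compile(r"^\[(\d{4}-\d{2}-\d{2} \d{2}:\d{2})")


def isr_changes_per_minute(log_lines, phrases=("Shrinking ISR", "Expanding ISR")):
    """Count lines containing any of the phrases, bucketed by minute."""
    counts = Counter()
    for line in log_lines:
        if not any(phrase in line for phrase in phrases):
            continue
        m = TIMESTAMP.match(line)
        counts[m.group(1) if m else "unknown"] += 1
    return dict(counts)


# Hypothetical log lines, written for this example only.
SAMPLE_LOG = [
    "[2020-04-01 19:50:01,100] INFO Shrinking ISR from 0,2 to 0 (example)",
    "[2020-04-01 19:50:40,200] INFO Expanding ISR from 0 to 0,2 (example)",
    "[2020-04-01 19:51:05,300] INFO Shrinking ISR from 0,2 to 0 (example)",
    "[2020-04-01 19:51:06,000] INFO Unrelated message (example)",
]

if __name__ == "__main__":
    for minute, count in sorted(isr_changes_per_minute(SAMPLE_LOG).items()):
        print(minute, count)
```

Many changes per minute would suggest the broker is bouncing; no matches at
all would suggest it never gets in sync in the first place.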

Cheers,

Liam Clarke-Hutchinson
