Posted to users@kafka.apache.org by Prabhjot Bharaj <pr...@gmail.com> on 2015/09/18 14:48:59 UTC

Useful metric to check slow ISR catchup

Hi,

I've noticed that one follower replica node in my Kafka cluster catches up
to the data from the leader quite slowly.
My topic has just 1 partition with 3 replicas. The other follower replica
gets the full data from the leader almost instantly.

It takes around 22 minutes to catch up 500MB of data.
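
For scale, those two figures imply an average catch-up rate well below
1 MB/s. A quick sketch of the arithmetic, assuming the 22 minutes is
wall-clock time for the whole 500MB and that the rate is roughly constant
(the function name is only illustrative):

```python
def catchup_rate_mb_per_s(size_mb: float, minutes: float) -> float:
    """Average throughput in MB/s for copying size_mb in the given minutes."""
    return size_mb / (minutes * 60.0)


# Figures from this message: 500MB in about 22 minutes.
rate = catchup_rate_mb_per_s(500, 22)
print(f"{rate:.2f} MB/s")  # prints "0.38 MB/s"
```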

I have set up Ganglia monitoring on my cluster, and I'd like to know which
metrics exposed through JMX would be useful for finding the reason for
this slowness.

Thanks,
Prabhjot

Fwd: Useful metric to check slow ISR catchup

Posted by Prabhjot Bharaj <pr...@gmail.com>.
Hi Dev Folks,

Requesting your expertise on this question of mine.

Thanks,
Prabhjot
---------- Forwarded message ----------
From: Prabhjot Bharaj <pr...@gmail.com>
Date: Mon, Sep 21, 2015 at 2:59 PM
Subject: Re: Useful metric to check slow ISR catchup
To: users@kafka.apache.org


Hi,

Attaching a screenshot of bytes/sec from Ganglia.

As you can see, the graph in red belongs to the third replica, for which
the bytes/sec is around 10 times lower than that of its 2 peers (in green
and blue).
Earlier, I thought this could be specific to that one system, but when I
created a new topic with 1 partition and 3 replicas, I saw a similar graph
on the other set of machines.

I'm not sure which parameter could be causing this. Any pointers are
appreciated.

Thanks,
Prabhjot

-- 
---------------------------------------------------------
"There are only 10 types of people in the world: Those who understand
binary, and those who don't"




Re: Useful metric to check slow ISR catchup

Posted by Prabhjot Bharaj <pr...@gmail.com>.
Hi,

Attaching a screenshot of bytes/sec from Ganglia.

As you can see, the graph in red belongs to the third replica, for which
the bytes/sec is around 10 times lower than that of its 2 peers (in green
and blue).
Earlier, I thought this could be specific to that one system, but when I
created a new topic with 1 partition and 3 replicas, I saw a similar graph
on the other set of machines.

I'm not sure which parameter could be causing this. Any pointers are
appreciated.

Thanks,
Prabhjot


Re: Useful metric to check slow ISR catchup

Posted by Prabhjot Bharaj <pr...@gmail.com>.
Hello Folks,

Requesting your expertise on this.

Thanks,
Prabhjot
