Posted to users@kafka.apache.org by Gene Robichaux <Ge...@match.com> on 2015/02/28 15:16:24 UTC

Best way to show lag?

What is the best way to detect consumer lag?

We are running each consumer as a separate group, and I am running the ConsumerOffsetChecker to assess the partitions and the lag for each group/consumer. I run this every 5 minutes. In some cases I run this command up to 75 times in each 5-minute polling cycle (once for each group/consumer). An example of the command is: bin/kafka-run-class.sh kafka.tools.ConsumerOffsetChecker --group consumer-group1 --zkconnect zkhost:zkport

The problem I am running into is CPU usage on the broker when these commands run. We have a dedicated broker that has no leader partitions, but the high CPU still concerns me.

Is there a better way to detect consumer lag? Preferably one that is less impactful?


Gene Robichaux
Manager, Database Operations
Match.com
8300 Douglas Avenue I Suite 800 I Dallas, TX  75225


Re: Best way to show lag?

Posted by Jiangjie Qin <jq...@linkedin.com.INVALID>.
If you are using ZK based offset commit, you have to read the offsets from
ZK. If you can make code changes, one potential improvement is to reuse the
ZKClient, as explained below.
Currently, ConsumerOffsetChecker takes only one consumer group per run. If
you have a lot of consumer groups to check, every run of the tool creates
another ZKClient, which is not very efficient in your use case.
Your devs could make a minor code change so that the tool takes a list of
consumer groups and reuses one ZKClient instead of creating a new one in
each run. Can you verify whether this solution works? If it works, can you
also create a JIRA ticket and submit the patch? I think this might also be
a good improvement for Kafka based offset commit.
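
To illustrate the idea, here is a minimal sketch in Python (not the tool's
actual Scala code). FakeZkClient, committed_offsets and lag_for_groups are
hypothetical names made up for this example; the in-memory dictionaries stand
in for data that would really come from ZooKeeper and the brokers. The point
is only that one client object serves every group in the list, and that lag
is the log end offset minus the committed offset.

```python
from dataclasses import dataclass


@dataclass
class FakeZkClient:
    """Hypothetical stand-in for a ZooKeeper client; not a real Kafka or ZK API."""

    # committed[(group, topic, partition)] -> last committed offset
    committed: dict
    # log_end[(topic, partition)] -> latest offset in the partition log
    log_end: dict
    # Number of lookups served by this single client instance
    reads: int = 0

    def committed_offsets(self, group):
        """Return {(topic, partition): offset} for one consumer group."""
        self.reads += 1
        return {
            (topic, partition): offset
            for (g, topic, partition), offset in self.committed.items()
            if g == group
        }


def lag_for_groups(client, groups):
    """Compute per-partition lag for many groups over one shared client."""
    report = {}
    for group in groups:
        offsets = client.committed_offsets(group)
        # Lag = log end offset minus the group's committed offset.
        report[group] = {
            tp: client.log_end[tp] - committed
            for tp, committed in offsets.items()
        }
    return report


if __name__ == "__main__":
    # Made-up numbers for illustration only.
    client = FakeZkClient(
        committed={
            ("consumer-group1", "events", 0): 90,
            ("consumer-group2", "events", 0): 100,
        },
        log_end={("events", 0): 100},
    )
    print(lag_for_groups(client, ["consumer-group1", "consumer-group2"]))
```

With 75 groups, a single process like this would pay the client (and JVM)
startup cost once per polling cycle instead of 75 times.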

Jiangjie (Becket) Qin

On 3/1/15, 5:44 AM, "Gene Robichaux" <Ge...@match.com> wrote:

>That is what I am using. The problem is when I run it the CPU spikes on
>the broker I am running it from. I just wanted to know if there was a
>different way.
>
>Gene


Re: Best way to show lag?

Posted by Mayuresh Gharat <gh...@gmail.com>.
What do you mean by "We have a dedicated broker that has no leader
partitions"? Are you running anything else on that machine? I think you can
run the tool Guozhang mentioned from any machine; it does not need to be a
Kafka broker.

Thanks,

Mayuresh

On Sun, Mar 1, 2015 at 5:44 AM, Gene Robichaux <Ge...@match.com>
wrote:

> That is what I am using. The problem is when I run it the CPU spikes on
> the broker I am running it from. I just wanted to know if there was a
> different way.
>
> Gene



-- 
-Regards,
Mayuresh R. Gharat
(862) 250-7125

Re: Best way to show lag?

Posted by Gene Robichaux <Ge...@match.com>.
That is what I am using. The problem is when I run it the CPU spikes on the broker I am running it from. I just wanted to know if there was a different way.

Gene

Sent from my iPhone

> On Feb 28, 2015, at 10:46 PM, Guozhang Wang <wa...@gmail.com> wrote:
> 
> If it is ZK based offset commit, you can use the ConsumerOffsetChecker tool
> in kafka.tools.

Re: Best way to show lag?

Posted by Guozhang Wang <wa...@gmail.com>.
If it is ZK based offset commit, you can use the ConsumerOffsetChecker tool
in kafka.tools.

On Sat, Feb 28, 2015 at 12:32 PM, Gene Robichaux <Ge...@match.com>
wrote:

> I think we use ZK based offset commit. However, I am not certain; I would
> have to get that from our DEV group. My role is PROD Ops.


-- 
-- Guozhang

RE: Best way to show lag?

Posted by Gene Robichaux <Ge...@match.com>.
I think we use ZK based offset commit. However, I am not certain; I would have to get that from our DEV group. My role is PROD Ops.

Gene Robichaux
Manager, Database Operations
Match.com
8300 Douglas Avenue I Suite 800 I Dallas, TX  75225

-----Original Message-----
From: Jiangjie Qin [mailto:jqin@linkedin.com.INVALID] 
Sent: Saturday, February 28, 2015 12:06 PM
To: users@kafka.apache.org
Subject: Re: Best way to show lag?

Are you using Kafka based offset commit or ZK based offset commit?



Re: Best way to show lag?

Posted by Jiangjie Qin <jq...@linkedin.com.INVALID>.
Are you using Kafka based offset commit or ZK based offset commit?

On 2/28/15, 6:16 AM, "Gene Robichaux" <Ge...@match.com> wrote:

>What is the best way to detect consumer lag?
>
>We are running each consumer as a separate group and I am running the
>ConsumerOffsetChecker to assess the partitions and the lag for each
>group/consumer. I run this every 5 minutes. In some cases I run this
>command up to 75 times on each 5 min polling cycle (once for each
>group/consumer). An example of the command is (bin/kafka-run-class.sh
>kafka.tools.ConsumerOffsetChecker --group consumer-group1 --zkconnect
>zkhost:zkport)
>
>The problem I am running into is CPU usage on the broker when these
>commands run. We have a dedicated broker that has no leader partitions,
>but the high CPU still concerns me.
>
>Is there a better way to detect consumer lag? Preferably one that is less
>impactful?
>
>
>Gene Robichaux
>Manager, Database Operations
>Match.com
>8300 Douglas Avenue I Suite 800 I Dallas, TX  75225
>


Re: Best way to show lag?

Posted by Otis Gospodnetic <ot...@gmail.com>.
Hi,

On Sat, Feb 28, 2015 at 9:16 AM, Gene Robichaux <Ge...@match.com>
wrote:

> What is the best way to detect consumer lag?
>
> We are running each consumer as a separate group and I am running the
> ConsumerOffsetChecker to assess the partitions and the lag for each
> group/consumer. I run this every 5 minutes. In some cases I run this
> command up to 75 times on each 5 min polling cycle (once for each
> group/consumer). An example of the command is (bin/kafka-run-class.sh
> kafka.tools.ConsumerOffsetChecker --group consumer-group1 --zkconnect
> zkhost:zkport)
>
> The problem I am running into is CPU usage on the broker when these
> commands run. We have a dedicated broker that has no leader partitions, but
> the high CPU still concerns me.
>
> Is there a better way to detect consumer lag? Preferably one that is less
> impactful?
>

Yeah, that hurts :(.  I just looked at our SPM for Kafka monitoring to see
specifically what we do for Consumer Lag.  I'd send you the screenshot, but
I think the ML blocks it. Ah, you can actually see it in a demo. Here's the
link: https://apps.sematext.com/demo -- look for SPM apps with "Kafka" in
the name and look for a tab on the left side labeled "Consumer Lag".

But basically, you can slice and dice consumer lag by any combination of
the following:
* consumer hostname
* client ID
* topic
* partition

The impact is minimal and you get your Consumer Lag in more or less real time.
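
As a rough illustration of what slicing lag by any combination of those
dimensions means, here is a small Python sketch. It is not how SPM works
internally; slice_lag, the field names (host, client_id, topic, partition,
lag) and the sample values are all assumptions made up for this example.

```python
from collections import defaultdict


def slice_lag(samples, dimensions):
    """Sum lag over samples, grouped by the chosen dimension names."""
    totals = defaultdict(int)
    for sample in samples:
        # Build the grouping key from whichever dimensions were requested.
        key = tuple(sample[d] for d in dimensions)
        totals[key] += sample["lag"]
    return dict(totals)


# Made-up samples for illustration; field names are assumptions.
SAMPLES = [
    {"host": "app1", "client_id": "c1", "topic": "events", "partition": 0, "lag": 10},
    {"host": "app1", "client_id": "c1", "topic": "events", "partition": 1, "lag": 5},
    {"host": "app2", "client_id": "c2", "topic": "clicks", "partition": 0, "lag": 0},
]

if __name__ == "__main__":
    # Total lag per topic.
    print(slice_lag(SAMPLES, ["topic"]))
    # Total lag per (host, partition) pair.
    print(slice_lag(SAMPLES, ["host", "partition"]))
```

Passing a different list of dimension names gives a different slice of the
same samples, without recomputing or re-fetching any offsets.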

Otis
--
Monitoring * Alerting * Anomaly Detection * Centralized Log Management
Solr & Elasticsearch Support * http://sematext.com/