Posted to users@kafka.apache.org by tao xiao <xi...@gmail.com> on 2015/02/13 10:03:28 UTC

consumer lag metric

Hi team,

Is there a metric that shows the consumer lag of a particular consumer
group, similar to what the offset checker provides?

-- 
Regards,
Tao

Re: consumer lag metric

Posted by tao xiao <xi...@gmail.com>.
Thanks, Todd. That will work.

On Tue, Feb 17, 2015 at 10:31 PM, Todd Palino <tp...@gmail.com> wrote:

> In order to do that, you'll need to run it and parse the output, and then
> emit it to your metrics system of choice. This is essentially what I do - I
> have a monitoring application which runs every minute and pulls the offsets
> for a select set of topics and consumers, and then packages up the metrics
> and sends them to our internal system.
>
> It's not ideal. We're working on a script to calculate lag efficiently for
> all consumers who commit offsets to Kafka, rather than a select set.
>
> -Todd



-- 
Regards,
Tao

Re: consumer lag metric

Posted by Todd Palino <tp...@gmail.com>.
In order to do that, you'll need to run it and parse the output, and then
emit it to your metrics system of choice. This is essentially what I do - I
have a monitoring application which runs every minute and pulls the offsets
for a select set of topics and consumers, and then packages up the metrics
and sends them to our internal system.
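
As a rough illustration of the "run it, parse the output, emit it" approach, here is a minimal Python sketch. It is not the monitoring application described above. It assumes the offset checker prints whitespace-separated columns in the order Group, Topic, Pid, Offset, logSize, Lag, Owner (check this against the output of your own version), and it formats the result for Graphite's plaintext protocol. The sample output, the metric prefix and the function names are all made up for the example.

```python
import socket
import time

# Assumed column layout of the offset checker output (verify for your version):
# Group  Topic  Pid  Offset  logSize  Lag  Owner
SAMPLE_OUTPUT = """\
Group           Topic           Pid Offset          logSize         Lag             Owner
my-group        events          0   1500            1620            120             none
my-group        events          1   900             900             0               none
"""


def parse_offset_checker(output):
    """Return (group, topic, partition, lag) tuples parsed from the tool's text output."""
    rows = []
    for line in output.splitlines():
        fields = line.split()
        # Skip the header, blank lines, and anything that is not a data row.
        if len(fields) < 6 or not fields[2].isdigit():
            continue
        group, topic, partition, _offset, _log_size, lag = fields[:6]
        try:
            rows.append((group, topic, int(partition), int(lag)))
        except ValueError:
            continue  # lag column was not a number
    return rows


def to_graphite_lines(rows, timestamp, prefix="kafka.consumer_lag"):
    """Format rows as Graphite plaintext lines: '<path> <value> <timestamp>'."""
    return [
        f"{prefix}.{group}.{topic}.{partition} {lag} {timestamp}"
        for group, topic, partition, lag in rows
    ]


def send_to_graphite(lines, host, port=2003):
    """Send the lines to a Graphite (Carbon) plaintext listener."""
    payload = "\n".join(lines) + "\n"
    with socket.create_connection((host, port), timeout=5) as sock:
        sock.sendall(payload.encode("ascii"))


if __name__ == "__main__":
    parsed = parse_offset_checker(SAMPLE_OUTPUT)
    for metric_line in to_graphite_lines(parsed, int(time.time())):
        print(metric_line)
```

In practice you would capture the tool's output (for example with subprocess) on a schedule and pass it to parse_offset_checker. Topic or group names containing dots would need escaping before they are used in a Graphite path.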

It's not ideal. We're working on a script to calculate lag efficiently for
all consumers who commit offsets to Kafka, rather than a select set.

-Todd


On Mon, Feb 16, 2015 at 12:27 AM, tao xiao <xi...@gmail.com> wrote:

> Thank you, Todd, for your detailed explanation. Currently I export all
> metrics to Graphite using the reporter configuration. Is there a way I can
> do a similar thing with the offset checker?

Re: consumer lag metric

Posted by tao xiao <xi...@gmail.com>.
Thank you, Todd, for your detailed explanation. Currently I export all
metrics to Graphite using the reporter configuration. Is there a way I can
do a similar thing with the offset checker?

On Mon, Feb 16, 2015 at 4:21 PM, Todd Palino <tp...@gmail.com> wrote:

> The reason for this is the mechanism by which each lag is calculated.
> MaxLag and the FetcherLagMetrics are calculated by the
> consumer itself using the difference between the offset it knows it is at,
> and the offset that the broker has as the end of the partition. The offset
> checker, however, uses the last offset that the consumer committed.
> Depending on your configuration, this is somewhere behind where the
> consumer actually is. For example, if your commit interval is set to 10
> minutes, the number used by the offset checker can be up to 10 minutes
> behind where it actually is.
>
> So while MaxLag may be more up to date at any given time, it's actually
> less accurate. Because MaxLag relies on the consumer to report it, if the
> consumer breaks, you will not see an accurate lag number. This is why when
> we are checking consumer lag, we use an external process that uses the
> committed consumer offsets. This allows us to catch a broken consumer, as
> well as an active consumer that is just falling behind.
>
> -Todd



-- 
Regards,
Tao

Re: consumer lag metric

Posted by Todd Palino <tp...@gmail.com>.
The reason for this is the mechanism by which each lag is calculated.
MaxLag and the FetcherLagMetrics are calculated by the
consumer itself using the difference between the offset it knows it is at,
and the offset that the broker has as the end of the partition. The offset
checker, however, uses the last offset that the consumer committed.
Depending on your configuration, this is somewhere behind where the
consumer actually is. For example, if your commit interval is set to 10
minutes, the number used by the offset checker can be up to 10 minutes
behind where it actually is.
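
To make the difference concrete, here is a small Python sketch with made-up numbers. The offsets and names are hypothetical. It only shows that both figures subtract from the same end-of-partition offset but start from different consumer offsets, and that MaxLag is the largest of the per-partition values.

```python
def lag(log_end_offset, consumer_offset):
    """Number of messages between a consumer offset and the end of the partition."""
    return max(log_end_offset - consumer_offset, 0)


# Hypothetical snapshot of two partitions of one topic.
partitions = {
    0: {"log_end": 10_000, "position": 9_950, "committed": 9_200},
    1: {"log_end": 8_000, "position": 7_990, "committed": 7_600},
}

# What the consumer reports: lag from the offset it knows it is at.
fetcher_lag = {p: lag(s["log_end"], s["position"]) for p, s in partitions.items()}

# What the offset checker reports: lag from the last committed offset.
checker_lag = {p: lag(s["log_end"], s["committed"]) for p, s in partitions.items()}

# MaxLag is the maximum over the per-partition fetcher lags.
max_lag = max(fetcher_lag.values())

print("fetcher lag per partition:", fetcher_lag)
print("offset checker lag per partition:", checker_lag)
print("MaxLag:", max_lag)
```

With these numbers the consumer-side lag is 50 and 10, while the offset checker shows 800 and 400, simply because the committed offsets trail the consumer's actual position.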

So while MaxLag may be more up to date at any given time, it's actually
less accurate. Because MaxLag relies on the consumer to report it, if the
consumer breaks, you will not see an accurate lag number. This is why when
we are checking consumer lag, we use an external process that uses the
committed consumer offsets. This allows us to catch a broken consumer, as
well as an active consumer that is just falling behind.
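
As a sketch of what such an external check could look like (an assumption for illustration, not the process we actually run), the Python below takes a time-ordered series of (committed offset, end-of-partition offset) samples for one partition and distinguishes a consumer that has stopped committing from one that is committing but losing ground. The function name and the thresholds are hypothetical.

```python
def classify(samples):
    """Classify a consumer from time-ordered (committed_offset, log_end_offset) samples."""
    if len(samples) < 2:
        return "unknown"  # not enough history to judge
    lags = [log_end - committed for committed, log_end in samples]
    first_committed = samples[0][0]
    last_committed = samples[-1][0]
    if last_committed == first_committed and lags[-1] > 0:
        return "stalled"  # no commits while messages are waiting
    if all(later > earlier for earlier, later in zip(lags, lags[1:])):
        return "falling behind"  # still committing, but lag grows on every sample
    return "ok"


# Broken consumer: committed offset never moves while the partition keeps growing.
print(classify([(100, 150), (100, 200), (100, 260)]))
# Active consumer that is losing ground.
print(classify([(100, 150), (120, 200), (140, 260)]))
# Healthy consumer: lag shrinks over time.
print(classify([(100, 110), (150, 155), (200, 204)]))
```

Because the check only reads committed offsets and broker-side end offsets, it keeps working when the consumer itself has stopped reporting metrics.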

-Todd


On Fri, Feb 13, 2015 at 9:34 PM, tao xiao <xi...@gmail.com> wrote:

> Thanks, Joel. But I have found that both MaxLag and FetcherLagMetrics are
> always much smaller than the lag shown by the offset checker. Any reason?

Re: consumer lag metric

Posted by tao xiao <xi...@gmail.com>.
Thanks, Joel. But I have found that both MaxLag and FetcherLagMetrics are always
much smaller than the lag shown by the offset checker. Any reason?

On Sat, Feb 14, 2015 at 7:22 AM, Joel Koshy <jj...@gmail.com> wrote:

> There are FetcherLagMetrics that you can take a look at. However, it
> is probably easiest to just monitor MaxLag as that reports the maximum
> of all the lag metrics.


-- 
Regards,
Tao

Re: consumer lag metric

Posted by Joel Koshy <jj...@gmail.com>.
There are FetcherLagMetrics that you can take a look at. However, it
is probably easiest to just monitor MaxLag as that reports the maximum
of all the lag metrics.

On Fri, Feb 13, 2015 at 05:03:28PM +0800, tao xiao wrote:
> Hi team,
> 
> Is there a metric that shows the consumer lag of a particular consumer
> group, similar to what the offset checker provides?
> 
> -- 
> Regards,
> Tao