Posted to users@kafka.apache.org by Marcus Horsley-Rai <ma...@gmail.com> on 2021/06/16 13:51:55 UTC

RemoteTimeMs metric

I've recently implemented further monitoring of our Kafka cluster to hone
in on where I think we have bottlenecks.
I'm interested in one metric in particular:
*kafka.network:type=RequestMetrics,name=RemoteTimeMs,request={Produce|FetchConsumer|FetchFollower}*

All the docs I've seen accompanying the metric state "non-zero for produce
requests when ack=-1".
What does it mean, however, for consumer fetch requests (FetchConsumer)
or follower fetch requests (FetchFollower)?

On my cluster, TotalTimeMs is nice and low for produce requests, which
I would expect as we don't set a high acks value.
For follower and consumer fetch requests, however, TotalTimeMs is nearly
500ms at the 99th percentile, and RemoteTimeMs accounts for the vast
majority of it.
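
To make "the vast majority" concrete, here is a minimal sketch of how the share of remote time could be computed. It assumes the component metrics published next to TotalTimeMs under kafka.network:type=RequestMetrics, uses mean values (percentiles of the components do not add up to the percentile of the total), and the sample numbers are hypothetical, not taken from this cluster.

```python
# Component metrics published alongside TotalTimeMs under
# kafka.network:type=RequestMetrics (names as in the Kafka monitoring docs).
COMPONENTS = (
    "RequestQueueTimeMs",
    "LocalTimeMs",
    "RemoteTimeMs",
    "ThrottleTimeMs",
    "ResponseQueueTimeMs",
    "ResponseSendTimeMs",
)


def remote_share(mean_ms):
    """Fraction of the summed component means that is RemoteTimeMs."""
    total = sum(mean_ms.get(name, 0.0) for name in COMPONENTS)
    return mean_ms.get("RemoteTimeMs", 0.0) / total if total else 0.0


# Hypothetical mean values for request=FetchConsumer, for illustration only.
sample = {
    "RequestQueueTimeMs": 1.0,
    "LocalTimeMs": 2.0,
    "RemoteTimeMs": 45.0,
    "ResponseQueueTimeMs": 1.0,
    "ResponseSendTimeMs": 1.0,
}
share = remote_share(sample)
```

A share close to 1.0 only says that the request spent its time waiting rather than being processed; it does not by itself say what it was waiting for.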

My gut is telling me that followers are struggling to replicate from
leaders fast enough. Is the high RemoteTimeMs for FetchConsumer therefore
telling me there is a high commit lag (waiting for all replicas in the ISR
to be updated)?

Many thanks in advance,

Marcus

Re: RemoteTimeMs metric

Posted by Marcus Horsley-Rai <ma...@gmail.com>.
Thanks, David. That is very useful information. Our data does arrive in
1-minute batches, so I'll double-check the history of the metric over time.
If that is the case, I would expect to see it fluctuate when I poll the
metric at a sub-minute interval.
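
As a rough illustration of why 1-minute batches would push the 99th percentile towards the max wait time, here is a toy simulation. Everything in it is an assumption made for the sketch: a batch lands exactly every 60 s, the fetcher is otherwise caught up, the max wait is 500 ms, and there are 3 ms of non-remote overhead between fetches. The function names are hypothetical and this is not how the broker computes the metric.

```python
def simulate_remote_times(batch_interval_ms, max_wait_ms, overhead_ms, duration_ms):
    """Remote time of each fetch issued by a caught-up fetcher (toy model).

    A batch lands every `batch_interval_ms`. A fetch returns as soon as a
    batch lands or after `max_wait_ms`, whichever comes first, and the next
    fetch is issued `overhead_ms` later.
    """
    remote_times = []
    now = 0.0
    next_batch = float(batch_interval_ms)
    while now < duration_ms:
        if next_batch <= now:
            # A batch landed between fetches: data is ready, no waiting.
            remote = 0.0
            next_batch += batch_interval_ms
        else:
            remote = min(max_wait_ms, next_batch - now)
            if now + remote >= next_batch:
                # The batch landed while the fetch was parked; it is returned now.
                next_batch += batch_interval_ms
        remote_times.append(remote)
        now += remote + overhead_ms
    return remote_times


def percentile(values, fraction):
    """Nearest-rank style percentile over a non-empty list."""
    ordered = sorted(values)
    return ordered[min(len(ordered) - 1, int(fraction * len(ordered)))]


# Ten minutes of fetches against 1-minute batches.
times = simulate_remote_times(60_000, 500, 3, 600_000)
full_waits = sum(1 for t in times if t == 500) / len(times)
p99 = percentile(times, 0.99)
```

In this model nearly every fetch parks for the full max wait, and only the one fetch per minute that is parked when the batch lands returns early, so the 99th percentile sits at the max wait even though nothing is lagging.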

Many thanks,

Marcus

On Wed, 16 Jun 2021, 17:39 David Ballano Fernandez, <
dfernandez@demonware.net> wrote:

> Hi Marcus,
>
> For fetch requests, if the remote time is high, it could be that there is
> not enough data to give in a fetch response. This can happen when the
> consumer or replica is caught up and there is no new incoming data. If this
> is the case, remote time will be close to the max wait time, which is
> normal.
>
> I have seen this when my clusters are idle and I am not sending data to
> them.
>
> Hope this helps.

Re: RemoteTimeMs metric

Posted by David Ballano Fernandez <df...@demonware.net>.
Hi Marcus,

For fetch requests, if the remote time is high, it could be that there is
not enough data to give in a fetch response. This can happen when the
consumer or replica is caught up and there is no new incoming data. If this
is the case, remote time will be close to the max wait time, which is
normal.
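
A minimal sketch of that check, assuming the max wait time is 500 ms, which is the documented default for both fetch.max.wait.ms (consumers) and replica.fetch.wait.max.ms (followers). The function name and the 10% tolerance are arbitrary choices for illustration; substitute your own configured values.

```python
# Documented default for fetch.max.wait.ms and replica.fetch.wait.max.ms;
# change this if your clients or brokers override it.
DEFAULT_MAX_WAIT_MS = 500.0


def looks_like_idle_long_poll(remote_time_ms, max_wait_ms=DEFAULT_MAX_WAIT_MS, tolerance=0.1):
    """True when the observed remote time is within `tolerance` of the max wait.

    A value pinned near the max wait is consistent with fetches that found
    no new data and simply timed out, rather than with a slow broker.
    """
    return abs(remote_time_ms - max_wait_ms) <= tolerance * max_wait_ms


near_max = looks_like_idle_long_poll(495.0)
well_below = looks_like_idle_long_poll(120.0)
```

A remote time that sits well above the configured max wait would not fit this explanation and would be worth investigating separately.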

I have seen this when my clusters are idle and I am not sending data to them.

Hope this helps.


