Posted to users@kafka.apache.org by Dinesh Kumar <de...@gmail.com> on 2019/08/22 05:18:24 UTC

Kafka Cluster High Produce Time

Hi,

We have a Kafka (version 2.0.0) cluster with multiple brokers and many
producers using acks=all, or possibly acks=1 (we don't control this). Produce
time has increased from 10 ms to ~150 ms.

From the JMX metrics I can see that the "remote" time is what has increased,
which I figure is time spent waiting on the followers.

1. Is there any configuration we could tweak to reduce the produce time?
2. What's the next step to debug why the remote produce time is high?


Thanks,
Dinesh Kumar

Re: Kafka Cluster High Produce Time

Posted by Lisheng Wang <wa...@gmail.com>.
Hi Dinesh,

Maybe you can check
"kafka.network:type=RequestMetrics,name=RemoteTimeMs,request=FetchFollower"
on all brokers to see whether some brokers report lower values than others.

I think that if some followers are busy replicating, that metric will be
lower for them: when many records are waiting to be replicated, the
follower's fetch returns before it has waited up to "replica.fetch.wait.max.ms".

I'm not sure this assumption is correct. What do you think?
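
A minimal sketch of that comparison, assuming the mean RemoteTimeMs for
request=FetchFollower has already been collected from each broker by some
JMX tool. The function name, the 0.5 ratio, and the sample readings are
hypothetical and only illustrate the idea of flagging brokers that sit well
below the rest.

```python
from statistics import median


def low_remote_time_brokers(remote_time_ms, ratio=0.5):
    """Return the ids of brokers whose reading is below ratio * median.

    remote_time_ms maps broker id -> mean FetchFollower RemoteTimeMs.
    The ratio threshold is an arbitrary assumption, not a Kafka default.
    """
    if not remote_time_ms:
        return []
    mid = median(remote_time_ms.values())
    return sorted(b for b, v in remote_time_ms.items() if v < ratio * mid)


# Hypothetical readings keyed by broker id (milliseconds).
samples = {1: 480.0, 2: 495.0, 3: 120.0, 4: 470.0}
print(low_remote_time_brokers(samples))
```

A broker flagged this way is only a starting point; whether a low value
really means "busy replicating" is the assumption being questioned here.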

Best,
Lisheng


On Thu, Aug 22, 2019 at 2:00 PM Dinesh Kumar <de...@gmail.com> wrote:

> Hi Lisheng,
>
> Yes, it's RemoteTimeMs:
> "kafka.network:type=RequestMetrics,name=RemoteTimeMs,request=Produce"
>
> Sure, I'll try increasing the number of replica fetchers. The other
> configurations are already set as the paper suggests.
>
> I was also wondering whether I can track which topic or which specific
> follower is causing the issue (if it's the network), since we have brokers
> across different regions.
>
> Thanks,
> Dinesh Kumar

Re: Kafka Cluster High Produce Time

Posted by Dinesh Kumar <de...@gmail.com>.
Hi Lisheng,

Yes, it's RemoteTimeMs:
"kafka.network:type=RequestMetrics,name=RemoteTimeMs,request=Produce"

Sure, I'll try increasing the number of replica fetchers. The other
configurations are already set as the paper suggests.

I was also wondering whether I can track which topic or which specific
follower is causing the issue (if it's the network), since we have brokers
across different regions.
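
One possible way to narrow this down, sketched under assumptions: each
follower broker exposes a per-partition replication lag MBean
(kafka.server:type=FetcherLagMetrics,name=ConsumerLag,clientId=...,topic=...,partition=...),
and the clientId of a replica fetcher is assumed to look like
ReplicaFetcherThread-<fetcherId>-<leaderBrokerId>. The helper below groups
already-collected readings from one follower by leader broker and topic.
The function name, the key ordering in the MBean string, and the sample
data are hypothetical.

```python
import re
from collections import defaultdict

# Assumed MBean name layout; key order may differ depending on the JMX tool.
PATTERN = re.compile(
    r"kafka\.server:type=FetcherLagMetrics,name=ConsumerLag,"
    r"clientId=ReplicaFetcherThread-(?P<fetcher>\d+)-(?P<leader>\d+),"
    r"topic=(?P<topic>[-.\w]+),partition=(?P<partition>\d+)"
)


def lag_by_leader_and_topic(readings):
    """Sum follower lag (in messages) per (leader broker id, topic).

    readings maps an MBean name string to its lag value, as read from
    a single follower broker. Names that do not match are skipped.
    """
    totals = defaultdict(int)
    for name, lag in readings.items():
        m = PATTERN.fullmatch(name)
        if m is None:
            continue
        totals[(int(m["leader"]), m["topic"])] += lag
    return dict(totals)


# Hypothetical readings from one follower broker.
prefix = "kafka.server:type=FetcherLagMetrics,name=ConsumerLag,clientId="
readings = {
    prefix + "ReplicaFetcherThread-0-1,topic=orders,partition=0": 12,
    prefix + "ReplicaFetcherThread-0-1,topic=orders,partition=1": 30,
    prefix + "ReplicaFetcherThread-0-2,topic=clicks,partition=0": 5000,
}
print(lag_by_leader_and_topic(readings))
```

Running this per follower and comparing the totals across regions would
show which follower/leader pairs and topics fall behind, which may or may
not line up with the high produce RemoteTimeMs.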

Thanks,
Dinesh Kumar


Re: Kafka Cluster High Produce Time

Posted by Lisheng Wang <wa...@gmail.com>.
Hi Dinesh,

Just want to check: is the metric you mentioned "RemoteTimeMs"?

If so, "RemoteTimeMs" is the time the request spends waiting on a remote
client for produce. A high value can imply a slow network connection.

That explanation comes from "Optimizing Your Apache Kafka Deployment",
which can be downloaded at
https://www.confluent.io/white-paper/optimizing-your-apache-kafka-deployment/

So I think you need to look at your network to see whether it's a bottleneck.
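
To confirm that the remote part really dominates before digging into the
network, the per-request timing metrics under
kafka.network:type=RequestMetrics,request=Produce can be compared side by
side. A minimal sketch, assuming the values were already read via JMX; the
function name and the sample numbers are hypothetical.

```python
# Timing components that the broker reports per request type.
COMPONENTS = (
    "RequestQueueTimeMs",
    "LocalTimeMs",
    "RemoteTimeMs",
    "ThrottleTimeMs",
    "ResponseQueueTimeMs",
    "ResponseSendTimeMs",
)


def dominant_component(breakdown):
    """Return (component name, share of summed time) for the largest part.

    breakdown maps metric name -> mean time in ms; missing keys count as 0.
    Returns (None, 0.0) when every component is zero.
    """
    known = {k: float(breakdown.get(k, 0.0)) for k in COMPONENTS}
    total = sum(known.values())
    if total == 0:
        return None, 0.0
    name = max(known, key=known.get)
    return name, known[name] / total


# Hypothetical Produce request timings in milliseconds.
sample = {
    "RequestQueueTimeMs": 1.0,
    "LocalTimeMs": 6.0,
    "RemoteTimeMs": 140.0,
    "ResponseQueueTimeMs": 1.0,
    "ResponseSendTimeMs": 2.0,
}
name, share = dominant_component(sample)
print(f"{name}: {share:.0%} of produce time")
```

If RemoteTimeMs is the dominant share, the next thing to look at is
whatever the broker is waiting on during that phase, such as the network
path between brokers.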

Hope that helps.

Best,
Lisheng

