Posted to users@kafka.apache.org by Gaurav Abbi <ab...@gmail.com> on 2017/07/31 15:04:18 UTC

increased response time for OffsetCommit requests

Hi All,
We recently upgraded to Kafka 0.11.0.0 from 0.10.1.1.
Since then we have been observing increased latencies, especially for
OffsetCommit requests.
Looking at the server-side metrics, it seems the culprit is the Follower
time.

We are using the following settings:
inter.broker.protocol.version: 0.11.0.0
log.message.format.version: 0.9.0.1

Are there some possible pointers that we can explore to troubleshoot the
root cause?

Best Regards,
Gaurav Abbi

Re: increased response time for OffsetCommit requests

Posted by Apurva Mehta <ap...@confluent.io>.
Hi Gaurav, those results are definitely inconsistent with the benchmarking
we did. Can you see if this reproduces with the 0.10.0 message format
running on a 0.11.0.0 broker?

On Wed, Aug 2, 2017 at 4:52 AM, Gaurav Abbi <ab...@gmail.com> wrote:

> Hi Apurva,
> For the ProduceRequest,
>
>    - The increase is from 470 ms to around 1.004 s.
>    - The average batch size (batch-size-avg) is around 320B.
>    - The linger time is 10ms.
>
> However, the 99th percentile for OffsetCommit has increased from 1.08 to
> 2.8 seconds.
>
> Best Regards,
> Gaurav Abbi

Re: increased response time for OffsetCommit requests

Posted by Gaurav Abbi <ab...@gmail.com>.
Hi Apurva,
For the ProduceRequest,

   - The increase is from 470 ms to around 1.004 s.
   - The average batch size (batch-size-avg) is around 320B.
   - The linger time is 10ms.

However, the 99th percentile for OffsetCommit has increased from 1.08 to
2.8 seconds.
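
As a rough cross-check of these figures, the relative increases can be
worked out as below. This is only an illustrative sketch; the helper name
percent_increase is made up for the example and is not part of any Kafka
tooling.

```python
def percent_increase(before, after):
    """Return the relative increase from `before` to `after`, in percent."""
    return (after - before) / before * 100.0


# ProduceRequest latency reported in this message: 470 ms -> about 1004 ms.
produce = percent_increase(470.0, 1004.0)

# OffsetCommit 99th percentile reported in this message: 1.08 s -> 2.8 s.
offset_commit_p99 = percent_increase(1.08, 2.8)

print(f"ProduceRequest increase: {produce:.0f}%")
print(f"OffsetCommit p99 increase: {offset_commit_p99:.0f}%")
```

Both inputs to each call must use the same unit (milliseconds for the
first, seconds for the second); the result is unit-free.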

Best Regards,
Gaurav Abbi

On Tue, Aug 1, 2017 at 7:32 PM, Apurva Mehta <ap...@confluent.io> wrote:

> Sorry to keep prodding you with questions, but can you quantify the
> increase for the ProduceRequest? What is the workload you are testing
> against: specifically the batch size, message size, and linger time
> settings of the producers in question?
>
> I ask because we benchmarked 0.11.0 against the older 0.10.0 message format
> and found no difference in performance between 0.10.2 on the 0.10
> message format and 0.11.0 on the 0.10 message format. Could you create a
> topic with the 0.10.0 message format and see if there is any degradation
> for the same workload?
>
> Thanks,
> Apurva

Re: increased response time for OffsetCommit requests

Posted by Apurva Mehta <ap...@confluent.io>.
Sorry to keep prodding you with questions, but can you quantify the
increase for the ProduceRequest? What is the workload you are testing
against: specifically the batch size, message size, and linger time
settings of the producers in question?

I ask because we benchmarked 0.11.0 against the older 0.10.0 message format
and found no difference in performance between 0.10.2 on the 0.10
message format and 0.11.0 on the 0.10 message format. Could you create a
topic with the 0.10.0 message format and see if there is any degradation
for the same workload?
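
One possible way to set up such a test topic is sketched below. It only
builds and prints a kafka-topics.sh command line that uses the topic-level
message.format.version override; it does not run anything. The script path,
ZooKeeper address, and topic name are placeholders assumed for illustration
and would need to match the actual cluster.

```python
import shlex

# Hypothetical placeholders; adjust to the actual installation and cluster.
KAFKA_TOPICS = "bin/kafka-topics.sh"
ZOOKEEPER = "localhost:2181"
TOPIC = "format-test-0-10-0"


def build_create_topic_command(topic, message_format,
                               partitions=1, replication_factor=1):
    """Build the argument list for creating a topic with a pinned message format."""
    return [
        KAFKA_TOPICS,
        "--zookeeper", ZOOKEEPER,
        "--create",
        "--topic", topic,
        "--partitions", str(partitions),
        "--replication-factor", str(replication_factor),
        # Topic-level override of the broker's log.message.format.version.
        "--config", f"message.format.version={message_format}",
    ]


command = build_create_topic_command(TOPIC, "0.10.0")

# Print a shell-safe command line that can be reviewed before running it.
print(" ".join(shlex.quote(arg) for arg in command))
```

Running the same producer workload against this topic and against an
existing one would then give a like-for-like comparison.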

Thanks,
Apurva


On Tue, Aug 1, 2017 at 2:51 AM, Gaurav Abbi <ab...@gmail.com> wrote:

> Hi Apurva,
> There are increases in the *Produce* request also, though not as
> substantial as for *OffsetCommit*. For both of these requests, the major
> contributor is Remote time.
> A couple of other metrics that show different behavior post upgrade:
>
>    1. *LogStartOffset*: It has drastically decreased.
>    2. *NumDelayedOperations*: It has dropped.
>
> These could be related, or maybe these are intended, beneficial changes in
> Kafka 0.11.0.0 or one of the previous versions.
>
> Best Regards,
> Gaurav Abbi

Re: increased response time for OffsetCommit requests

Posted by Gaurav Abbi <ab...@gmail.com>.
Hi Apurva,
There are increases in the *Produce* request also, though not as
substantial as for *OffsetCommit*. For both of these requests, the major
contributor is Remote time.
A couple of other metrics that show different behavior post upgrade:

   1. *LogStartOffset*: It has drastically decreased.
   2. *NumDelayedOperations*: It has dropped.

These could be related, or maybe these are intended, beneficial changes in
Kafka 0.11.0.0 or one of the previous versions.
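
To illustrate what "major contributor" means here, the sketch below splits
a request's time into the per-stage broker request metrics and reports the
largest one. The numbers are invented for the example and are not
measurements from this cluster; only the metric names follow Kafka's
RequestMetrics naming.

```python
# Hypothetical per-stage timings in milliseconds (NOT measured values).
breakdown_ms = {
    "RequestQueueTimeMs": 2.0,
    "LocalTimeMs": 15.0,
    "RemoteTimeMs": 950.0,   # time spent waiting on followers
    "ResponseQueueTimeMs": 1.0,
    "ResponseSendTimeMs": 3.0,
}


def dominant_component(breakdown):
    """Return the largest stage and its share of the summed stage times."""
    total = sum(breakdown.values())
    name = max(breakdown, key=breakdown.get)
    return name, breakdown[name] / total


name, share = dominant_component(breakdown_ms)
print(f"{name} accounts for {share:.1%} of the summed stage times")
```

Comparing this breakdown before and after the upgrade, per request type,
shows which stage absorbed the extra latency.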

Best Regards,
Gaurav Abbi

On Tue, Aug 1, 2017 at 12:11 AM, Apurva Mehta <ap...@confluent.io> wrote:

> Thanks for your response. Is it 200% only for the OffsetCommitRequest, or
> is it similar for all the requests?

Re: increased response time for OffsetCommit requests

Posted by Apurva Mehta <ap...@confluent.io>.
Thanks for your response. Is it 200% only for the OffsetCommitRequest, or
is it similar for all the requests?


On Mon, Jul 31, 2017 at 12:48 PM, Gaurav Abbi <ab...@gmail.com> wrote:

> Hi Apurva,
> 1. The increase is about 200%.
> 2. There is no increase in throughput. However, this has caused an increase
> in the error rate and a decrease in the responses received per second.
>
> One more thing to mention: we also upgraded to the 0.11.0.0 client
> libraries. We are currently using the old producer and consumer APIs.
>
> Best Regards,
> Gaurav Abbi

Re: increased response time for OffsetCommit requests

Posted by Gaurav Abbi <ab...@gmail.com>.
Hi Apurva,
1. The increase is about 200%.
2. There is no increase in throughput. However, this has caused an increase
in the error rate and a decrease in the responses received per second.

One more thing to mention: we also upgraded to the 0.11.0.0 client
libraries. We are currently using the old producer and consumer APIs.

Best Regards,
Gaurav Abbi

On Mon, Jul 31, 2017 at 7:46 PM, Apurva Mehta <ap...@confluent.io> wrote:

> How much is the increase? Is there any increase in throughput?

Re: increased response time for OffsetCommit requests

Posted by Apurva Mehta <ap...@confluent.io>.
How much is the increase? Is there any increase in throughput?

On Mon, Jul 31, 2017 at 8:04 AM, Gaurav Abbi <ab...@gmail.com> wrote:

> Hi All,
> We recently upgraded to Kafka 0.11.0.0 from 0.10.1.1.
> Since then we have been observing increased latencies, especially for
> OffsetCommit requests.
> Looking at the server-side metrics, it seems the culprit is the Follower
> time.
>
> We are using the following settings:
> inter.broker.protocol.version: 0.11.0.0
> log.message.format.version: 0.9.0.1
>
> Are there some possible pointers that we can explore to troubleshoot the
> root cause?
>
> Best Regards,
> Gaurav Abbi
>