Posted to users@kafka.apache.org by frederic arno <fr...@gmail.com> on 2017/11/22 15:03:39 UTC

Producer request.timeout.ms not always working as advertised

Hi all,

I'm testing a setup where I have 3 zookeeper hosts and 3 kafka brokers
(version 1.0.0), using the kafka-producer-perf-test.sh script.

It seems that in certain circumstances, record sends are not retried
after a timeout, and I'm not sure what is wrong...

From the documentation of the request.timeout.ms producer config, I
understand that the request should be retried after a timeout: "If the
response is not received before the timeout elapses the client will
resend the request if necessary or fail the request if retries are
exhausted".

However, that does not always seem to be the case. When overloading
the brokers (or the network) using kafka-producer-perf-test.sh, I see
this kind of output:

org.apache.kafka.common.errors.TimeoutException: Expiring 184
record(s) for test-123: 40011 ms has passed since batch creation plus
linger time

I don't understand the above log: the duration printed corresponds to
request.timeout.ms, so why isn't the batch retried (retries is set to MAX_INT)?
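
In case it is useful, this is roughly how I would reproduce the problem with a
plain Java producer instead of the perf-test script. This is only a sketch
against the standard client API (the broker address, topic and serializers are
placeholders), not what kafka-producer-perf-test.sh does internally; the
callback is just where I would expect the expiry error to show up.

import java.util.Properties;
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.Producer;
import org.apache.kafka.clients.producer.ProducerConfig;
import org.apache.kafka.clients.producer.ProducerRecord;
import org.apache.kafka.common.errors.TimeoutException;

public class SendWithCallback {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put(ProducerConfig.BOOTSTRAP_SERVERS_CONFIG, "10.0.17.157:9092"); // placeholder
        props.put(ProducerConfig.KEY_SERIALIZER_CLASS_CONFIG,
                "org.apache.kafka.common.serialization.StringSerializer");
        props.put(ProducerConfig.VALUE_SERIALIZER_CLASS_CONFIG,
                "org.apache.kafka.common.serialization.StringSerializer");
        props.put(ProducerConfig.REQUEST_TIMEOUT_MS_CONFIG, 40000);
        props.put(ProducerConfig.RETRIES_CONFIG, Integer.MAX_VALUE);

        try (Producer<String, String> producer = new KafkaProducer<>(props)) {
            producer.send(new ProducerRecord<>("test", "some payload"), (metadata, exception) -> {
                if (exception instanceof TimeoutException) {
                    // The "Expiring N record(s)" failure is reported here as a
                    // TimeoutException; whether such an expired batch should have
                    // been resent is exactly what I am unsure about.
                    System.err.println("record expired: " + exception.getMessage());
                } else if (exception != null) {
                    System.err.println("send failed: " + exception);
                } else {
                    System.out.println("acked: " + metadata);
                }
            });
            producer.flush();
        }
    }
}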

I also see a couple of warnings like the one below, but they are not for
the same partitions as the expired records above:

WARN [Producer clientId=producer-1] Got error produce response with
correlation id 3000 on topic-partition test-92, retrying (2147483646
attempts left). Error: REQUEST_TIMED_OUT
(org.apache.kafka.clients.producer.internals.Sender)

For reference, here's the command line used:

./kafka-producer-perf-test.sh --topic test --producer.config
producer.properties --payload-file ../LICENSE --num-records 1000000
--throughput 10000 --print-metrics


and here is the content of producer.properties (the goal is maximum
reliability, inspired by
https://www.confluent.io/kafka-summit-2016-ops-when-it-absolutely-positively-has-to-be-there/):

bootstrap.servers=10.0.17.157:9092,10.0.17.205:9092,10.0.30.24:9092
compression.type = none
enable.idempotence = true
max.in.flight.requests.per.connection = 1
request.timeout.ms = 40000
acks = all
retries = 2147483647
retry.backoff.ms = 1000
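
(In case someone wants to reproduce this with the plain Java client rather
than a properties file, the same settings look roughly like this in code.
This is only the programmatic equivalent of the file above, using the
ProducerConfig constant names; nothing new is being configured here.)

import java.util.Properties;
import org.apache.kafka.clients.producer.ProducerConfig;

public class ReliableProducerProps {
    // Programmatic equivalent of the producer.properties file above.
    static Properties reliableProducerProps() {
        Properties props = new Properties();
        props.put(ProducerConfig.BOOTSTRAP_SERVERS_CONFIG,
                "10.0.17.157:9092,10.0.17.205:9092,10.0.30.24:9092");
        props.put(ProducerConfig.COMPRESSION_TYPE_CONFIG, "none");
        props.put(ProducerConfig.ENABLE_IDEMPOTENCE_CONFIG, true);
        props.put(ProducerConfig.MAX_IN_FLIGHT_REQUESTS_PER_CONNECTION, 1);
        props.put(ProducerConfig.REQUEST_TIMEOUT_MS_CONFIG, 40000);
        props.put(ProducerConfig.ACKS_CONFIG, "all");
        props.put(ProducerConfig.RETRIES_CONFIG, Integer.MAX_VALUE); // 2147483647
        props.put(ProducerConfig.RETRY_BACKOFF_MS_CONFIG, 1000);
        return props;
    }
}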

a few of the corresponding metrics:
producer-metrics:record-error-total:{client-id=producer-1} : 2958.000
producer-metrics:record-retry-total:{client-id=producer-1} : 68.000
producer-metrics:record-send-total:{client-id=producer-1} : 997110.000
producer-node-metrics:request-total:{client-id=producer-1, node-id=node--1} : 89.000
producer-node-metrics:request-total:{client-id=producer-1, node-id=node--2} : 24.000
producer-node-metrics:request-total:{client-id=producer-1, node-id=node--3} : 24.000
producer-node-metrics:request-total:{client-id=producer-1, node-id=node-1} : 42859475.000
producer-node-metrics:request-total:{client-id=producer-1, node-id=node-2} : 50855563.000
producer-node-metrics:request-total:{client-id=producer-1, node-id=node-3} : 41940906.000
producer-topic-metrics:record-error-total:{client-id=producer-1, topic=test} : 2958.000
producer-topic-metrics:record-retry-total:{client-id=producer-1, topic=test} : 68.000
producer-topic-metrics:record-send-total:{client-id=producer-1, topic=test} : 997110.000

In the metrics above, I understand what the node-[1-3] entries are, but what are the node--[1-3] ones?
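
For reference, the metric values above come from --print-metrics; the same
counters can also be read from a live producer through the Java client's
metrics() map. A minimal sketch (the output format is only an approximation
of what the perf-test script prints):

import org.apache.kafka.clients.producer.Producer;

public class DumpProducerMetrics {
    // Print every metric the producer exposes, roughly in the
    // "group:name:{tags} : value" layout shown above.
    static void dump(Producer<?, ?> producer) {
        producer.metrics().forEach((name, metric) ->
                System.out.printf("%s:%s:%s : %s%n",
                        name.group(), name.name(), name.tags(), metric.metricValue()));
    }
}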

Thanks

Re: Producer request.timeout.ms not always working as advertised

Posted by frederic arno <fr...@gmail.com>.
KIP-91 is very interesting and answers my questions perfectly. I'm
eagerly waiting for 1.1.0!

thank you, Fred

On Wed, Nov 22, 2017 at 11:42 PM, Ismael Juma <is...@juma.me.uk> wrote:
> If you haven't, please check KIP-91:
>
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-91+Provide+Intuitive+User+Timeouts+in+The+Producer
>
> It explains how things currently work as well as an improvement (which
> should land in 1.1.0).
>
> Ismael

Re: Producer request.timeout.ms not always working as advertised

Posted by Ismael Juma <is...@juma.me.uk>.
If you haven't, please check KIP-91:

https://cwiki.apache.org/confluence/display/KAFKA/KIP-91+Provide+Intuitive+User+Timeouts+in+The+Producer

It explains how things currently work as well as an improvement (which
should land in 1.1.0).
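
In short, the KIP proposes a single delivery.timeout.ms setting that bounds
the total time between send() returning and the record being acknowledged or
failed, covering batching time, in-flight requests and retries. Here is a
rough sketch of how it could be configured once it lands; the property name
and the 120000 ms value come from the KIP and are not available in the 1.0.0
client:

import java.util.Properties;

public class Kip91Sketch {
    static Properties withDeliveryTimeout() {
        Properties props = new Properties();
        // "delivery.timeout.ms" is the setting proposed by KIP-91 (not in 1.0.0 yet);
        // it puts an upper bound on the total time to deliver a record.
        props.put("delivery.timeout.ms", "120000");
        // request.timeout.ms would then only bound each individual request.
        props.put("request.timeout.ms", "40000");
        props.put("retries", String.valueOf(Integer.MAX_VALUE));
        return props;
    }
}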

Ismael
