Posted to users@kafka.apache.org by Luke Steensen <lu...@braintreepayments.com> on 2015/11/10 21:15:35 UTC

request.timeout.ms not working as expected

Hello,

We've been testing recent versions of trunk and are seeing surprising
behavior when trying to use the new request timeout functionality. For
example, at revision ae5a5d7:

# in separate terminals
$ ./bin/zookeeper-server-start.sh config/zookeeper.properties
$ ./bin/kafka-server-start.sh config/server.properties

# set request timeout
$ cat producer.properties
request.timeout.ms=1000

# run the verifiable producer, for example
$ ./bin/kafka-verifiable-producer.sh --broker-list localhost:9092 --topic
testing --throughput 5 --producer.config producer.properties

If you then kill the kafka server process, you will see the producer hang
indefinitely. This is a very simple case, but the behavior is surprising.
We have also found it easy to reproduce this behavior in more realistic
environments with multiple brokers, custom producers, etc. The end result
is that we're not sure how to safely decommission a broker without
potentially leaving a producer with a permanently stuck request.
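
One client-side mitigation while the request timeout is unreliable is to bound the wait on the Future returned by producer.send() yourself. Below is a minimal stdlib-only sketch of that pattern, using a plain java.util.concurrent Future as a stand-in for the producer's (producer.send() also returns a Future, so the same get-with-timeout call applies); the never-completing task models a request stuck after the broker is killed:

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

public class BoundedSend {
    public static void main(String[] args) throws Exception {
        ExecutorService pool = Executors.newSingleThreadExecutor();

        // Stand-in for producer.send(record): a task that never completes,
        // like a produce request left hanging after its broker dies.
        Future<String> ack = pool.submit(() -> {
            Thread.sleep(Long.MAX_VALUE);
            return "ack";
        });

        try {
            // Bound the wait client-side instead of blocking forever.
            ack.get(1, TimeUnit.SECONDS);
        } catch (TimeoutException e) {
            System.out.println("send timed out after 1s");
        } finally {
            pool.shutdownNow();
        }
    }
}
```

This does not fix the stuck request inside the client, but it keeps the calling thread from hanging indefinitely.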

Thanks,
Luke Steensen

Re: request.timeout.ms not working as expected

Posted by Mayuresh Gharat <gh...@gmail.com>.
Are you seeing errors for the metadata update?

If yes, I think I know why this might be happening.

Thanks,

Mayuresh



-- 
-Regards,
Mayuresh R. Gharat
(862) 250-7125

Re: request.timeout.ms not working as expected

Posted by Mayuresh Gharat <gh...@gmail.com>.
How many brokers are there in your test cluster?


Thanks,

Mayuresh



-- 
-Regards,
Mayuresh R. Gharat
(862) 250-7125

Re: request.timeout.ms not working as expected

Posted by Jason Gustafson <ja...@confluent.io>.
Hey Luke,

I agree the null check seems questionable. I went ahead and created
https://issues.apache.org/jira/browse/KAFKA-2805. At least we should have a
comment clarifying why the check is correct.

-Jason


Re: request.timeout.ms not working as expected

Posted by Luke Steensen <lu...@braintreepayments.com>.
After some more investigation, I've been able to get the expected behavior
by removing the null check here:
https://github.com/apache/kafka/blob/ae5a5d7c08bb634576a414f6f2864c5b8a7e58a3/clients/src/main/java/org/apache/kafka/clients/producer/internals/RecordAccumulator.java#L220

Hopefully someone more familiar with the code can comment, but that
statement does appear to be preventing the correct behavior.
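
For readers without the source open, the shape of the logic in question is roughly the following. This is a simplified, hypothetical model, not the actual RecordAccumulator code: the Batch class and the hasInFlightRequest flag are illustrative stand-ins for whatever the real guard checks. It shows how a guard clause evaluated before the timeout comparison can exempt a batch from expiry entirely, matching the indefinite hang described above:

```java
import java.util.ArrayList;
import java.util.List;

public class ExpiryGuardDemo {
    // Hypothetical batch: creation time plus a guard flag.
    static class Batch {
        final long createdMs;
        final boolean hasInFlightRequest;
        Batch(long createdMs, boolean inFlight) {
            this.createdMs = createdMs;
            this.hasInFlightRequest = inFlight;
        }
    }

    // Returns the batches considered expired at time nowMs.
    static List<Batch> expired(List<Batch> batches, long nowMs, long requestTimeoutMs) {
        List<Batch> out = new ArrayList<>();
        for (Batch b : batches) {
            // The guard: any batch it skips can never expire,
            // no matter how long ago it was created.
            if (b.hasInFlightRequest)
                continue;
            if (nowMs - b.createdMs > requestTimeoutMs)
                out.add(b);
        }
        return out;
    }

    public static void main(String[] args) {
        List<Batch> batches = new ArrayList<>();
        batches.add(new Batch(0, true));   // guarded: never expires
        batches.add(new Batch(0, false));  // unguarded: expires normally
        // 10s elapsed against a 1s timeout, yet only one batch expires.
        System.out.println(expired(batches, 10_000, 1_000).size());
    }
}
```

If the guarded state can persist after a broker dies (the request never completes, so the flag never clears), the batch is never expired and the producer waits forever.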

Thanks,
Luke

