Posted to users@kafka.apache.org by Priya Matpadi <pr...@ecofactor.com> on 2013/10/25 08:14:27 UTC

backoff.increment.ms support in kafka 0.8

We are looking at using kafka 0.8-beta1 and high level consumer.

The Kafka 0.7 consumer supported backoff.increment.ms to avoid repeatedly
polling a broker node that has no new data. It appears that this property
is no longer supported in 0.8. What is the reason?

Instead there is fetch.wait.max.ms, which is the maximum amount of time the
server will block before answering the fetch request if there isn't
sufficient data to immediately satisfy fetch.min.bytes.

We have different use cases where different producers produce messages at
regular intervals, e.g., every minute, every 20 minutes, once daily, or
once weekly. But once messages are produced, they need to be consumed and
processed asap.

In order to support these use cases and avoid frequent polling, it feels
like we need a very large value for fetch.wait.max.ms for the daily and
weekly topic consumers. I am looking for a best-practice tip here.

Will this keep the connection open between the consumer connector and the
broker for the fetch.wait.max.ms duration? How will this affect other
consumers on the same machine which expect to consume messages on a
per-minute basis?

Secondly, from other discussions I have read that it's best to keep
consumer.timeout.ms=-1 for the high-level consumer. I was wondering in
which situations it is beneficial to handle ConsumerTimeoutException?

Thanks,
Priya

Re: backoff.increment.ms support in kafka 0.8

Posted by Joel Koshy <jj...@gmail.com>.
>
> The Kafka 0.7 consumer supported backoff.increment.ms to avoid repeatedly
> polling a broker node that has no new data. It appears that this property
> is no longer supported in 0.8. What is the reason?

Kafka 0.7 did not support long polling, which made fetching rather
inefficient: the consumer could only retry with a fixed backoff period,
i.e., you could not ask the broker to block until data was available.
0.8, as you have seen, supports long poll.

> Instead there is fetch.wait.max.ms, which is the maximum amount of time the
> server will block before answering the fetch request if there isn't
> sufficient data to immediately satisfy fetch.min.bytes.

Yes - and after fetch.wait.max.ms you will get back whatever data is
available, even if it is less than fetch.min.bytes.
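
To make the interplay concrete, here is a minimal sketch of wiring both
settings into the 0.8 high-level consumer (the ZooKeeper address, group id,
class name, and values are placeholders for illustration, not
recommendations):

    import java.util.Properties;
    import kafka.consumer.Consumer;
    import kafka.consumer.ConsumerConfig;
    import kafka.javaapi.consumer.ConsumerConnector;

    public class LongPollConfigSketch {
        public static void main(String[] args) {
            Properties props = new Properties();
            props.put("zookeeper.connect", "zk1:2181");  // placeholder address
            props.put("group.id", "my-consumer-group");  // placeholder group
            // The broker holds a fetch for up to 500 ms waiting for 64 KB to
            // accumulate; after 500 ms it answers with whatever is available,
            // even if that is less than fetch.min.bytes.
            props.put("fetch.wait.max.ms", "500");
            props.put("fetch.min.bytes", "65536");
            ConsumerConnector consumer =
                Consumer.createJavaConsumerConnector(new ConsumerConfig(props));
        }
    }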

>
> We have different use cases where different producers produce messages at
> regular intervals, e.g., every minute, every 20 minutes, once daily, or
> once weekly. But once messages are produced, they need to be consumed and
> processed asap.

The high-level consumer can still provide this functionality: if the fetch
request expires, it simply issues another fetch request.

>
> In order to support these use cases and avoid frequent polling, it feels
> like we need a very large value for fetch.wait.max.ms for the daily and
> weekly topic consumers. I am looking for a best-practice tip here.
>
> Will this keep the connection open between the consumer connector and the
> broker for the fetch.wait.max.ms duration? How will this affect other
> consumers on the same machine which expect to consume messages on a
> per-minute basis?

The defaults should be sufficient, although you can increase
fetch.wait.max.ms to reduce the request rate on the broker side for
low-volume topics. The connection will be kept open unless there is an
error (e.g., leadership of a partition moves, in which case topic metadata
is refreshed and new connections are set up only if required).
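
For the low-volume (daily/weekly) topics, this just means a larger
fetch.wait.max.ms in those consumers' configs; relative to the sketch
earlier in the thread, the only change would be something like the
following (30 s is an arbitrary illustration, not a tested value):

    // Same config sketch as above, but for a daily/weekly topic's consumer.
    props.put("fetch.wait.max.ms", "30000");
    // The consumer simply re-issues the fetch each time it expires, so a
    // message produced mid-poll is still delivered promptly.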

>
> Secondly, from other discussions I have read that it's best to keep
> consumer.timeout.ms=-1 for the high-level consumer. I was wondering in
> which situations it is beneficial to handle ConsumerTimeoutException?

This depends pretty much on the use case. For the typical use case of
continuous streaming, in which you are always expecting data, you would
use -1.
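
If you do set a positive consumer.timeout.ms, the stream's iterator throws
ConsumerTimeoutException when no message arrives within that time, which
can serve as a hook for periodic work. A sketch against the 0.8 Java API,
continuing from the consumer created above (topic name is a placeholder):

    import java.util.Collections;
    import java.util.List;
    import java.util.Map;
    import kafka.consumer.ConsumerIterator;
    import kafka.consumer.ConsumerTimeoutException;
    import kafka.consumer.KafkaStream;

    Map<String, List<KafkaStream<byte[], byte[]>>> streams =
        consumer.createMessageStreams(Collections.singletonMap("my-topic", 1));
    ConsumerIterator<byte[], byte[]> it =
        streams.get("my-topic").get(0).iterator();
    try {
        while (it.hasNext()) {
            byte[] message = it.next().message();
            // process the message
        }
    } catch (ConsumerTimeoutException e) {
        // No message within consumer.timeout.ms; a natural place to, say,
        // flush downstream buffers before resuming iteration.
    }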

Thanks,

Joel