Posted to users@kafka.apache.org by Franco Giacosa <fg...@gmail.com> on 2016/01/07 18:54:16 UTC

Re: consuming 0 records

Guozhang,

Thanks for the answer. Just to be clear: there is no way to be sure that
I'm going to pull only 1 record per poll, right?

Franco.

2015-12-31 20:18 GMT+01:00 Guozhang Wang <wa...@gmail.com>:

> Franco,
>
> I think this is a mis-documentation of "poll(0)":
> https://issues.apache.org/jira/browse/KAFKA-3044
>
> As for your case, I would suggest trying poll(x) where x is reasonably
> large. It does not mean "always wait for that amount of time", but rather
> "wait for at most approximately that amount of time, and return as soon as
> some data has been fetched".
>
> Guozhang
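
A minimal sketch of that suggestion against the 0.9 Java consumer (the
consumer is assumed to be configured and subscribed elsewhere; 1000 ms is
just one "reasonably large" choice, not a value from the thread):

    import org.apache.kafka.clients.consumer.ConsumerRecord;
    import org.apache.kafka.clients.consumer.ConsumerRecords;
    import org.apache.kafka.clients.consumer.KafkaConsumer;

    public class PollLoopSketch {
        // Poll with a reasonably large timeout: the call returns as soon as
        // some records are available and only blocks the full second when
        // nothing is.
        static void drain(KafkaConsumer<String, String> consumer) {
            while (true) {
                ConsumerRecords<String, String> records = consumer.poll(1000);
                for (ConsumerRecord<String, String> record : records) {
                    System.out.printf("offset=%d value=%s%n",
                        record.offset(), record.value());
                }
            }
        }
    }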
>
>
> On Wed, Dec 30, 2015 at 11:16 AM, Dana Powers <da...@gmail.com> wrote:
>
> > A few thoughts from a non-expert:
> >
> > Connections are also processed asynchronously in the poll loop. If you
> > are not using any timeout, you may be seeing a few initial iterations
> > spent on setting up the channel connections. You also probably need a
> > few loop iterations to get through an initial metadata request /
> > response.
> >
> > Also, if I recall correctly, records are returned in batches per
> > topic-partition, not one-by-one. So if/when records are ready, you would
> > get as many as were received via completed FetchRequests -- this depends
> > on message size and the fetch configs max.partition.fetch.bytes,
> > fetch.min.bytes, and fetch.max.wait.ms. So you shouldn't expect to poll
> > 500x.
> >
> > I'd suggest using a small but non-zero timeout when polling. 100ms is
> > used in the docs quite a bit.
> >
> > -Dana
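
A minimal sketch of the fetch-related settings Dana mentions above (the
values shown are the Kafka 0.9 defaults, included only to illustrate how
the three settings interact; the broker address and group id are
illustrative assumptions):

    import java.util.Properties;

    public class FetchConfigSketch {
        public static Properties fetchTuning() {
            Properties props = new Properties();
            props.put("bootstrap.servers", "localhost:9092"); // assumed local broker
            props.put("group.id", "test-group");              // hypothetical group id
            // Upper bound on bytes returned per partition per fetch; it should
            // be at least as large as the biggest single message the broker
            // will return, or the consumer can get stuck on that message.
            props.put("max.partition.fetch.bytes", "1048576");
            // The broker holds the fetch until at least this many bytes are
            // ready...
            props.put("fetch.min.bytes", "1");
            // ...or until this much time (ms) has passed, whichever comes first.
            props.put("fetch.max.wait.ms", "500");
            props.put("key.deserializer",
                "org.apache.kafka.common.serialization.StringDeserializer");
            props.put("value.deserializer",
                "org.apache.kafka.common.serialization.StringDeserializer");
            return props;
        }
    }

Pairing these with a small non-zero timeout such as consumer.poll(100), as
suggested above, gives in-flight fetches time to complete without
busy-looping.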
> >
> > On Wed, Dec 30, 2015 at 10:03 AM, Franco Giacosa <fg...@gmail.com> wrote:
> >
> > > Hi,
> > >
> > > I am running Kafka 0.9.0 locally.
> > >
> > > I am having a particular situation in the following scenario.
> > >
> > > (1) 1 producer inserts 500 records (approx. 300 bytes each) into 1
> > > topic with a single partition (partition 0, or 1 as you prefer).
> > > (2) After the producer has finished inserting the 500 records, 1
> > > consumer reads in a loop from this topic with consumer.poll(0) and
> > > max.partition.fetch.bytes=500. Sometimes that call brings back records,
> > > and sometimes the loop has to go around a few times before it brings
> > > back anything.
> > > Can someone explain to me why it doesn't fetch a record each time it
> > > polls? Can a poll operation affect another poll operation? Why, if
> > > I've inserted 500 records, do I have to poll more than 500 times?
> > >
> > > I have tried using poll(0) because the documentation says: "if 0,
> > > returns with any records that are available now".
> > >
> > > Thanks
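
For concreteness, a minimal sketch of the consumer side of this scenario
(the topic name, group id, and serializers are assumptions; note that with
max.partition.fetch.bytes=500 and ~300-byte records, each completed fetch
has room for roughly one record, and poll(0) returns immediately even when
no fetch has completed yet):

    import java.util.Collections;
    import java.util.Properties;
    import org.apache.kafka.clients.consumer.ConsumerRecords;
    import org.apache.kafka.clients.consumer.KafkaConsumer;

    public class ScenarioSketch {
        public static void main(String[] args) {
            Properties props = new Properties();
            props.put("bootstrap.servers", "localhost:9092"); // assumed local broker
            props.put("group.id", "scenario-group");          // hypothetical group id
            // Start from the beginning of the log, since the 500 records were
            // produced before this consumer started (assumption).
            props.put("auto.offset.reset", "earliest");
            // 500 bytes per partition per fetch: with ~300-byte records each
            // completed fetch has room for roughly one record.
            props.put("max.partition.fetch.bytes", "500");
            props.put("key.deserializer",
                "org.apache.kafka.common.serialization.StringDeserializer");
            props.put("value.deserializer",
                "org.apache.kafka.common.serialization.StringDeserializer");

            int received = 0;
            try (KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props)) {
                consumer.subscribe(Collections.singletonList("scenario-topic")); // hypothetical topic
                while (received < 500) {
                    // poll(0) returns immediately; many iterations come back
                    // empty while connections, metadata, and fetches are still
                    // in flight.
                    ConsumerRecords<String, String> records = consumer.poll(0);
                    received += records.count();
                }
            }
            System.out.println("received " + received + " records");
        }
    }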
> > >
> >
>
>
>
> --
> -- Guozhang
>

Re: consuming 0 records

Posted by Guozhang Wang <wa...@gmail.com>.
Franco,

There is a KIP discussion about adding a maxRecord config to the Java
consumer:

https://cwiki.apache.org/confluence/display/KAFKA/KIP-41%3A+KafkaConsumer+Max+Records

If that is adopted, then you can set it to 1 in order to pull only 1 record
at a time.

Guozhang
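
A sketch of what that could look like once the KIP lands (the proposal was
later adopted as the max.poll.records config in Kafka 0.10.0; the broker
address and group id here are illustrative assumptions):

    import java.util.Properties;
    import org.apache.kafka.clients.consumer.ConsumerConfig;

    public class MaxRecordsSketch {
        public static Properties oneRecordPerPoll() {
            Properties props = new Properties();
            props.put(ConsumerConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092"); // assumed broker
            props.put(ConsumerConfig.GROUP_ID_CONFIG, "test-group");              // hypothetical group
            // Cap each poll() at one record; requires a client version that
            // includes KIP-41 (shipped as max.poll.records in 0.10.0).
            props.put(ConsumerConfig.MAX_POLL_RECORDS_CONFIG, "1");
            props.put(ConsumerConfig.KEY_DESERIALIZER_CLASS_CONFIG,
                "org.apache.kafka.common.serialization.StringDeserializer");
            props.put(ConsumerConfig.VALUE_DESERIALIZER_CLASS_CONFIG,
                "org.apache.kafka.common.serialization.StringDeserializer");
            return props;
        }
    }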




-- 
-- Guozhang