Posted to users@kafka.apache.org by Richard Rodseth <rr...@gmail.com> on 2016/04/22 20:08:15 UTC

poll() semantics

Do I understand correctly that poll() returns only a subset of the messages
in a topic each time it is called? So if I want to replay all messages, I
would seek to the beginning and call poll() in a loop? I would then have no
easy way of knowing when I was done, since no high watermark is exposed:

https://issues.apache.org/jira/browse/KAFKA-2076
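
One way to know when a replay is finished, sketched here under assumptions:
record each partition's end offset before seeking back to the beginning, then
stop polling once every partition's position has reached that recorded
offset. The ReplayProgress class and its caughtUp method are hypothetical
helpers, not part of the Kafka client. The partition keys and offsets would
have to come from the consumer (for example from position() after
seekToEnd()), and that wiring is left out.

```java
import java.util.Map;

// Hypothetical helper, not part of the Kafka client API.
final class ReplayProgress {
    private ReplayProgress() {}

    /**
     * Returns true once every partition's current position has reached the
     * end offset captured before the replay started.
     * Keys are partition identifiers such as "my-topic-0" (illustrative).
     */
    static boolean caughtUp(Map<String, Long> positions, Map<String, Long> endOffsets) {
        for (Map.Entry<String, Long> end : endOffsets.entrySet()) {
            Long position = positions.get(end.getKey());
            // A partition with no recorded position has not been read yet.
            if (position == null || position < end.getValue()) {
                return false;
            }
        }
        return true;
    }
}
```

The poll loop would call caughtUp after each batch and exit when it returns
true. Messages produced after the end offsets were captured are deliberately
ignored, which is usually what "replay" means.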

This is a pretty basic question, but I don't think it is explained in the
JavaDoc

http://kafka.apache.org/090/javadoc/index.html?org/apache/kafka/clients/consumer/KafkaConsumer.html

Thanks

Re: poll() semantics

Posted by Richard Rodseth <rr...@gmail.com>.
Thanks. Yes I get that it's bytes.
Good to know about the new setting.

On Sun, Apr 24, 2016 at 10:19 AM, Jens Rantil <je...@tink.se> wrote:

> Hi Richard,
>
> > which defaults to a very large number, will affect the number of
> > records returned by each call to poll()
>
> No, it affects the total size in bytes of the messages fetched, which is
> not the same as the number of messages. The upcoming 0.9.1 release (not
> out yet) will contain a setting that lets you cap the maximum number of
> messages that poll() returns. See also
>
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-41%3A+KafkaConsumer+Max+Records
>
> Cheers,
> Jens

Re: poll() semantics

Posted by Jens Rantil <je...@tink.se>.
Hi Richard,

> which defaults to a very large number, will affect the number of
> records returned by each call to poll()

No, it affects the total size in bytes of the messages fetched, which is
not the same as the number of messages. The upcoming 0.9.1 release (not
out yet) will contain a setting that lets you cap the maximum number of
messages that poll() returns. See also
https://cwiki.apache.org/confluence/display/KAFKA/KIP-41%3A+KafkaConsumer+Max+Records
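
To make the distinction concrete, here is a minimal sketch of the two limits
as consumer properties. max.partition.fetch.bytes is the existing byte limit;
max.poll.records is the record-count cap described in KIP-41 and is only
honoured by client versions that include it. The ConsumerFetchSettings class
and the values passed to it are illustrative assumptions.

```java
import java.util.Properties;

// Hypothetical helper; only the property keys are real consumer settings.
final class ConsumerFetchSettings {
    private ConsumerFetchSettings() {}

    static Properties withLimits(int maxPartitionFetchBytes, int maxPollRecords) {
        Properties props = new Properties();
        // Upper bound on the bytes fetched per partition, not a message count.
        props.put("max.partition.fetch.bytes", Integer.toString(maxPartitionFetchBytes));
        // Upper bound on the number of records a single poll() returns (KIP-41).
        props.put("max.poll.records", Integer.toString(maxPollRecords));
        return props;
    }
}
```

For example, ConsumerFetchSettings.withLimits(1048576, 500) would allow up to
1 MB per partition per fetch while handing the application at most 500
records per poll().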

Cheers,
Jens

On Sat, Apr 23, 2016 at 2:20 AM Richard Rodseth <rr...@gmail.com> wrote:

> To answer my own question (partially), I have learned that
> max.partition.fetch.bytes, which defaults to a very large number, will
> affect the number of records returned by each call to poll().
>
> I also learned that seekToBeginning is a partition-level thing, but
>
>      props.put("auto.offset.reset", "earliest");
>
> has the desired effect.
>
-- 

Jens Rantil
Backend Developer @ Tink

Tink AB, Wallingatan 5, 111 60 Stockholm, Sweden
For urgent matters you can reach me at +46-708-84 18 32.

Re: poll() semantics

Posted by Richard Rodseth <rr...@gmail.com>.
To answer my own question (partially), I have learned that
max.partition.fetch.bytes, which defaults to a very large number, will
affect the number of records returned by each call to poll().

I also learned that seekToBeginning is a partition-level thing, but

     props.put("auto.offset.reset", "earliest");

has the desired effect.
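
A minimal sketch of that configuration, assuming the broker address is
supplied by the caller. One caveat that is easy to miss: auto.offset.reset
only applies when the consumer group has no committed offset, so the sketch
generates a fresh group.id for each replay. The ReplayConsumerSettings class
name and the "replay-" prefix are made up for illustration.

```java
import java.util.Properties;
import java.util.UUID;

// Hypothetical helper; only the property keys are real consumer settings.
final class ReplayConsumerSettings {
    private ReplayConsumerSettings() {}

    static Properties forReplay(String bootstrapServers) {
        Properties props = new Properties();
        props.put("bootstrap.servers", bootstrapServers);
        // A new group has no committed offsets, so auto.offset.reset applies.
        props.put("group.id", "replay-" + UUID.randomUUID());
        props.put("auto.offset.reset", "earliest");
        // Avoid committing offsets for the throwaway group.
        props.put("enable.auto.commit", "false");
        return props;
    }
}
```

The deserializer settings are omitted here and would still need to be added
before the properties are passed to a KafkaConsumer.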

On Fri, Apr 22, 2016 at 11:08 AM, Richard Rodseth <rr...@gmail.com>
wrote:

> Do I understand correctly that poll() will return a subset of the messages
> in a topic each time it is called? So if I want to replay all messages, I
> would seek to the beginning and call poll in a loop? Not easily knowing
> when I was done, without a high watermark
>
> https://issues.apache.org/jira/browse/KAFKA-2076
>
> This is a pretty basic question, but I don't think it is explained in the
> JavaDoc
>
>
> http://kafka.apache.org/090/javadoc/index.html?org/apache/kafka/clients/consumer/KafkaConsumer.html
>
> Thanks
>