Posted to users@kafka.apache.org by Mads Tandrup <Ma...@schneider-electric.com> on 2018/05/04 11:19:06 UTC

What is the performance impact of setting max.poll.records=1

Hi

What is the performance impact of setting `max.poll.records=1` as opposed to the default of 500?

I have a Java application which processes records one at a time. The processing time varies between messages, so we sometimes exceed `max.poll.interval.ms`.
While I could increase `max.poll.interval.ms`, that would prevent me from quickly detecting a livelock in the application.
There's no benefit to batching the records, so I'm considering setting `max.poll.records=1`. That way we can define a sensible upper limit for the processing time of a single record.
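
For reference, the setup described above could be expressed roughly as follows. This is a minimal sketch using only `java.util.Properties`; the class name, broker address, group id, String deserializers, and the 30-second interval are placeholder assumptions, not values from this thread. The resulting `Properties` would normally be passed to the `KafkaConsumer` constructor.

```java
import java.util.Properties;

class SingleRecordConsumerConfig {
    // Builds consumer settings for one-record-at-a-time processing.
    // maxProcessingMs is an assumed upper bound for handling a single record.
    static Properties build(String bootstrapServers, String groupId, long maxProcessingMs) {
        Properties props = new Properties();
        props.setProperty("bootstrap.servers", bootstrapServers);
        props.setProperty("group.id", groupId);
        // Assumption: keys and values are plain strings.
        props.setProperty("key.deserializer",
                "org.apache.kafka.common.serialization.StringDeserializer");
        props.setProperty("value.deserializer",
                "org.apache.kafka.common.serialization.StringDeserializer");
        // poll() returns at most one record per call.
        props.setProperty("max.poll.records", "1");
        // Maximum allowed gap between two poll() calls before the consumer
        // is considered failed and its partitions are reassigned.
        props.setProperty("max.poll.interval.ms", Long.toString(maxProcessingMs));
        return props;
    }

    public static void main(String[] args) {
        // Placeholder values for illustration only.
        Properties props = build("localhost:9092", "example-group", 30_000L);
        props.forEach((k, v) -> System.out.println(k + "=" + v));
    }
}
```

With one record per `poll()`, `max.poll.interval.ms` effectively becomes a per-record processing deadline, which is the livelock detection behaviour asked about here.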

I've tried to look at the code, and it seems that the consumer fetches up to `fetch.max.bytes`, keeps it in memory, and returns records from the fetched data when `poll()` is called.

So what is the performance impact of a low `max.poll.records`?

Best regards,
Mads

Re: What is the performance impact of setting max.poll.records=1

Posted by "Matthias J. Sax" <ma...@confluent.io>.

`max.poll.records` only configures how many records are returned from
poll(). Internally, the consumer buffers a batch of records, and only if
this batch is empty will it do a new fetch request within poll(). So
`max.poll.records=1` does not by itself mean one fetch request per record.
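
The buffering behaviour described above can be illustrated with a toy model. This is a sketch only and not the Kafka client's actual code: the class `BufferedPollModel`, the fixed batch size, and the integer "records" are all made up for illustration. It counts how many simulated fetches are needed when `poll()` hands out one record at a time.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;

// Toy model only: NOT the Kafka client. It mimics "buffer a fetched batch,
// serve poll() from the buffer, and fetch again only when the buffer is empty".
class BufferedPollModel {
    private final Deque<Integer> buffer = new ArrayDeque<>();
    private final int fetchBatchSize;  // stands in for what one fetch returns
    private final int maxPollRecords;  // stands in for max.poll.records
    private int nextOffset = 0;
    private int fetchCount = 0;

    BufferedPollModel(int fetchBatchSize, int maxPollRecords) {
        this.fetchBatchSize = fetchBatchSize;
        this.maxPollRecords = maxPollRecords;
    }

    // Simulated network round trip: refills the buffer with one batch.
    private void fetch() {
        fetchCount++;
        for (int i = 0; i < fetchBatchSize; i++) {
            buffer.add(nextOffset++);
        }
    }

    // Returns up to maxPollRecords records, fetching only if nothing is buffered.
    List<Integer> poll() {
        if (buffer.isEmpty()) {
            fetch();
        }
        List<Integer> out = new ArrayList<>();
        while (out.size() < maxPollRecords && !buffer.isEmpty()) {
            out.add(buffer.poll());
        }
        return out;
    }

    int fetchCount() {
        return fetchCount;
    }

    public static void main(String[] args) {
        BufferedPollModel model = new BufferedPollModel(100, 1);
        for (int i = 0; i < 250; i++) {
            model.poll();
        }
        // prints "polls=250 fetches=3"
        System.out.println("polls=250 fetches=" + model.fetchCount());
    }
}
```

In this model, 250 single-record polls cost 3 simulated fetches rather than 250, which is the point of the explanation: the per-poll limit and the fetch size are independent.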


-Matthias


On 5/10/18 10:46 PM, Mads Tandrup wrote:
> Hi
> 
> I forgot to mention that I have multiple partitions and multiple consumer processes.
> But we can't process the messages in the same partition in parallel, since earlier records might influence the processing of later ones.
> 
> Does max.poll.records=1 go to the remote server on every poll? What if I increase fetch.min.bytes to, say, the expected size of 10 records? What will happen then?
> 
> Best regards,
> Mads

Re: What is the performance impact of setting max.poll.records=1

Posted by Mads Tandrup <Ma...@schneider-electric.com>.

Hi

I forgot to mention that I have multiple partitions and multiple consumer processes.
But we can't process the messages in the same partition in parallel, since earlier records might influence the processing of later ones.

Does max.poll.records=1 go to the remote server on every poll? What if I increase fetch.min.bytes to, say, the expected size of 10 records? What will happen then?
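
For concreteness, the `fetch.min.bytes` idea might be configured as in the sketch below. The class name and the 1 KiB expected record size are made-up figures for illustration. Per the consumer configuration documentation, the broker holds a fetch request until at least `fetch.min.bytes` of data is available or `fetch.max.wait.ms` (default 500 ms) has elapsed, so this setting governs how much data a fetch waits for, not how many records `poll()` returns.

```java
import java.util.Properties;

class FetchMinBytesConfig {
    // expectedRecordBytes and recordsPerFetch are illustrative assumptions.
    static Properties build(int expectedRecordBytes, int recordsPerFetch) {
        Properties props = new Properties();
        // poll() still returns at most one record per call.
        props.setProperty("max.poll.records", "1");
        // Ask the broker to wait for roughly recordsPerFetch records' worth of data.
        props.setProperty("fetch.min.bytes",
                Integer.toString(Math.multiplyExact(expectedRecordBytes, recordsPerFetch)));
        // Upper bound on how long the broker waits for fetch.min.bytes (the default value).
        props.setProperty("fetch.max.wait.ms", "500");
        return props;
    }

    public static void main(String[] args) {
        // Assumed average record size of 1 KiB, batched 10 at a time.
        Properties props = build(1024, 10);
        props.forEach((k, v) -> System.out.println(k + "=" + v));
    }
}
```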

Best regards,
Mads

On 7 May 2018 at 06:36, "R Krishna" <kr...@gmail.com> wrote:

    You can always add more partitions/consumer threads, each fetching a few
    more records than 1, and manually commit asynchronously one at a time. It's
    not the best, but better than setting max.poll.records=1, which fetches one
    record from the remote server at a time.
    

Re: What is the performance impact of setting max.poll.records=1

Posted by R Krishna <kr...@gmail.com>.

You can always add more partitions/consumer threads, each fetching a few
more records than 1, and manually commit asynchronously one at a time. It's
not the best, but better than setting max.poll.records=1, which fetches one
record from the remote server at a time.


-- 
Radha Krishna, Proddaturi
253-234-5657