Posted to dev@kafka.apache.org by "Ewen Cheslack-Postava (JIRA)" <ji...@apache.org> on 2016/11/27 06:03:58 UTC

[jira] [Commented] (KAFKA-4007) Improve fetch pipelining for low values of max.poll.records

    [ https://issues.apache.org/jira/browse/KAFKA-4007?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15699134#comment-15699134 ] 

Ewen Cheslack-Postava commented on KAFKA-4007:
----------------------------------------------

[~enothereska] prefetching is based on the fetch requests, not on any setting of max.poll.records. A new fetch request is only sent once the data from the previous fetch is exhausted. If a user sets max.poll.records = 1, then a new request is only sent when the data from the last request has been completely consumed. Since processing a single record is probably very fast, this isn't efficient -- we'd prefer to fetch data earlier, since the network round trip may be relatively expensive (especially since fetch.min.bytes means you could easily spend some time waiting on the broker/producers).
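
For concreteness, a minimal sketch of the configuration being discussed is below; the broker address, group id, and topic name are made up for illustration, and the client version is assumed to be recent enough to support poll(Duration). With max.poll.records = 1 each poll() hands back a single record, while the underlying fetch is only reissued once the previous response is drained:

import java.time.Duration;
import java.util.Collections;
import java.util.Properties;
import org.apache.kafka.clients.consumer.ConsumerConfig;
import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.kafka.clients.consumer.ConsumerRecords;
import org.apache.kafka.clients.consumer.KafkaConsumer;

public class SingleRecordPollExample {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put(ConsumerConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092"); // assumption: local broker
        props.put(ConsumerConfig.GROUP_ID_CONFIG, "demo");                    // hypothetical group id
        props.put(ConsumerConfig.KEY_DESERIALIZER_CLASS_CONFIG,
                  "org.apache.kafka.common.serialization.StringDeserializer");
        props.put(ConsumerConfig.VALUE_DESERIALIZER_CLASS_CONFIG,
                  "org.apache.kafka.common.serialization.StringDeserializer");
        props.put(ConsumerConfig.MAX_POLL_RECORDS_CONFIG, 1);   // return one record per poll()
        props.put(ConsumerConfig.FETCH_MIN_BYTES_CONFIG, 1024); // broker may wait to accumulate data

        try (KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props)) {
            consumer.subscribe(Collections.singletonList("events")); // hypothetical topic
            while (true) {
                ConsumerRecords<String, String> records = consumer.poll(Duration.ofMillis(100));
                for (ConsumerRecord<String, String> record : records) {
                    // Per-record processing is fast relative to a fetch round trip,
                    // so without prefetching the consumer stalls between fetches.
                    System.out.println(record.offset() + ": " + record.value());
                }
            }
        }
    }
}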

The idea here is to send another fetch but delay processing it until the previous fetch response has been fully consumed. This pipelines the data such that we could potentially have 2x the response data queued, but it doesn't add any more overhead, still gives a chance to pipeline future data (given the extra buffer from the second batch of results), and doesn't defer fetching more data until the last minute (which a low max.poll.records would otherwise allow, since a single unprocessed record would block requesting the additional data needed to satisfy subsequent poll() calls).
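
To make the proposal concrete, here is a simplified, hypothetical sketch of the pipelining logic -- not the actual consumer/Fetcher code. The PipelinedFetcher class and FetchTransport interface are invented for illustration. The point is that a new fetch is issued while buffered records are still being handed out, but its response is only consumed once the buffer is drained, so roughly 2x one response's worth of data is queued at most:

import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;
import java.util.concurrent.CompletableFuture;

class PipelinedFetcher<R> {

    /** Hypothetical transport abstraction standing in for the network client. */
    interface FetchTransport<T> {
        CompletableFuture<List<T>> sendFetch();
    }

    private final FetchTransport<R> transport;
    private CompletableFuture<List<R>> pendingFetch; // at most one prefetch in flight
    private Deque<R> buffered = new ArrayDeque<>();  // records from the previous response

    PipelinedFetcher(FetchTransport<R> transport) {
        this.transport = transport;
    }

    /** Returns up to maxPollRecords records, mirroring max.poll.records. */
    List<R> poll(int maxPollRecords) {
        // Issue the next fetch immediately, even if buffered records remain, so the
        // network round trip overlaps with processing of the current batch.
        if (pendingFetch == null) {
            pendingFetch = transport.sendFetch();
        }

        // Delay consuming the pending response until the current buffer is exhausted.
        if (buffered.isEmpty()) {
            buffered = new ArrayDeque<>(pendingFetch.join());
            pendingFetch = null;
        }

        List<R> out = new ArrayList<>();
        while (!buffered.isEmpty() && out.size() < maxPollRecords) {
            out.add(buffered.poll());
        }
        return out;
    }
}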

> Improve fetch pipelining for low values of max.poll.records
> -----------------------------------------------------------
>
>                 Key: KAFKA-4007
>                 URL: https://issues.apache.org/jira/browse/KAFKA-4007
>             Project: Kafka
>          Issue Type: Improvement
>          Components: consumer
>            Reporter: Jason Gustafson
>            Assignee: Mickael Maison
>
> Currently the consumer will only send a prefetch for a partition after all the records from the previous fetch have been consumed. This can lead to suboptimal pipelining when max.poll.records is set very low since the processing latency for a small set of records may be small compared to the latency of a fetch. An improvement suggested by [~junrao] is to send the fetch anyway even if we have unprocessed data buffered, but delay reading it from the socket until that data has been consumed. Potentially the consumer can delay reading _any_ pending fetch until it is ready to be returned to the user, which may help control memory better. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)