Posted to jira@kafka.apache.org by "Colin McCabe (Jira)" <ji...@apache.org> on 2021/03/15 22:37:00 UTC

[jira] [Commented] (KAFKA-10518) Consumer fetches could be inefficient when lags are unbalanced

    [ https://issues.apache.org/jira/browse/KAFKA-10518?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17302075#comment-17302075 ] 

Colin McCabe commented on KAFKA-10518:
--------------------------------------

One simple workaround is to increase max.poll.records so that you get more records per fetch for the partition that has a high rate of new records.
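
As a rough sketch of that workaround, the property could be set as below. max.poll.records caps how many records a single poll() call returns, and its default is 500. The value 2000, the class name, and the connection settings are all assumptions made for illustration, not recommendations.

```java
import java.util.Properties;

public class ConsumerWorkaroundConfig {
    // Illustrative value only; the Kafka default for max.poll.records is 500.
    static final int MAX_POLL_RECORDS = 2000;

    /** Builds consumer properties with a raised max.poll.records. */
    public static Properties build() {
        Properties props = new Properties();
        // Hypothetical connection settings; replace with real ones.
        props.setProperty("bootstrap.servers", "localhost:9092");
        props.setProperty("group.id", "example-group");
        // Let each poll() hand back more records than the default allows.
        props.setProperty("max.poll.records", Integer.toString(MAX_POLL_RECORDS));
        return props;
    }

    public static void main(String[] args) {
        System.out.println(build().getProperty("max.poll.records"));
    }
}
```

The resulting Properties object would be passed to the KafkaConsumer constructor together with the usual deserializer settings, which are omitted here.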

I do wonder whether there's any useful purpose served by "explicitly exclud[ing] partitions for which the consumer received data in the previous round".  It seems a bit like an implementation hack based on how we do buffering in the consumer, but I could be missing something....

> Consumer fetches could be inefficient when lags are unbalanced
> --------------------------------------------------------------
>
>                 Key: KAFKA-10518
>                 URL: https://issues.apache.org/jira/browse/KAFKA-10518
>             Project: Kafka
>          Issue Type: Bug
>            Reporter: Dhruvil Shah
>            Priority: Major
>
> Consumer fetches are inefficient when lags are unbalanced across partitions, due to head-of-line blocking and the behavior of blocking for up to `fetch.max.wait.ms` until data is available.
> When the consumer receives a fetch response, it prepares the next fetch request and sends it out. The caveat is that the subsequent fetch request would explicitly exclude partitions for which the consumer received data in the previous round. This is to allow the consumer application to drain the data for those partitions while the consumer fetches the other partitions it is subscribed to.
> This behavior does not work well when lag is unbalanced across partitions: the consumer receives data for the partitions it is lagging on, and then sends a fetch request for partitions that have little or no data. That request ends up blocking for fetch.max.wait.ms on the broker before an empty response is sent back. This slows down the consumer’s overall consumption throughput, since fetch.max.wait.ms is 500ms by default.
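
A back-of-the-envelope model of the cycle described above may help show the cost. It is only a sketch under simplifying assumptions: fetches are strictly serialized, a fetch for lagging partitions returns after one round trip, and a fetch for idle partitions is held for the full fetch.max.wait.ms. The class name, method names, and the sample round-trip time and batch size are hypothetical.

```java
public class UnbalancedLagModel {
    /**
     * Duration of one two-fetch cycle: a fetch that returns data at once,
     * followed by a fetch on idle partitions that the broker holds open.
     */
    public static long cycleMillis(long rttMs, long fetchMaxWaitMs) {
        long dataFetch = rttMs;                   // lagging partitions answer immediately
        long idleFetch = rttMs + fetchMaxWaitMs;  // idle partitions wait for the timeout
        return dataFetch + idleFetch;
    }

    /** Records consumed per second if each cycle delivers one batch of data. */
    public static double recordsPerSecond(long recordsPerDataFetch,
                                          long rttMs, long fetchMaxWaitMs) {
        return recordsPerDataFetch * 1000.0 / cycleMillis(rttMs, fetchMaxWaitMs);
    }

    public static void main(String[] args) {
        // Assumed figures: 5 ms round trip, 1000 records per data-bearing fetch.
        System.out.println(recordsPerSecond(1000, 5, 500)); // with the 500 ms default wait
        System.out.println(recordsPerSecond(1000, 5, 0));   // if the idle fetch never blocked
    }
}
```

Under these assumptions the wait on the idle partitions dominates the cycle time, which is the throughput loss the description points at.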



--
This message was sent by Atlassian Jira
(v8.3.4#803005)