Posted to dev@kafka.apache.org by "Zahari Dichev (JIRA)" <ji...@apache.org> on 2018/10/21 18:45:00 UTC

[jira] [Created] (KAFKA-7526) Allow for not throwing away prefetched data of paused partitions

Zahari Dichev created KAFKA-7526:
------------------------------------

             Summary: Allow for not throwing away prefetched data of paused partitions
                 Key: KAFKA-7526
                 URL: https://issues.apache.org/jira/browse/KAFKA-7526
             Project: Kafka
          Issue Type: Improvement
          Components: consumer
            Reporter: Zahari Dichev


The Kafka consumer pipelines the fetching of data in order to maximise performance. Whenever {{poll(Duration)}}/{{poll(long)}} is called, another fetch is issued before any results are returned. Although this benefits performance, in some circumstances, when combined with the use of the {{pause}}/{{resume}} API, this optimisation can result in transferring quite a bit of duplicate data over the wire. This happens because whenever {{poll}} is called, any prefetched data for a topic partition is thrown away if that partition is paused.

To illustrate the effect with a simple example, imagine that a single {{KafkaConsumer}} instance is assigned two topic partitions, {{TP1}} and {{TP2}}. Since the client interested in {{TP1}} cannot handle records as fast as the one interested in {{TP2}}, we resort to pausing {{TP1}} whenever we are not interested in receiving records for it. This results in the following behavior:
 # {{TP1}} is resumed and {{poll}} is called, returning some data for {{TP1}}
 # The consumer issues a fetch request in order to pre-fetch the next batch of records for {{TP1}}
 # {{TP2}} is resumed and {{TP1}} paused (as the consumer of {{TP1}} is not ready for more records)
 # All prefetched records for {{TP1}} are now thrown away.
 # This cycle repeats indefinitely

This KIP proposes an improvement that lets us control this behavior: instead of throwing away the prefetched data, the consumer could simply return it along with the rest of the records coming from partitions that are not paused.
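To make the cost of the cycle above concrete, here is a minimal, self-contained sketch of it as a toy model. It is not Kafka's actual fetcher code: the class {{PrefetchModel}}, the method {{simulate}}, and the {{keepPrefetched}} flag are hypothetical names made up for illustration, and the model assumes two partitions that are resumed alternately, one fixed-size batch returned per {{poll}}, and one batch prefetched afterwards. The {{keepPrefetched}} branch follows the wording of the proposal literally, returning the buffered records of the paused partition with the current poll.

```java
/** Toy model of the pause/resume cycle; NOT Kafka's real fetcher (hypothetical names). */
final class PrefetchModel {

    /** Counters collected by one simulation run. */
    static final class Result {
        final long fetched;   // records transferred over the simulated wire
        final long delivered; // records handed to the application by poll
        final long discarded; // prefetched records dropped, to be fetched again

        Result(long fetched, long delivered, long discarded) {
            this.fetched = fetched;
            this.delivered = delivered;
            this.discarded = discarded;
        }
    }

    /**
     * Alternates between two partitions: in each round one is resumed and the
     * other is paused. keepPrefetched is an assumed switch, not a real config.
     */
    static Result simulate(int rounds, int batchSize, boolean keepPrefetched) {
        long[] position = new long[2];    // next offset to hand to the application
        long[] fetchedUpTo = new long[2]; // next offset to request from the broker
        long fetched = 0, delivered = 0, discarded = 0;

        for (int round = 0; round < rounds; round++) {
            int active = round % 2;  // partition resumed in this round
            int paused = 1 - active; // the other partition is paused

            // poll(): handle data that was prefetched for the paused partition.
            long buffered = fetchedUpTo[paused] - position[paused];
            if (keepPrefetched) {
                position[paused] = fetchedUpTo[paused]; // return it with this poll
                delivered += buffered;
            } else {
                fetchedUpTo[paused] = position[paused]; // drop it; refetch later
                discarded += buffered;
            }

            // poll(): return one batch for the active partition, fetching if
            // nothing is buffered for it.
            if (fetchedUpTo[active] == position[active]) {
                fetchedUpTo[active] += batchSize;
                fetched += batchSize;
            }
            delivered += fetchedUpTo[active] - position[active];
            position[active] = fetchedUpTo[active];

            // Pipelining: prefetch the next batch for the active partition.
            fetchedUpTo[active] += batchSize;
            fetched += batchSize;
        }
        return new Result(fetched, delivered, discarded);
    }

    public static void main(String[] args) {
        for (boolean keep : new boolean[] {false, true}) {
            Result r = simulate(4, 100, keep);
            System.out.println("keepPrefetched=" + keep
                    + " fetched=" + r.fetched
                    + " delivered=" + r.delivered
                    + " discarded=" + r.discarded);
        }
    }
}
```

Under these assumptions, four rounds with batches of 100 records transfer 800 records in both modes. With prefetched data thrown away, only 400 of them reach the application and 300 are discarded and must be fetched again; with prefetched data kept, 700 are delivered and none are discarded. The remaining 100 in each case are the final prefetched batch still sitting in the buffer.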


