Posted to jira@kafka.apache.org by "Guozhang Wang (JIRA)" <ji...@apache.org> on 2019/03/26 22:46:00 UTC
[jira] [Commented] (KAFKA-6399) Consider reducing "max.poll.interval.ms" default for Kafka Streams

    [ https://issues.apache.org/jira/browse/KAFKA-6399?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16802245#comment-16802245 ] 

Guozhang Wang commented on KAFKA-6399:
--------------------------------------

Reviving this thread. Setting this config to MAX_VALUE does have some side effects. For example, when an instance is rebooted so quickly that it has not yet been kicked out of the group by session.timeout, its old member id will still be in the group. Because rebalance.timeout is set to MAX_VALUE, this rebalance will never complete: the coordinator shuts off heartbeating during the prepare-rebalance phase and waits forever for the old member to re-join.

So I think we should definitely reduce it from MAX_VALUE; the question is what the default value should be. Personally, I think the value should be biased towards a good OOTB experience (i.e. for people who do not override this value), and hence I prefer a larger default value like 5 minutes: if processing or restoring a task spikes, we will not be kicked out of the group, whereas if the above scenario happens we are not blocked for more than 5 minutes (note that the default request timeout is 30 seconds, so the members may re-join 10 times, but thanks to KIP-394 this will no longer blow up the coordinator's metadata memory).
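
To make the numbers concrete, here is a minimal sketch of how an application could pin the proposed 5-minute value explicitly rather than rely on the default, plus the 5 min / 30 s arithmetic behind the "10 times" estimate. The class name, application id, and bootstrap address are hypothetical placeholders; the config keys are written as plain strings so the sketch compiles without the Kafka client on the classpath. In a real application the resulting Properties would be passed to the KafkaStreams constructor.

```java
import java.time.Duration;
import java.util.Properties;

public class StreamsPollIntervalSketch {
    // Default proposed in the comment above: 5 minutes.
    static final Duration PROPOSED_MAX_POLL_INTERVAL = Duration.ofMinutes(5);
    // Request timeout of 30 seconds, as stated in the comment above.
    static final Duration REQUEST_TIMEOUT = Duration.ofSeconds(30);

    // Builds a Streams-style config that overrides max.poll.interval.ms.
    static Properties buildConfig() {
        Properties props = new Properties();
        props.setProperty("application.id", "example-app");        // hypothetical
        props.setProperty("bootstrap.servers", "localhost:9092");  // hypothetical
        props.setProperty("max.poll.interval.ms",
                Long.toString(PROPOSED_MAX_POLL_INTERVAL.toMillis()));
        return props;
    }

    // Rough upper bound on re-join attempts while a rebalance is blocked:
    // one attempt per request timeout within the max poll interval.
    static long maxRejoinAttempts() {
        return PROPOSED_MAX_POLL_INTERVAL.toMillis() / REQUEST_TIMEOUT.toMillis();
    }

    public static void main(String[] args) {
        Properties props = buildConfig();
        System.out.println("max.poll.interval.ms = "
                + props.getProperty("max.poll.interval.ms"));
        System.out.println("upper bound on re-joins = " + maxRejoinAttempts());
    }
}
```

With these values the override is 300000 ms and the re-join bound works out to 10, matching the estimate above.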

> Consider reducing "max.poll.interval.ms" default for Kafka Streams
> ------------------------------------------------------------------
>
>                 Key: KAFKA-6399
>                 URL: https://issues.apache.org/jira/browse/KAFKA-6399
>             Project: Kafka
>          Issue Type: Improvement
>          Components: streams
>    Affects Versions: 1.0.0
>            Reporter: Matthias J. Sax
>            Assignee: John Roesler
>            Priority: Minor
>
> In Kafka {{0.10.2.1}} we changed the default value of {{max.poll.interval.ms}} for Kafka Streams to {{Integer.MAX_VALUE}}. The reason was that long state restore phases during rebalance could yield "rebalance storms": consumers dropped out of a consumer group even though they were healthy, because they didn't call {{poll()}} during the state restore phase.
> In versions {{0.11}} and {{1.0}} the state restore logic was improved a lot, and Kafka Streams now calls {{poll()}} even during the restore phase. Therefore, we might consider setting a smaller timeout for {{max.poll.interval.ms}} to detect badly behaving Kafka Streams applications (i.e., targeting user code) that no longer make progress during regular operations.
> The open question is what a good default might be. Maybe the actual consumer default of 30 seconds might be sufficient. During one {{poll()}} roundtrip, we would only call {{restoreConsumer.poll()}} once and restore a single batch of records. This should take far less than 30 seconds.


