Posted to jira@kafka.apache.org by "John Roesler (Jira)" <ji...@apache.org> on 2021/03/26 02:58:00 UTC
[jira] [Commented] (KAFKA-12360) Improve documentation of max.task.idle.ms (kafka-streams)

    [ https://issues.apache.org/jira/browse/KAFKA-12360?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17309113#comment-17309113 ] 

John Roesler commented on KAFKA-12360:
--------------------------------------

Hi [~nicodds] ,

I can see your line of reasoning, but I think there must be something else going on there.

When a task is "idling", it does not block the poll loop. Rather, in each iteration of the poll loop, the task does roughly the following (pseudocode):

 
{code:java}
check if the task has records buffered from both inputs
  if so, carry on processing
  if not, check if the idle timeout has been exceeded
    if so, carry on processing
    if not, loop around again and maybe call poll()
{code}
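
To make the pseudocode above concrete, here is a minimal Java sketch of a non-blocking idling check. It is an illustration only, not the actual Kafka Streams implementation: the class {{TaskIdlingSketch}}, the nested {{Task}} type, and the method {{isProcessable}} are all hypothetical names, and the sketch assumes that at least one input has data buffered.

```java
import java.util.Map;

/** Illustrative sketch only; not the actual Kafka Streams implementation. */
public class TaskIdlingSketch {

    /** Hypothetical stand-in for a task that reads from several inputs. */
    static final class Task {
        private final long maxTaskIdleMs;
        private long idleStartMs = -1L; // -1 means "not currently idling"

        Task(long maxTaskIdleMs) {
            this.maxTaskIdleMs = maxTaskIdleMs;
        }

        /**
         * Called once per iteration of the poll loop. It never blocks:
         * it only decides whether the task should process right now.
         */
        boolean isProcessable(Map<String, Integer> bufferedPerInput, long nowMs) {
            boolean allInputsBuffered =
                bufferedPerInput.values().stream().allMatch(n -> n > 0);
            if (allInputsBuffered) {
                idleStartMs = -1L; // every input has data, so stop idling
                return true;       // carry on processing
            }
            if (idleStartMs < 0) {
                idleStartMs = nowMs; // first iteration with an empty input
            }
            // Process anyway once the idle timeout is exceeded; otherwise the
            // caller loops around again and may call poll().
            return nowMs - idleStartMs >= maxTaskIdleMs;
        }
    }

    public static void main(String[] args) {
        Task task = new Task(100L);
        Map<String, Integer> oneSideEmpty = Map.of("left", 1, "right", 0);
        // Simulated clock: each loop iteration takes 50 ms.
        for (long now = 0; now <= 150; now += 50) {
            System.out.println("t=" + now + "ms processable="
                + task.isProcessable(oneSideEmpty, now));
        }
    }
}
```

The point of the sketch is that the check returns immediately in every case, so the surrounding loop stays free to call poll() while the task is idling.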

Therefore, I don't think task idling can make you miss your poll interval. My guess is that when you set the poll interval lower, it happened to be smaller than the amount of time it takes to complete one loop of processing each task. In that case, the poll would time out, causing a rebalance.

In fact, my typical advice is to make sure that the task idle time is _larger_ than the poll interval. As Matthias mentioned in the PR, task idling is pointless unless we actually call poll() again at least once before the timeout. In other words, I think your reasoning was correct, but some other factor came into play and caused the rebalances.

FYI, it doesn't help you right now, but I have just completed this feature, to be released in Kafka 3.0: [https://cwiki.apache.org/confluence/display/KAFKA/KIP-695%3A+Further+Improve+Kafka+Streams+Timestamp+Synchronization]

KIP-695 will make it so that you should get the desired join behavior by default, without having to mess with the task idling timeout at all. But it's not coming until 3.0 is released. Until then, maybe you can try returning the poll interval to the default and instead increasing the task idle time to be larger than the poll interval.
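
As a rough sketch of that interim suggestion, the configuration could look like the following. The class and method names are hypothetical, the application id and bootstrap server are placeholders, and the 300000 ms default for max.poll.interval.ms is an assumption that you should verify against the documentation for your Kafka version.

```java
import java.util.Properties;

/** Illustrative sketch only; values and helper names are assumptions. */
public class IdleConfigSketch {

    // Assumed default of max.poll.interval.ms (5 minutes); verify for your version.
    static final long ASSUMED_DEFAULT_MAX_POLL_INTERVAL_MS = 300_000L;

    /**
     * Builds Streams properties that leave max.poll.interval.ms at its default
     * and set max.task.idle.ms to pollIntervalMs plus a margin.
     */
    static Properties streamsProps(long pollIntervalMs, long extraIdleMs) {
        Properties props = new Properties();
        props.put("application.id", "join-app-example");  // placeholder
        props.put("bootstrap.servers", "localhost:9092"); // placeholder
        // max.poll.interval.ms is deliberately not set, so the default applies.
        props.put("max.task.idle.ms", Long.toString(pollIntervalMs + extraIdleMs));
        return props;
    }

    public static void main(String[] args) {
        Properties props =
            streamsProps(ASSUMED_DEFAULT_MAX_POLL_INTERVAL_MS, 60_000L);
        System.out.println("max.task.idle.ms=" + props.getProperty("max.task.idle.ms"));
    }
}
```

The resulting Properties object would then be passed to the KafkaStreams constructor as usual. The one-minute margin is arbitrary and only there to keep the idle time above the poll interval.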

I hope this helps!
 -John

> Improve documentation of max.task.idle.ms (kafka-streams)
> ---------------------------------------------------------
>
>                 Key: KAFKA-12360
>                 URL: https://issues.apache.org/jira/browse/KAFKA-12360
>             Project: Kafka
>          Issue Type: Improvement
>          Components: docs, streams
>            Reporter: Domenico Delle Side
>            Priority: Minor
>              Labels: beginner, newbie, trivial
>
> _max.task.idle.ms_ is a handy way to pause processing in a *_kafka-streams_* application. This is very useful when you need to join two topics that are out of sync, i.e., when data in a topic may be produced _before_ you receive join information in the other topic.
> In the documentation, however, it is not specified that the value of _max.task.idle.ms_ *must* be lower than _max.poll.interval.ms_, otherwise you'll run into an endless rebalancing problem.
> I think it is better to clearly state this in the documentation.


