Posted to jira@kafka.apache.org by GitBox <gi...@apache.org> on 2021/02/25 08:37:11 UTC

[GitHub] [kafka] nicodds opened a new pull request #10207: Fixing documentation source for issue KAFKA-12360

nicodds opened a new pull request #10207:
URL: https://github.com/apache/kafka/pull/10207


   This PR fixes the documentation issue in https://issues.apache.org/jira/browse/KAFKA-12360
   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [kafka] mjsax commented on pull request #10207: Fixing documentation source for issue KAFKA-12360

Posted by GitBox <gi...@apache.org>.
mjsax commented on pull request #10207:
URL: https://github.com/apache/kafka/pull/10207#issuecomment-789936874


   @nicodds -- Thinking about this again, I am actually wondering why the value must be smaller than `max.poll.interval.ms`? If a task idles, we should still call `poll()` frequently, so both configs should actually be independent?
   
   On the Jira ticket you claim:
   
   > otherwise you'll incur into an endless rebalancing problem
   
   Did you observe this in practice? Wondering, as it seems it should actually not be a problem. Can you clarify?





[GitHub] [kafka] nicodds commented on pull request #10207: Fixing documentation source for issue KAFKA-12360

Posted by GitBox <gi...@apache.org>.
nicodds commented on pull request #10207:
URL: https://github.com/apache/kafka/pull/10207#issuecomment-790406227


   > Did you observe this in practice? Wondering, as it seems it should actually not be a problem. Can you clarify?
   
   Exactly. I was building a Kafka Streams application in which human-unreadable messages in a topic (A) are joined against the data in another topic (B). Data in topic B may arrive later than in topic A. Consequently, I adapted one of the examples in which `max.task.idle.ms` is used for that purpose. Incidentally, I changed the value of `max.poll.interval.ms` to a short value, shorter than `max.task.idle.ms`, and I started experiencing frequent rebalancing.
   
   I suspect that idling prevents the tasks from signaling their presence, so they are dropped from the consumer group.
   
   Let me know if you need anything further.
   
   All the best,
   Nico





[GitHub] [kafka] nicodds commented on pull request #10207: Fixing documentation source for issue KAFKA-12360

Posted by GitBox <gi...@apache.org>.
nicodds commented on pull request #10207:
URL: https://github.com/apache/kafka/pull/10207#issuecomment-789623547


   Hey @mjsax, sorry for the delay! I updated the PR.





[GitHub] [kafka] vvcephei commented on pull request #10207: Fixing documentation source for issue KAFKA-12360

Posted by GitBox <gi...@apache.org>.
vvcephei commented on pull request #10207:
URL: https://github.com/apache/kafka/pull/10207#issuecomment-807898796


   Hi @nicodds ,
   
   I can see your line of reasoning, but I think there must be something else going on there.
   
   When a task is "idling", it does not block the poll loop. Rather, in each iteration of the poll loop, the task pseudocode is like this:
   
   ```
   check if it has records buffered from both inputs
     if so, carry on processing
     if not, check if the idle timeout has been exceeded
       if so, carry on processing
       if not, loop around again and maybe call poll()
   ```
   
   Therefore, I don't think task idling can make you miss your poll interval. My guess is that when you set the poll interval lower, it happened to be smaller than the amount of time it takes to complete one loop of processing each task. In that case, the poll would time out, causing a rebalance.
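
The loop described above can be modeled in a few lines. This is only a sketch under stated assumptions, not Kafka Streams code: the class `TaskIdleLoopSketch`, its two buffers, and the simulated clock are all hypothetical, and the model assumes the poll step is reached once per loop iteration. It illustrates that the idle check compares timestamps and returns immediately, so an idling task does not stop the loop from reaching `poll()`.

```java
import java.util.ArrayDeque;
import java.util.Deque;

// Hypothetical model of one two-input task; none of these names come from Kafka Streams.
final class TaskIdleLoopSketch {
    final Deque<String> inputA = new ArrayDeque<>();
    final Deque<String> inputB = new ArrayDeque<>();
    private final long maxTaskIdleMs;
    private long idleStartMs = -1L; // -1 means the task is not idling

    TaskIdleLoopSketch(long maxTaskIdleMs) {
        this.maxTaskIdleMs = maxTaskIdleMs;
    }

    // Decides whether to process in this iteration. Only compares timestamps; never sleeps.
    boolean readyToProcess(long nowMs) {
        if (!inputA.isEmpty() && !inputB.isEmpty()) {
            idleStartMs = -1L;   // both inputs buffered: carry on processing
            return true;
        }
        if (inputA.isEmpty() && inputB.isEmpty()) {
            idleStartMs = -1L;   // nothing buffered at all: nothing to process
            return false;
        }
        if (idleStartMs < 0) {
            idleStartMs = nowMs; // exactly one input is empty: start the idle timer
        }
        // Process anyway once the idle timeout has been exceeded; otherwise loop around.
        return nowMs - idleStartMs >= maxTaskIdleMs;
    }

    // Simulated poll loop on a fake clock: returns how many iterations reached the poll step.
    static int countPolls(TaskIdleLoopSketch task, long stepMs, long durationMs) {
        int polls = 0;
        for (long nowMs = 0; nowMs < durationMs; nowMs += stepMs) {
            polls++; // stands in for consumer.poll(); reached on every pass, idling or not
            if (task.readyToProcess(nowMs)) {
                // Process one record from whichever buffer has data.
                if (!task.inputA.isEmpty()) {
                    task.inputA.poll();
                } else {
                    task.inputB.poll();
                }
            }
        }
        return polls;
    }

    public static void main(String[] args) {
        TaskIdleLoopSketch task = new TaskIdleLoopSketch(10_000L);
        task.inputA.add("record-from-A"); // input B stays empty, so the task idles
        System.out.println(countPolls(task, 100L, 1_000L));
    }
}
```

In this model the number of poll steps depends only on how long one pass of the loop takes (`stepMs`), not on `maxTaskIdleMs`, which matches the guess that the rebalances came from slow loop iterations rather than from idling.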
   
   In fact, my typical advice for cases like yours is the opposite of what this PR says: to make sure that the task idle time is _larger_ than the poll interval. As Matthias mentioned, task idling is pointless unless we actually call poll() again at least once before the timeout. In other words, I think your reasoning was correct, but some other factor came into play and caused the rebalances.
   
   FYI, it doesn't help you right now, but I have just completed this feature, to be released in Kafka 3.0: https://cwiki.apache.org/confluence/display/KAFKA/KIP-695%3A+Further+Improve+Kafka+Streams+Timestamp+Synchronization
   
   KIP-695 will make it so that you should get the desired join behavior by default, without having to mess with the task idling timeout at all. But it's not coming until 3.0 is released. Until then, maybe you can try returning the poll interval to the default and instead increasing the task idle time to be larger than the poll interval.
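
A minimal sketch of that interim configuration, reading "poll interval" as the `max.poll.interval.ms` setting, which is an interpretation rather than something the thread confirms. The class and method names are hypothetical, and the 300000 ms figure used in `main` is an assumed default that should be checked against the Kafka version in use. Only plain `java.util.Properties` is used; other required Streams settings such as `application.id` and `bootstrap.servers` are omitted.

```java
import java.util.Properties;

// Hypothetical helper; only the two config keys come from the discussion.
final class IdleConfigSketch {

    // Builds properties where max.task.idle.ms exceeds the given poll interval by marginMs.
    // max.poll.interval.ms is deliberately left unset so the default applies.
    static Properties withIdleAbovePollInterval(long pollIntervalMs, long marginMs) {
        if (pollIntervalMs <= 0 || marginMs <= 0) {
            throw new IllegalArgumentException("pollIntervalMs and marginMs must be positive");
        }
        Properties props = new Properties();
        props.setProperty("max.task.idle.ms", Long.toString(pollIntervalMs + marginMs));
        return props;
    }

    public static void main(String[] args) {
        // Assumption: 300000 ms (5 minutes) is the default max.poll.interval.ms.
        Properties props = withIdleAbovePollInterval(300_000L, 60_000L);
        System.out.println(props.getProperty("max.task.idle.ms"));
    }
}
```

The resulting `Properties` would be merged with the rest of the application's Streams configuration. Note that idle times this large delay processing whenever one input is empty, so the margin is a trade-off to tune.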
   
   I hope this helps!
   -John





[GitHub] [kafka] mjsax commented on pull request #10207: Fixing documentation source for issue KAFKA-12360

Posted by GitBox <gi...@apache.org>.
mjsax commented on pull request #10207:
URL: https://github.com/apache/kafka/pull/10207#issuecomment-887104113


   Closing this PR due to inactivity. @nicodds Feel free to re-open if you see fit.





[GitHub] [kafka] mjsax closed pull request #10207: Fixing documentation source for issue KAFKA-12360

Posted by GitBox <gi...@apache.org>.
mjsax closed pull request #10207:
URL: https://github.com/apache/kafka/pull/10207


   





[GitHub] [kafka] mjsax commented on pull request #10207: Fixing documentation source for issue KAFKA-12360

Posted by GitBox <gi...@apache.org>.
mjsax commented on pull request #10207:
URL: https://github.com/apache/kafka/pull/10207#issuecomment-786280723


   Thanks @nicodds!
   
   This PR should also update https://github.com/apache/kafka/blob/trunk/docs/streams/developer-guide/config-streams.html#L245-L246

