Posted to jira@kafka.apache.org by GitBox <gi...@apache.org> on 2021/12/14 20:43:19 UTC

[GitHub] [kafka] guozhangwang commented on pull request #11600: KAFKA-12648: handle MissingSourceTopicException for named topologies

guozhangwang commented on pull request #11600:
URL: https://github.com/apache/kafka/pull/11600#issuecomment-993971910


   @ableegoldman I checked the source code to reason about the difference between 1) subscribing for the first time and then polling to potentially rebalance, and 2) re-subscribing and then polling to potentially rebalance.
   
   For 1) we have the logic to enforce metadata being fresh until we call `ensureActiveGroup`, so that's cleared as long as all members `subscribe` with the same topics.
   
   For 2) the same logic is there, so the member kicking off the rebalance with the new join-group should always have the updated metadata. It's just that the leader, who actually triggers the assignment, may not have received the `subscribe` call yet. If the leader had called `subscribe`, it would set the `needPartialUpdate` flag in metadata and, upon being requested to re-join, would still wait until the metadata is refreshed.
   
   So that case in 2) would be similar to 1) where you have two members starting up, one that called `subscribe(A,B)` while the other called `subscribe(A)`, but the second one is picked as the leader. In Streams' scenario, it's similar to two instances with different topologies but configured with the same `group id`. In the past we treated this as fatal since instances could not change their topology after starting up, but now they can, and we do not know whether the mismatch is temporary and the instances will soon change their topologies to match, or whether they never will. So I think we should now treat it as transient.
   
   But the question is then: do we still need part1 of the PR if we do part2 to not treat it as fatal? If we just expose the `MissingSourceTopicException` to the handler and explain that "this may be transient, but we do not know", without killing the thread, would that be sufficient?
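
To make the mismatch scenario concrete, here is a toy model of two members where one called `subscribe(A,B)`, the other called `subscribe(A)`, and the second is picked as the leader. This is only a sketch in plain Java and not Kafka's actual assignment code: the class `SubscriptionMismatchSketch` and the method `topicsUnknownToLeader` are hypothetical names, and the leader's view of the topics is assumed to be exactly its own subscription.

```java
import java.util.Collection;
import java.util.List;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical illustration only; none of these names exist in Kafka.
public class SubscriptionMismatchSketch {

    // Returns the topics that at least one member subscribed to
    // but that the leader does not know about (assumed: the leader
    // only knows the topics from its own subscription).
    public static Set<String> topicsUnknownToLeader(Set<String> leaderTopics,
                                                    Collection<Set<String>> memberSubscriptions) {
        Set<String> unknown = new TreeSet<>();
        for (Set<String> subscription : memberSubscriptions) {
            for (String topic : subscription) {
                if (!leaderTopics.contains(topic)) {
                    unknown.add(topic);
                }
            }
        }
        return unknown;
    }

    public static void main(String[] args) {
        Set<String> firstMember = Set.of("A", "B");  // called subscribe(A,B)
        Set<String> secondMember = Set.of("A");      // called subscribe(A), picked as leader

        Set<String> unknown = topicsUnknownToLeader(secondMember, List.of(firstMember, secondMember));
        // prints: Topics unknown to leader: [B]
        System.out.println("Topics unknown to leader: " + unknown);
    }
}
```

In this model the set becomes empty once the leader also subscribes to `B`, which is the "topologies will soon match" case; whether that ever happens cannot be decided from the leader's side alone.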

