You are viewing a plain text version of this content. The canonical link for it is here.
Posted to jira@kafka.apache.org by GitBox <gi...@apache.org> on 2020/04/24 17:09:42 UTC

[GitHub] [kafka] vvcephei commented on a change in pull request #8504: KAFKA-9298: reuse mapped stream error in joins

vvcephei commented on a change in pull request #8504:
URL: https://github.com/apache/kafka/pull/8504#discussion_r414731001



##########
File path: streams/src/main/java/org/apache/kafka/streams/kstream/internals/KStreamImpl.java
##########
@@ -989,16 +994,18 @@ private void to(final TopicNameExtractor<K, V> topicExtractor,
             null,
             optimizableRepartitionNodeBuilder);
 
-        final OptimizableRepartitionNode<K, V> optimizableRepartitionNode = optimizableRepartitionNodeBuilder.build();
-        builder.addGraphNode(streamsGraphNode, optimizableRepartitionNode);
+        if (repartitionNode == null || !name.equals(repartitionName)) {

Review comment:
       Since you made the mistake of asking my opinion, here it is :) :
   
   > bumping the index
   It's true that users can't currently reuse the KStream, so there's no compatibility issue there, but we can't bump the index for the _first_ repartition topic, or we would break every topology that uses generated repartition topic names already. So, either way, we have to cache something to tell us to do something different on the "first reuse" (i.e., the second use of the KStream).
   
   Since we have to do that anyway, maybe it's fine to just cache the repartition node itself instead of a flag that says "bump the index next time". 
   
   > leaking optimizations into the DSL
   
   I'm on the fence about whether this is an "optimization" or "reasonable behavior". It sort of feels like the latter, and the only reason we needed to introduce the "repartition-collapsing" optimization is that we failed to introduce reasonable behavior from the beginning. Also, my read is that the DSL builder and the optimizer are not cleanly separated right now anyway, and if we ever want to build more optimizations, we'll most likely need to make another pass on both anyway. We're also starting to think about topology evolution (cc @cadonna ), which makes this a less scary prospect, as we can then implement a mechanism to _compatibly_ introduce new optimizations. In other words, I'm not taking a hard stance, but leaning in the direction of doing the more efficient thing than the more pure thing, since we're not currently super pure anyway.
   
   > Other repartition topics
   
   I think we'd better leave it alone for now, implement topology evolution, then migrate to a completely pure and consistent approach.




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org