Posted to jira@kafka.apache.org by GitBox <gi...@apache.org> on 2022/09/01 18:33:07 UTC

[GitHub] [kafka] guozhangwang commented on a diff in pull request #12555: Optimize self-join

guozhangwang commented on code in PR #12555:
URL: https://github.com/apache/kafka/pull/12555#discussion_r960983275


##########
streams/src/main/java/org/apache/kafka/streams/kstream/internals/InternalStreamsBuilder.java:
##########
@@ -270,16 +272,20 @@ private void maybeAddNodeForOptimizationMetadata(final GraphNode node) {
 
     // use this method for testing only
     public void buildAndOptimizeTopology() {
-        buildAndOptimizeTopology(false);
+        buildAndOptimizeTopology(false, false);
     }
 
-    public void buildAndOptimizeTopology(final boolean optimizeTopology) {
+    public void buildAndOptimizeTopology(
+        final boolean optimizeTopology, final boolean optimizeSelfJoin) {
 
         mergeDuplicateSourceNodes();
         if (optimizeTopology) {
             LOG.debug("Optimizing the Kafka Streams graph for repartition nodes");
             optimizeKTableSourceTopics();
             maybeOptimizeRepartitionOperations();
+            if (optimizeSelfJoin) {

Review Comment:
   Hmm.. for case 2), the logical plan would be like:
   
   ```
   root -> source1
   root -> source2
   source1 -> join-window1
   source2 -> join-window2
   source1,source2 -> stream-join
   ```
   
   Is that right?
   
   For case 3), since we are not re-assigning the result of `mapValues` to `stream`, it should just be a no-op, right? The logical plan would be like:
   
   ```
   root -> source1
   root -> source2
   source1 -> join-window1
   source1 -> map-values1
   source2 -> join-window2
   source1,source2 -> stream-join
   ```
   
   Should that be considered optimizable as well?
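   
   To make the comparison concrete, here is a minimal, self-contained sketch that encodes the two plans above as plain adjacency lists and diffs them. It is only an illustration: `LogicalPlanSketch`, `addEdge`, `extraEdges`, and `isLeaf` are hypothetical names, not part of `InternalStreamsBuilder` or the `GraphNode` API, and the sketch does not decide whether case 3) should be optimizable. It only shows that the sole difference from case 2) is one extra child of `source1` that nothing consumes.
   
   ```java
   import java.util.ArrayList;
   import java.util.Collections;
   import java.util.LinkedHashMap;
   import java.util.List;
   import java.util.Map;
   
   // Hypothetical illustration only; not the Kafka Streams GraphNode API.
   public class LogicalPlanSketch {
   
       // Record a parent -> child edge and make sure the child has an entry too.
       static void addEdge(Map<String, List<String>> plan, String parent, String child) {
           plan.computeIfAbsent(parent, k -> new ArrayList<>()).add(child);
           plan.computeIfAbsent(child, k -> new ArrayList<>());
       }
   
       // Plan for case 2), edge for edge as listed in the comment.
       static Map<String, List<String>> case2Plan() {
           Map<String, List<String>> plan = new LinkedHashMap<>();
           addEdge(plan, "root", "source1");
           addEdge(plan, "root", "source2");
           addEdge(plan, "source1", "join-window1");
           addEdge(plan, "source2", "join-window2");
           addEdge(plan, "source1", "stream-join");
           addEdge(plan, "source2", "stream-join");
           return plan;
       }
   
       // Plan for case 3): same as case 2) plus the unused mapValues branch.
       static Map<String, List<String>> case3Plan() {
           Map<String, List<String>> plan = new LinkedHashMap<>();
           addEdge(plan, "root", "source1");
           addEdge(plan, "root", "source2");
           addEdge(plan, "source1", "join-window1");
           addEdge(plan, "source1", "map-values1");
           addEdge(plan, "source2", "join-window2");
           addEdge(plan, "source1", "stream-join");
           addEdge(plan, "source2", "stream-join");
           return plan;
       }
   
       // Edges that appear in 'other' but not in 'base', formatted as "parent -> child".
       static List<String> extraEdges(Map<String, List<String>> base,
                                      Map<String, List<String>> other) {
           List<String> extra = new ArrayList<>();
           for (Map.Entry<String, List<String>> entry : other.entrySet()) {
               List<String> baseChildren =
                   base.getOrDefault(entry.getKey(), Collections.<String>emptyList());
               for (String child : entry.getValue()) {
                   if (!baseChildren.contains(child)) {
                       extra.add(entry.getKey() + " -> " + child);
                   }
               }
           }
           return extra;
       }
   
       // A node is a leaf when it has no children in the plan.
       static boolean isLeaf(Map<String, List<String>> plan, String node) {
           List<String> children = plan.get(node);
           return children != null && children.isEmpty();
       }
   
       public static void main(String[] args) {
           Map<String, List<String>> case2 = case2Plan();
           Map<String, List<String>> case3 = case3Plan();
           System.out.println("extra edges in case 3: " + extraEdges(case2, case3));
           System.out.println("map-values1 is a leaf: " + isLeaf(case3, "map-values1"));
       }
   }
   ```
   
   In this encoding the diff is the single edge `source1 -> map-values1`, and `map-values1` has no children, which is the sense in which the discarded `mapValues` result looks like a no-op for the join.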


