You are viewing a plain text version of this content. The canonical link for it is here.
Posted to commits@samza.apache.org by GitBox <gi...@apache.org> on 2020/07/02 15:39:04 UTC

[GitHub] [samza] b-slim commented on a change in pull request #1386: [SAMZA-2557] Adding support for nested rows access via dot path.

b-slim commented on a change in pull request #1386:
URL: https://github.com/apache/samza/pull/1386#discussion_r449097622



##########
File path: samza-sql/src/main/java/org/apache/samza/sql/translator/MessageStreamCollector.java
##########
@@ -0,0 +1,214 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+
+package org.apache.samza.sql.translator;
+
+import java.io.Closeable;
+import java.io.Serializable;
+import java.time.Duration;
+import java.util.ArrayDeque;
+import java.util.Collection;
+import java.util.Deque;
+import java.util.function.Function;
+import org.apache.samza.context.Context;
+import org.apache.samza.operators.KV;
+import org.apache.samza.operators.MessageStream;
+import org.apache.samza.operators.OutputStream;
+import org.apache.samza.operators.functions.AsyncFlatMapFunction;
+import org.apache.samza.operators.functions.FilterFunction;
+import org.apache.samza.operators.functions.FlatMapFunction;
+import org.apache.samza.operators.functions.JoinFunction;
+import org.apache.samza.operators.functions.MapFunction;
+import org.apache.samza.operators.functions.SinkFunction;
+import org.apache.samza.operators.functions.StreamTableJoinFunction;
+import org.apache.samza.operators.windows.Window;
+import org.apache.samza.operators.windows.WindowPane;
+import org.apache.samza.serializers.KVSerde;
+import org.apache.samza.serializers.Serde;
+import org.apache.samza.sql.data.SamzaSqlRelMessage;
+import org.apache.samza.table.Table;
+
+
+/**
+ * Collector of Map and Filter Samza Functions to collect call stack on the top of Remote table.
+ * This Collector will be used by Join operator and trigger it when applying the join function post lookup.
+ *
+ * Note that this is needed because the Remote Table can not expose a proper {@code MessageStream}.
+ * It is a work around to minimize the amount of code changes of the current Query Translator {@link org.apache.samza.sql.translator.QueryTranslator},
+ * But in an ideal world, we should use Calcite planner in conventional way we can combine function when via translation of RelNodes.

Review comment:
       The conventional way of using Calcite is pushing operator toward table scan as much as possible. 
   This PR does that by leveraging Calcite as is with minimal code changes or copy and past of some internal code. 
   By pushing some stuff up case remote table and some stuff down case remote table shows a clear disconnect and not clear design that a someone else beside the author of the work will be able to get it without spending hours stepping into the debugger code. 
   The other question how this will be better than current approach of composing map/filters and fuse it to join operator, don't you think it is more optimal to fuse all the lookup/project/filter within join as one operator ? 
   Adding to that This way also, will enable a case where the join condition will be more than just __key__ = c but will be able to add more conjunctions in the near future.  
   In my opinion this is the most clean way to work around the limitation of the remote table join operator with no major surgery and allowing pushing filters/projects and in the future handle richer join conditions. 
   Again the proper fix will be to adopt Calcite framework Convention pattern where an operator can be pushed inside another operator when going from logical to physical for instance in this case the Join will transformed to a new join node that knows how to translate the project and filters within it self. That is kind of what is happening now but without making this work a major surgery as someone has suggested.
   
   




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org