You are viewing a plain text version of this content. The canonical link for it is here.
Posted to commits@calcite.apache.org by GitBox <gi...@apache.org> on 2022/07/20 13:36:02 UTC

[GitHub] [calcite] zabetak commented on a diff in pull request #2813: [CALCITE-5127] Error when executing query with subquery in select lis…

zabetak commented on code in PR #2813:
URL: https://github.com/apache/calcite/pull/2813#discussion_r925409690


##########
core/src/main/java/org/apache/calcite/rel/RelNode.java:
##########
@@ -152,8 +152,27 @@ public interface RelNode extends RelOptNode, Cloneable {
    * expression but also used and therefore not available to parents of this
    * relational expression.
    *
-   * <p>Note: only {@link org.apache.calcite.rel.core.Correlate} should set
-   * variables.
+   * <p> For now, only {@link org.apache.calcite.rel.core.Correlate},
+   * {@link org.apache.calcite.rel.core.Project},
+   * {@link org.apache.calcite.rel.logical.LogicalFilter} could set variables.

Review Comment:
   At the moment `Join` also seems capable of handling variables. There can be even physical joins operators that rely on setting variables not to mention also the case where we have subqueries in the `ON` condition. I think it would be better to remove this listing altogether from the javadoc to ease also evolution of the API in the future.



##########
core/src/main/java/org/apache/calcite/rel/RelNode.java:
##########
@@ -152,8 +152,27 @@ public interface RelNode extends RelOptNode, Cloneable {
    * expression but also used and therefore not available to parents of this
    * relational expression.
    *
-   * <p>Note: only {@link org.apache.calcite.rel.core.Correlate} should set
-   * variables.
+   * <p> For now, only {@link org.apache.calcite.rel.core.Correlate},
+   * {@link org.apache.calcite.rel.core.Project},
+   * {@link org.apache.calcite.rel.logical.LogicalFilter} could set variables.
+   *
+   * <p> {@link org.apache.calcite.rel.core.Project} and
+   * {@link org.apache.calcite.rel.logical.LogicalFilter} are designed to carry
+   * the variables just before {@link org.apache.calcite.rel.rules.SubQueryRemoveRule}.
+   * After that, the sub-query expressions will be expanded with
+   * {@link org.apache.calcite.rel.core.Correlate} or
+   * {@link org.apache.calcite.rel.core.Join} depending on whether it's
+   * correlated or not.

Review Comment:
   It is not completely clear how someone reading this Javadoc can use this information. Moreover, I don't know if here is the best place to put it since it refers to specific relational expressions.
   
   I think the javadoc of `SubQueryRemoveRule` covers already a bit part of what is written here. We could possibly omit this paragraph as well. If necessary we could add clarifications in `SubQueryRemoveRule`, `Project`, `Filter` etc. but I don't find it strictly necessary.



##########
core/src/main/java/org/apache/calcite/tools/RelBuilder.java:
##########
@@ -1908,9 +1927,11 @@ private RelBuilder project_(
       fieldNameList.add(null);
     }
 
+    // Do not merge projection when top projection has correlation variables
     bloat:
     if (frame.rel instanceof Project
-        && config.bloat() >= 0) {
+        && config.bloat() >= 0
+        && variables.isEmpty()) {
       final Project project = (Project) frame.rel;

Review Comment:
   The changes here make me think that probably we need to take some action in `ProjectMergeRule` to avoid invalid plans. We don't necessarily need to treat this now but we should log a follow up JIRA.



##########
core/src/main/java/org/apache/calcite/rel/rules/ProjectWindowTransposeRule.java:
##########
@@ -97,9 +99,11 @@ public ProjectWindowTransposeRule(RelBuilderFactory relBuilderFactory) {
       builder.add(relDataTypeField);
     }
 
+    Preconditions.checkArgument(project.getVariablesSet().isEmpty(),
+        "Correlated project should be decorrelated before.");

Review Comment:
   Is there a reason to throw? If the transformation is not valid when there are correlation variables then we should simply skip the transformation. If it is valid then we should propagate them.
   
   I see that you mentioned that you are going to do this change under the JIRA but I didn't completely understand what you had in mind.
   
   The same comment applies to all rules with the same changes. 



##########
core/src/main/java/org/apache/calcite/rel/RelNode.java:
##########
@@ -152,8 +152,27 @@ public interface RelNode extends RelOptNode, Cloneable {
    * expression but also used and therefore not available to parents of this
    * relational expression.
    *
-   * <p>Note: only {@link org.apache.calcite.rel.core.Correlate} should set
-   * variables.
+   * <p> For now, only {@link org.apache.calcite.rel.core.Correlate},
+   * {@link org.apache.calcite.rel.core.Project},
+   * {@link org.apache.calcite.rel.logical.LogicalFilter} could set variables.
+   *
+   * <p> {@link org.apache.calcite.rel.core.Project} and
+   * {@link org.apache.calcite.rel.logical.LogicalFilter} are designed to carry
+   * the variables just before {@link org.apache.calcite.rel.rules.SubQueryRemoveRule}.
+   * After that, the sub-query expressions will be expanded with
+   * {@link org.apache.calcite.rel.core.Correlate} or
+   * {@link org.apache.calcite.rel.core.Join} depending on whether it's
+   * correlated or not.
+   *
+   * <p> The reason why we put the variables in
+   * {@link org.apache.calcite.rel.logical.LogicalFilter} instead of
+   * {@link org.apache.calcite.rel.core.Filter} is that we tried to not expose
+   * the variables to the optimization/physical phase for Filter.
+   * However, as discussed in
+   * <a href="https://issues.apache.org/jira/browse/CALCITE-5127">[CALCITE-5127]</a>,
+   * we now put the variables in {@link org.apache.calcite.rel.core.Project} instead
+   * of {@link org.apache.calcite.rel.logical.LogicalProject} because some downstream
+   * projects used their own logical RelNode which extends from the core RelNode.

Review Comment:
   These are implementation details specific to `Project` and `LogicalFilter`. The purpose of the Javadoc is more about explaining what the method does and less why we implemented things in a certain way.
   
   We can do one of the following about these details:
   1. move them to the respective classes as regular (non-javadoc) comments;
   2. move them in the commit message;
   3. remove them completely.
   
   Someone interested about the decision to put the variables in `Project` instead of `LogicalProject` (or elsewhere) can run back to the discussion in the JIRA or PR and get the complete context so I am leaning more towards options 2 and 3.



##########
core/src/test/java/org/apache/calcite/plan/RelWriterTest.java:
##########
@@ -998,6 +999,21 @@ void testCorrelateQuery(SqlExplainFormat format) {
         .assertThatPlan(isLinux(expected));
   }
 
+  @Test void testProjectionWithCorrelationVaribles() {
+    final Function<RelBuilder, RelNode> relFn = b -> b.scan("EMP")
+        .project(
+            ImmutableList.of(b.field("ENAME")),
+            ImmutableList.of("ename"),
+            true,
+            ImmutableSet.of(b.getCluster().createCorrel()))
+        .build();
+
+    final String expected = "LogicalProject(variablesSet=[[$cor0]], ename=[$1])\n"
+        + "  LogicalTableScan(table=[[scott, EMP]])\n";

Review Comment:
   This seems like an invalid plan since the correlation is not used. It would be better to make this more realistic to correspond to an actual SQL query.



##########
core/src/main/java/org/apache/calcite/rel/core/Project.java:
##########
@@ -81,6 +85,8 @@ public abstract class Project extends SingleRel implements Hintable {
    * @param input    Input relational expression
    * @param projects List of expressions for the input columns
    * @param rowType  Output row type
+   * @param variableSet Correlation variables set by this relational expression
+   *                    to be used by nested expressions
    */
   @SuppressWarnings("method.invocation.invalid")
   protected Project(

Review Comment:
   This looks like a breaking change unless I am missing something. Please retain the old constructor and mark it as deprecated.



##########
core/src/main/java/org/apache/calcite/tools/RelBuilder.java:
##########
@@ -1817,7 +1817,24 @@ public RelBuilder project(Iterable<? extends RexNode> nodes,
    */
   public RelBuilder project(Iterable<? extends RexNode> nodes,
       Iterable<? extends @Nullable String> fieldNames, boolean force) {
-    return project_(nodes, fieldNames, ImmutableList.of(), force);
+    return project(nodes, fieldNames, force, ImmutableSet.of());
+  }
+
+  /**
+   * The same with {@link #project(Iterable, Iterable, boolean)}, with additional
+   * variablesSet param.
+   *
+   * @param nodes Expressions
+   * @param fieldNames Suggested field names
+   * @param force create project even if it is identity
+   * @param variablesSet Correlating variables that are set when reading a row
+   *                     from the input, and which may be referenced from the
+   *                     projection expressions
+   */
+  public RelBuilder project(Iterable<? extends RexNode> nodes,
+      Iterable<? extends @Nullable String> fieldNames, boolean force,
+      Iterable<CorrelationId> variablesSet) {
+    return project_(nodes, fieldNames, ImmutableList.of(), force, variablesSet);

Review Comment:
   Can we avoid introducing the new APIs in `RelBuilder` as part of this PR? Probably we need more tests and discussion around these so it would be more efficient to treat it separately in a follow up if possible.



##########
core/src/test/java/org/apache/calcite/test/JdbcTest.java:
##########
@@ -2122,6 +2122,28 @@ void checkMultisetQueryWithSingleColumn() {
             "EXPR$0=2");
   }
 
+  @Test void testSelectInsideFromAndList() {
+    CalciteAssert.that()
+        .query("SELECT ARRAY(SELECT s.x)\n"
+            + "FROM (SELECT 1 as x) s")
+        .returnsValue("[{1}]");
+
+    CalciteAssert.that()
+        .query("SELECT ARRAY(SELECT s.x)\n"
+            + "FROM (SELECT ARRAY[1,2,3] as x) s")
+        .returnsCount(1);
+
+    CalciteAssert.that()
+        .query("SELECT ARRAY(SELECT * FROM UNNEST(s.x) y)\n"
+            + "FROM (SELECT ARRAY[1,2,3] as x) s")
+        .returnsCount(1);
+
+    CalciteAssert.that()
+        .query("SELECT (SELECT CARDINALITY(s.x) LIMIT 1)\n"
+            + "FROM (SELECT ARRAY[1,2,3] as x) s")
+        .returnsUnordered("EXPR$0=3");
+  }
+

Review Comment:
   This is not an appropriate place for these tests since they have nothing to do with JDBC. Maybe a better place to put these is `sub-query.iq`. 
   
   What exactly do we attempt to verify with these tests? 
   
   If we only want to validate that they run and produce the correct results then maybe it would help to use the same queries with those in `SqlToRelConverterTest`. On the contrary if these lead to significantly different plans then maybe we need to add those as well in `SqlToRelConverterTest`. Please clarify the intention.



##########
core/src/test/java/org/apache/calcite/test/SqlToRelConverterTest.java:
##########
@@ -3285,6 +3285,30 @@ void checkCorrelatedMapSubQuery(boolean expand) {
     sql(sql).withDecorrelate(true).ok();
   }
 
+  @Test void testCorrelationInProjection1() {
+    final String sql = "select array(select e.deptno) from emp e";
+    sql(sql).withExpand(false).withDecorrelate(false).ok();
+  }
+
+  @Test void testCorrelationInProjection2() {
+    final String sql = "select array(select e.deptno)\n"
+        + "from (select deptno, ename from emp) e";
+    sql(sql).withExpand(false).withDecorrelate(false).ok();
+  }
+
+  @Test void testCorrelationInProjection3() {
+    final String sql = "select cardinality(array(select e.deptno)), array(select e.ename)[0]\n"
+        + "from (select deptno, ename from emp) e";
+    sql(sql).withExpand(false).withDecorrelate(false).ok();
+  }
+
+  @Test void testCorrelationInProjection4() {
+    final String sql = "select cardinality(arr) from"
+        + "(select array(select e.deptno) arr\n"
+        + "from (select deptno, ename from emp) e)";
+    sql(sql).withExpand(false).withDecorrelate(false).ok();
+  }
+

Review Comment:
   The numeric suffix does not help much to understand the purpose/difference of each test. If possible add some comments or rename the test to better indicate its purpose.



##########
elasticsearch/src/main/java/org/apache/calcite/adapter/elasticsearch/ElasticsearchRules.java:
##########
@@ -295,6 +297,8 @@ protected ElasticsearchProjectRule(Config config) {
 
     @Override public RelNode convert(RelNode relNode) {
       final LogicalProject project = (LogicalProject) relNode;
+      Preconditions.checkArgument(project.getVariablesSet().isEmpty(),
+          "ElasticsearchProject does not allow variables");

Review Comment:
   As I wrote previously maybe it is better to not fire the rule instead of throwing when there are correlations variables.
   
   You cannot control the order that rules are applied or how an end-user is putting them together. It wouldn't be nice to have the optimizer crash while later on it would produce a valid plan.
   
   Consider for example that `ElasticSearchProjectRule` is used together with `SubQueryRemoveRule` and for some reason the former applies first. In this case the app would crash while it is still possible to come up with a valid plan.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@calcite.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org