You are viewing a plain text version of this content. The canonical link for it is here.

Posted to commits@calcite.apache.org by GitBox <gi...@apache.org> on 2021/04/27 13:49:34 UTC

[GitHub] [calcite] hannerwang opened a new pull request #2408: [CALCITE-4512] Optimize when GROUP BY or HAVING alias should be substituted with select item

hannerwang opened a new pull request #2408:
URL: https://github.com/apache/calcite/pull/2408


   for GROUP BY alias substituting:
   1. if the GROUP BY expression is an identifier and the identifier can't be resolved to exact one column in current select scope.
   2. CUBE, ROLLUP, GROUPING SETS, ROW top level argument behave in the same way with GROUP BY expression.
   
   for HAVING alias substituting:
   1. any except AGGREGATE or its arguments . 
   
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org

[GitHub] [calcite] hannerwang commented on a change in pull request #2408: [CALCITE-4512] Optimize when GROUP BY or HAVING alias should be substituted with select item

Posted by GitBox <gi...@apache.org>.

hannerwang commented on a change in pull request #2408:
URL: https://github.com/apache/calcite/pull/2408#discussion_r621757009



##########
File path: core/src/test/resources/org/apache/calcite/test/SqlToRelConverterTest.xml
##########
@@ -86,16 +86,19 @@ LogicalAggregate(group=[{0}])
             <![CDATA[select empno as d from emp group by d]]>
         </Resource>
     </TestCase>
-    <TestCase name="testGroupByAliasEqualToColumnName">
+    <TestCase name="testGroupByAliasEqualToMultipleColumnName">
         <Resource name="plan">
             <![CDATA[
 LogicalAggregate(group=[{0, 1}])
-  LogicalProject(EMPNO=[$0], DEPTNO=[$1])
-    LogicalTableScan(table=[[CATALOG, SALES, EMP]])
+  LogicalProject(EMPNO=[$0], DEPTNO=[$7])
+    LogicalJoin(condition=[true], joinType=[inner])
+      LogicalTableScan(table=[[CATALOG, SALES, EMP]])
+      LogicalTableScan(table=[[CATALOG, SALES, DEPT]])
 ]]>
         </Resource>
         <Resource name="sql">
-            <![CDATA[select empno, ename as deptno from emp group by empno, deptno]]>
+            <![CDATA[select empno, case when emp.deptno is null then dept.deptno else emp.deptno
+end as deptno from emp, dept group by empno, deptno]]>
         </Resource>

Review comment:
       thanks for reviewing, the strategy that alias will be substituted has changed, so the case should be changed also.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org

[GitHub] [calcite] hannerwang closed pull request #2408: [CALCITE-4512] Optimize when GROUP BY or HAVING alias should be substituted with select item

Posted by GitBox <gi...@apache.org>.

hannerwang closed pull request #2408:
URL: https://github.com/apache/calcite/pull/2408


   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@calcite.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org

[GitHub] [calcite] chunweilei commented on a change in pull request #2408: [CALCITE-4512] Optimize when GROUP BY or HAVING alias should be substituted with select item

Posted by GitBox <gi...@apache.org>.

chunweilei commented on a change in pull request #2408:
URL: https://github.com/apache/calcite/pull/2408#discussion_r622933328



##########
File path: core/src/main/java/org/apache/calcite/sql/validate/SqlValidatorImpl.java
##########
@@ -6669,9 +6729,95 @@ private SqlNode nthSelectItem(int ordinal, final SqlParserPos pos) {
           break;
         }
       }
-
       return super.visit(literal);
     }
+
+    @Override
+    protected SqlNode visitScoped(SqlCall call) {
+      checkExpandedAsAlias(call);
+      return super.visitScoped(call);
+    }
+
+    void recordExpandingTopNodes(SqlNode root) {
+      List<SqlNode> operands = ((SqlCall) root).getOperandList();
+      for (int i = 0; i < operands.size(); i++) {
+        SqlNode operand = operands.get(i);
+        if (operand.getKind() == SqlKind.ROW) {
+          recordExpandingTopNodes(operand);
+        }
+        if (operand instanceof SqlCall) {
+          expandingTopNodes.put(operand, Boolean.FALSE);
+        } else {
+          expandingTopNodes.put(operand, Boolean.TRUE);
+        }
+      }
+    }
+
+    /*
+     * check if top level node should be expanded as alias.
+     */
+    void checkExpandedAsAlias(SqlNode sqlNode) {

Review comment:
       Yes~~




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org

[GitHub] [calcite] hannerwang commented on a change in pull request #2408: [CALCITE-4512] Optimize when GROUP BY or HAVING alias should be substituted with select item

Posted by GitBox <gi...@apache.org>.

hannerwang commented on a change in pull request #2408:
URL: https://github.com/apache/calcite/pull/2408#discussion_r621759639



##########
File path: core/src/main/java/org/apache/calcite/sql/validate/SqlValidatorImpl.java
##########
@@ -6669,9 +6729,95 @@ private SqlNode nthSelectItem(int ordinal, final SqlParserPos pos) {
           break;
         }
       }
-
       return super.visit(literal);
     }
+
+    @Override
+    protected SqlNode visitScoped(SqlCall call) {
+      checkExpandedAsAlias(call);
+      return super.visitScoped(call);
+    }
+
+    void recordExpandingTopNodes(SqlNode root) {
+      List<SqlNode> operands = ((SqlCall) root).getOperandList();
+      for (int i = 0; i < operands.size(); i++) {
+        SqlNode operand = operands.get(i);
+        if (operand.getKind() == SqlKind.ROW) {
+          recordExpandingTopNodes(operand);
+        }
+        if (operand instanceof SqlCall) {
+          expandingTopNodes.put(operand, Boolean.FALSE);
+        } else {
+          expandingTopNodes.put(operand, Boolean.TRUE);
+        }
+      }
+    }
+
+    /*
+     * check if top level node should be expanded as alias.
+     */
+    void checkExpandedAsAlias(SqlNode sqlNode) {

Review comment:
       do you mean the 'check' in comment?




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org

[GitHub] [calcite] hannerwang commented on pull request #2408: [CALCITE-4512] Optimize when GROUP BY or HAVING alias should be substituted with select item

Posted by GitBox <gi...@apache.org>.

hannerwang commented on pull request #2408:
URL: https://github.com/apache/calcite/pull/2408#issuecomment-847900410


   > The commit message not in the proper way.
   
   Yes, I think the Jira title can't summarize what the commit has complete, so I leave a comment at first.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org

[GitHub] [calcite] Aaaaaaron commented on pull request #2408: [CALCITE-4512] Optimize when GROUP BY or HAVING alias should be substituted with select item

Posted by GitBox <gi...@apache.org>.

Aaaaaaron commented on pull request #2408:
URL: https://github.com/apache/calcite/pull/2408#issuecomment-847583880


   The commit message not in the proper way.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org

[GitHub] [calcite] hannerwang commented on a change in pull request #2408: [CALCITE-4512] Optimize when GROUP BY or HAVING alias should be substituted with select item

Posted by GitBox <gi...@apache.org>.

hannerwang commented on a change in pull request #2408:
URL: https://github.com/apache/calcite/pull/2408#discussion_r623676488



##########
File path: core/src/test/resources/org/apache/calcite/test/SqlToRelConverterTest.xml
##########
@@ -142,14 +145,15 @@ LogicalAggregate(group=[{}], EXPR$0=[COUNT()])
     <TestCase name="testAliasInHaving">
         <Resource name="plan">
             <![CDATA[
-LogicalFilter(condition=[>($0, 1)])
-  LogicalAggregate(group=[{}], E=[COUNT()])
-    LogicalProject(EMPNO=[$0])
-      LogicalTableScan(table=[[CATALOG, SALES, EMP]])
+LogicalProject(ENAME=[$0], E=[$1], D=[$1])
+  LogicalFilter(condition=[>($1, 2)])
+    LogicalAggregate(group=[{0}], E=[COUNT()])
+      LogicalProject(ENAME=[$1], EMPNO=[$0], DEPTNO=[$7])
+        LogicalTableScan(table=[[CATALOG, SALES, EMP]])
 ]]>

Review comment:
       it's the result of reducing COUNT aggregation whose args type has NOT NULL constraint to be count(&ast;). In this case, EMPNO and DEPTNO are integer and not null, so count(empno) and count(deptno) can be reduced as just one count(&ast;), furthermore the count(&ast;) > 1 and count(&ast;) > 2 can be reduced to count(&ast;) > 2.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org

[GitHub] [calcite] chunweilei commented on a change in pull request #2408: [CALCITE-4512] Optimize when GROUP BY or HAVING alias should be substituted with select item

Posted by GitBox <gi...@apache.org>.

chunweilei commented on a change in pull request #2408:
URL: https://github.com/apache/calcite/pull/2408#discussion_r621751877



##########
File path: core/src/test/java/org/apache/calcite/test/SqlToRelConverterTest.java
##########
@@ -371,7 +372,8 @@ public final Sql sql(String sql) {
   }
 
   @Test void testAliasInHaving() {
-    sql("select count(empno) as e from emp having e > 1")
+    sql("select ename, count(empno) as e, count(deptno) as d from emp"
+        + " group by ename having e > 1 and count(deptno) > 2")

Review comment:
       We'd better add new tests.

##########
File path: core/src/main/java/org/apache/calcite/sql/validate/SqlValidatorImpl.java
##########
@@ -6669,9 +6729,95 @@ private SqlNode nthSelectItem(int ordinal, final SqlParserPos pos) {
           break;
         }
       }
-
       return super.visit(literal);
     }
+
+    @Override
+    protected SqlNode visitScoped(SqlCall call) {
+      checkExpandedAsAlias(call);
+      return super.visitScoped(call);
+    }
+
+    void recordExpandingTopNodes(SqlNode root) {
+      List<SqlNode> operands = ((SqlCall) root).getOperandList();
+      for (int i = 0; i < operands.size(); i++) {
+        SqlNode operand = operands.get(i);
+        if (operand.getKind() == SqlKind.ROW) {
+          recordExpandingTopNodes(operand);
+        }
+        if (operand instanceof SqlCall) {
+          expandingTopNodes.put(operand, Boolean.FALSE);
+        } else {
+          expandingTopNodes.put(operand, Boolean.TRUE);
+        }
+      }
+    }
+
+    /*
+     * check if top level node should be expanded as alias.
+     */
+    void checkExpandedAsAlias(SqlNode sqlNode) {

Review comment:
       check -> Checks?

##########
File path: core/src/test/resources/org/apache/calcite/test/SqlToRelConverterTest.xml
##########
@@ -86,16 +86,19 @@ LogicalAggregate(group=[{0}])
             <![CDATA[select empno as d from emp group by d]]>
         </Resource>
     </TestCase>
-    <TestCase name="testGroupByAliasEqualToColumnName">
+    <TestCase name="testGroupByAliasEqualToMultipleColumnName">
         <Resource name="plan">
             <![CDATA[
 LogicalAggregate(group=[{0, 1}])
-  LogicalProject(EMPNO=[$0], DEPTNO=[$1])
-    LogicalTableScan(table=[[CATALOG, SALES, EMP]])
+  LogicalProject(EMPNO=[$0], DEPTNO=[$7])
+    LogicalJoin(condition=[true], joinType=[inner])
+      LogicalTableScan(table=[[CATALOG, SALES, EMP]])
+      LogicalTableScan(table=[[CATALOG, SALES, DEPT]])
 ]]>
         </Resource>
         <Resource name="sql">
-            <![CDATA[select empno, ename as deptno from emp group by empno, deptno]]>
+            <![CDATA[select empno, case when emp.deptno is null then dept.deptno else emp.deptno
+end as deptno from emp, dept group by empno, deptno]]>
         </Resource>

Review comment:
       Could you add new test cases instead of changing the current one?




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org

[GitHub] [calcite] hannerwang commented on a change in pull request #2408: [CALCITE-4512] Optimize when GROUP BY or HAVING alias should be substituted with select item

Posted by GitBox <gi...@apache.org>.

hannerwang commented on a change in pull request #2408:
URL: https://github.com/apache/calcite/pull/2408#discussion_r623676488



##########
File path: core/src/test/resources/org/apache/calcite/test/SqlToRelConverterTest.xml
##########
@@ -142,14 +145,15 @@ LogicalAggregate(group=[{}], EXPR$0=[COUNT()])
     <TestCase name="testAliasInHaving">
         <Resource name="plan">
             <![CDATA[
-LogicalFilter(condition=[>($0, 1)])
-  LogicalAggregate(group=[{}], E=[COUNT()])
-    LogicalProject(EMPNO=[$0])
-      LogicalTableScan(table=[[CATALOG, SALES, EMP]])
+LogicalProject(ENAME=[$0], E=[$1], D=[$1])
+  LogicalFilter(condition=[>($1, 2)])
+    LogicalAggregate(group=[{0}], E=[COUNT()])
+      LogicalProject(ENAME=[$1], EMPNO=[$0], DEPTNO=[$7])
+        LogicalTableScan(table=[[CATALOG, SALES, EMP]])
 ]]>

Review comment:
       it's the result of reducing COUNT aggregation whose args type has NOT NULL constraint to be count(*). In this case, EMPNO and DEPTNO are integer and not null, so count(empno) and count(deptno) can be reduced as just one count(*), furthermore the count(*) > 1 and count(*) > 2 can be reduced to count(*) > 2.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org

[GitHub] [calcite] hannerwang commented on a change in pull request #2408: [CALCITE-4512] Optimize when GROUP BY or HAVING alias should be substituted with select item

Posted by GitBox <gi...@apache.org>.

hannerwang commented on a change in pull request #2408:
URL: https://github.com/apache/calcite/pull/2408#discussion_r621760976



##########
File path: core/src/test/java/org/apache/calcite/test/SqlToRelConverterTest.java
##########
@@ -371,7 +372,8 @@ public final Sql sql(String sql) {
   }
 
   @Test void testAliasInHaving() {
-    sql("select count(empno) as e from emp having e > 1")
+    sql("select ename, count(empno) as e, count(deptno) as d from emp"
+        + " group by ename having e > 1 and count(deptno) > 2")

Review comment:
       yes, that conforms SOLID,  but the commit is fixing the existing feature, so I choose to change the obsolete tests.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org

[GitHub] [calcite] hannerwang commented on a change in pull request #2408: [CALCITE-4512] Optimize when GROUP BY or HAVING alias should be substituted with select item

Posted by GitBox <gi...@apache.org>.

hannerwang commented on a change in pull request #2408:
URL: https://github.com/apache/calcite/pull/2408#discussion_r623536755



##########
File path: core/src/main/java/org/apache/calcite/sql/validate/SqlValidatorImpl.java
##########
@@ -6669,9 +6729,95 @@ private SqlNode nthSelectItem(int ordinal, final SqlParserPos pos) {
           break;
         }
       }
-
       return super.visit(literal);
     }
+
+    @Override
+    protected SqlNode visitScoped(SqlCall call) {
+      checkExpandedAsAlias(call);
+      return super.visitScoped(call);
+    }
+
+    void recordExpandingTopNodes(SqlNode root) {
+      List<SqlNode> operands = ((SqlCall) root).getOperandList();
+      for (int i = 0; i < operands.size(); i++) {
+        SqlNode operand = operands.get(i);
+        if (operand.getKind() == SqlKind.ROW) {
+          recordExpandingTopNodes(operand);
+        }
+        if (operand instanceof SqlCall) {
+          expandingTopNodes.put(operand, Boolean.FALSE);
+        } else {
+          expandingTopNodes.put(operand, Boolean.TRUE);
+        }
+      }
+    }
+
+    /*
+     * check if top level node should be expanded as alias.
+     */
+    void checkExpandedAsAlias(SqlNode sqlNode) {

Review comment:
       thanks, I have fixed it.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org

[GitHub] [calcite] chunweilei commented on a change in pull request #2408: [CALCITE-4512] Optimize when GROUP BY or HAVING alias should be substituted with select item

Posted by GitBox <gi...@apache.org>.

chunweilei commented on a change in pull request #2408:
URL: https://github.com/apache/calcite/pull/2408#discussion_r623551192



##########
File path: core/src/test/resources/org/apache/calcite/test/SqlToRelConverterTest.xml
##########
@@ -142,14 +145,15 @@ LogicalAggregate(group=[{}], EXPR$0=[COUNT()])
     <TestCase name="testAliasInHaving">
         <Resource name="plan">
             <![CDATA[
-LogicalFilter(condition=[>($0, 1)])
-  LogicalAggregate(group=[{}], E=[COUNT()])
-    LogicalProject(EMPNO=[$0])
-      LogicalTableScan(table=[[CATALOG, SALES, EMP]])
+LogicalProject(ENAME=[$0], E=[$1], D=[$1])
+  LogicalFilter(condition=[>($1, 2)])
+    LogicalAggregate(group=[{0}], E=[COUNT()])
+      LogicalProject(ENAME=[$1], EMPNO=[$0], DEPTNO=[$7])
+        LogicalTableScan(table=[[CATALOG, SALES, EMP]])
 ]]>

Review comment:
       The plan seems not right. There are two aggregates in sql, but only one in plan. Do I miss something?




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org

[GitHub] [calcite] hannerwang edited a comment on pull request #2408: [CALCITE-4512] Optimize when GROUP BY or HAVING alias should be substituted with select item

Posted by GitBox <gi...@apache.org>.

hannerwang edited a comment on pull request #2408:
URL: https://github.com/apache/calcite/pull/2408#issuecomment-847900410


   > The commit message not in the proper way.
   
   Yes, I don't think the Jira title can summarize what the commit has complete, so I leave a comment at first.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org