You are viewing a plain text version of this content. The canonical link for it is here.
Posted to gitbox@hive.apache.org by GitBox <gi...@apache.org> on 2020/08/27 10:06:40 UTC

[GitHub] [hive] kasakrisz opened a new pull request #1437: HIVE-24081: Enable pre-materializing CTEs referenced in scalar subqueries

kasakrisz opened a new pull request #1437:
URL: https://github.com/apache/hive/pull/1437


   ### What changes were proposed in this pull request?
   * Do phase 1 parsing of subquery expressions in order to count CTE references in those subqueries
   * Add a config to materialize CTEs with aggregate output only
   
   
   ### Why are the changes needed?
   Improve performance of complex queries referencing the same fully aggregate CTE more than one times.
   
   ### Does this PR introduce _any_ user-facing change?
   Adds a new config into HiveConf: `hive.optimize.cte.materialize.full.aggregate.only`.
   Prior this patch if `hive.optimize.cte.materialize.threshold` was higher than -1 all non-subquery CTEs were materialized if they were referenced more times than the threshold. This patch limits this to fully aggregate CTEs only by default. The original behavior can restored by setting `hive.optimize.cte.materialize.full.aggregate.only` to false.
   
   ### How was this patch tested?
   * New q tests were added.
   ```
   mvn test -Dtest.output.overwrite -DskipSparkTests -Dtest=TestMiniLlapLocalCliDriver -Dqfile=cte_mat_6.q -pl itests/qtest -Pitests
   ```
   * Run query14 with `set hive.optimize.cte.materialize.threshold=3;`
   ```
   mvn test -Dtest.output.overwrite -DskipSparkTests -Dtest=TestTezPerfCliDriver -Dqfile=query14.q -pl itests/qtest -Pitests
   ```


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: gitbox-unsubscribe@hive.apache.org
For additional commands, e-mail: gitbox-help@hive.apache.org


[GitHub] [hive] kasakrisz commented on a change in pull request #1437: HIVE-24081: Enable pre-materializing CTEs referenced in scalar subqueries

Posted by GitBox <gi...@apache.org>.
kasakrisz commented on a change in pull request #1437:
URL: https://github.com/apache/hive/pull/1437#discussion_r479992094



##########
File path: common/src/java/org/apache/hadoop/hive/conf/HiveConf.java
##########
@@ -2591,6 +2591,10 @@ private static void populateLlapDaemonVarsSet(Set<String> llapDaemonVarsSetLocal
     HIVE_CTE_MATERIALIZE_THRESHOLD("hive.optimize.cte.materialize.threshold", -1,

Review comment:
       done




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: gitbox-unsubscribe@hive.apache.org
For additional commands, e-mail: gitbox-help@hive.apache.org


[GitHub] [hive] kasakrisz commented on a change in pull request #1437: HIVE-24081: Enable pre-materializing CTEs referenced in scalar subqueries

Posted by GitBox <gi...@apache.org>.
kasakrisz commented on a change in pull request #1437:
URL: https://github.com/apache/hive/pull/1437#discussion_r479970570



##########
File path: ql/src/test/queries/clientpositive/cte_mat_6.q
##########
@@ -0,0 +1,81 @@
+set hive.optimize.cte.materialize.threshold=1;
+
+create table t0(col0 int);
+
+insert into t0(col0) values
+(1),(2),
+(100),(100),(100),
+(200),(200);
+
+-- CTE is referenced from scalar subquery in the select clause
+explain
+with cte as (select count(*) as small_count from t0 where col0 < 10)
+select t0.col0, (select small_count from cte)
+from t0
+order by t0.col0;
+
+with cte as (select count(*) as small_count from t0 where col0 < 10)
+select t0.col0, (select small_count from cte)
+from t0
+order by t0.col0;
+
+-- disable cte materialization
+set hive.optimize.cte.materialize.threshold=-1;
+
+explain
+with cte as (select count(*) as small_count from t0 where col0 < 10)
+select t0.col0, (select small_count from cte)
+from t0
+order by t0.col0;
+
+
+with cte as (select count(*) as small_count from t0 where col0 < 10)
+select t0.col0, (select small_count from cte)
+from t0
+order by t0.col0;
+
+
+-- enable cte materialization
+set hive.optimize.cte.materialize.threshold=1;
+
+-- CTE is referenced from scalar subquery in the where clause
+explain
+with cte as (select count(*) as small_count from t0 where col0 < 10)
+select t0.col0
+from t0
+where t0.col0 > (select small_count from cte)
+order by t0.col0;
+
+with cte as (select count(*) as small_count from t0 where col0 < 10)
+select t0.col0
+from t0
+where t0.col0 > (select small_count from cte)
+order by t0.col0;
+
+-- CTE is referenced from scalar subquery in the having clause
+explain
+with cte as (select count(*) as small_count from t0 where col0 < 10)
+select t0.col0, count(*)
+from t0
+group by col0
+having count(*) > (select small_count from cte)
+order by t0.col0;
+
+with cte as (select count(*) as small_count from t0 where col0 < 10)
+select t0.col0, count(*)
+from t0
+group by col0
+having count(*) > (select small_count from cte)
+order by t0.col0;
+
+-- mix full aggregate and non-full aggregate ctes
+explain
+with cte1 as (select col0 as k1 from t0 where col0 = '5'),
+     cte2 as (select count(*) as all_count from t0),
+     cte3 as (select col0 as k3, col0 + col0 as k3_2x, count(*) as key_count from t0 group by col0)
+select t0.col0, count(*)
+from t0
+join cte1 on t0.col0 = cte1.k1
+join cte3 on t0.col0 = cte3.k3
+group by col0
+having count(*) > (select all_count from cte2)

Review comment:
       I added a check for CTAS and will update the PR soon.
   For `create materialize view` CTE materialization is disabled here 
   `SemanticAnalyzer.genResolvedParseTree`
   ```
       // 5. Resolve Parse Tree
       // Materialization is allowed if it is not a view definition
       getMetaData(qb, createVwDesc == null && !forViewCreation);
   ```
   https://github.com/apache/hive/blob/54aff33d8e1d659d295e1f53b88aad91ba8cc23e/ql/src/java/org/apache/hadoop/hive/ql/parse/SemanticAnalyzer.java#L12355




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: gitbox-unsubscribe@hive.apache.org
For additional commands, e-mail: gitbox-help@hive.apache.org


[GitHub] [hive] kasakrisz commented on a change in pull request #1437: HIVE-24081: Enable pre-materializing CTEs referenced in scalar subqueries

Posted by GitBox <gi...@apache.org>.
kasakrisz commented on a change in pull request #1437:
URL: https://github.com/apache/hive/pull/1437#discussion_r479992802



##########
File path: ql/src/java/org/apache/hadoop/hive/ql/parse/QB.java
##########
@@ -457,4 +469,17 @@ public boolean hasTableDefined() {
     return !(aliases.size() == 1 && aliases.get(0).equals(SemanticAnalyzer.DUMMY_TABLE));
   }
 
+  public void addSubqExprAlias(ASTNode expressionTree, SemanticAnalyzer semanticAnalyzer) throws SemanticException {
+    String alias = "__subexpr" + subQueryExpressionAliasCounter++;
+
+    // Recursively do the first phase of semantic analysis for the subquery
+    QBExpr qbexpr = new QBExpr(alias);
+
+    ASTNode subqref = (ASTNode) expressionTree.getChild(1);
+    semanticAnalyzer.doPhase1QBExpr(subqref, qbexpr, getId(), alias, isInsideView());

Review comment:
       This step parses the subquery only. It is necessary to collect references to the CTEs in subquery expressions.




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: gitbox-unsubscribe@hive.apache.org
For additional commands, e-mail: gitbox-help@hive.apache.org


[GitHub] [hive] kasakrisz commented on a change in pull request #1437: HIVE-24081: Enable pre-materializing CTEs referenced in scalar subqueries

Posted by GitBox <gi...@apache.org>.
kasakrisz commented on a change in pull request #1437:
URL: https://github.com/apache/hive/pull/1437#discussion_r479994280



##########
File path: ql/src/java/org/apache/hadoop/hive/ql/parse/QBParseInfo.java
##########
@@ -677,6 +685,33 @@ public void setNoScanAnalyzeCommand(boolean isNoScanAnalyzeCommand) {
   public boolean hasInsertTables() {
     return this.insertIntoTables.size() > 0 || this.insertOverwriteTables.size() > 0;
   }
+
+  public boolean isFullyAggregate() throws SemanticException {

Review comment:
       Added javadocs. I think it can be non-static until it is used from another location. It uses the `destToSelExpr` field located in `QBParseInfo`.




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: gitbox-unsubscribe@hive.apache.org
For additional commands, e-mail: gitbox-help@hive.apache.org


[GitHub] [hive] kasakrisz commented on a change in pull request #1437: HIVE-24081: Enable pre-materializing CTEs referenced in scalar subqueries

Posted by GitBox <gi...@apache.org>.
kasakrisz commented on a change in pull request #1437:
URL: https://github.com/apache/hive/pull/1437#discussion_r479970570



##########
File path: ql/src/test/queries/clientpositive/cte_mat_6.q
##########
@@ -0,0 +1,81 @@
+set hive.optimize.cte.materialize.threshold=1;
+
+create table t0(col0 int);
+
+insert into t0(col0) values
+(1),(2),
+(100),(100),(100),
+(200),(200);
+
+-- CTE is referenced from scalar subquery in the select clause
+explain
+with cte as (select count(*) as small_count from t0 where col0 < 10)
+select t0.col0, (select small_count from cte)
+from t0
+order by t0.col0;
+
+with cte as (select count(*) as small_count from t0 where col0 < 10)
+select t0.col0, (select small_count from cte)
+from t0
+order by t0.col0;
+
+-- disable cte materialization
+set hive.optimize.cte.materialize.threshold=-1;
+
+explain
+with cte as (select count(*) as small_count from t0 where col0 < 10)
+select t0.col0, (select small_count from cte)
+from t0
+order by t0.col0;
+
+
+with cte as (select count(*) as small_count from t0 where col0 < 10)
+select t0.col0, (select small_count from cte)
+from t0
+order by t0.col0;
+
+
+-- enable cte materialization
+set hive.optimize.cte.materialize.threshold=1;
+
+-- CTE is referenced from scalar subquery in the where clause
+explain
+with cte as (select count(*) as small_count from t0 where col0 < 10)
+select t0.col0
+from t0
+where t0.col0 > (select small_count from cte)
+order by t0.col0;
+
+with cte as (select count(*) as small_count from t0 where col0 < 10)
+select t0.col0
+from t0
+where t0.col0 > (select small_count from cte)
+order by t0.col0;
+
+-- CTE is referenced from scalar subquery in the having clause
+explain
+with cte as (select count(*) as small_count from t0 where col0 < 10)
+select t0.col0, count(*)
+from t0
+group by col0
+having count(*) > (select small_count from cte)
+order by t0.col0;
+
+with cte as (select count(*) as small_count from t0 where col0 < 10)
+select t0.col0, count(*)
+from t0
+group by col0
+having count(*) > (select small_count from cte)
+order by t0.col0;
+
+-- mix full aggregate and non-full aggregate ctes
+explain
+with cte1 as (select col0 as k1 from t0 where col0 = '5'),
+     cte2 as (select count(*) as all_count from t0),
+     cte3 as (select col0 as k3, col0 + col0 as k3_2x, count(*) as key_count from t0 group by col0)
+select t0.col0, count(*)
+from t0
+join cte1 on t0.col0 = cte1.k1
+join cte3 on t0.col0 = cte3.k3
+group by col0
+having count(*) > (select all_count from cte2)

Review comment:
       I added a check for CTAS.
   
   For `create materialize view` CTE materialization is disabled here 
   `SemanticAnalyzer.genResolvedParseTree`
   ```
       // 5. Resolve Parse Tree
       // Materialization is allowed if it is not a view definition
       getMetaData(qb, createVwDesc == null && !forViewCreation);
   ```
   https://github.com/apache/hive/blob/54aff33d8e1d659d295e1f53b88aad91ba8cc23e/ql/src/java/org/apache/hadoop/hive/ql/parse/SemanticAnalyzer.java#L12355




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: gitbox-unsubscribe@hive.apache.org
For additional commands, e-mail: gitbox-help@hive.apache.org


[GitHub] [hive] jcamachor merged pull request #1437: HIVE-24081: Enable pre-materializing CTEs referenced in scalar subqueries

Posted by GitBox <gi...@apache.org>.
jcamachor merged pull request #1437:
URL: https://github.com/apache/hive/pull/1437


   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: gitbox-unsubscribe@hive.apache.org
For additional commands, e-mail: gitbox-help@hive.apache.org


[GitHub] [hive] jcamachor commented on a change in pull request #1437: HIVE-24081: Enable pre-materializing CTEs referenced in scalar subqueries

Posted by GitBox <gi...@apache.org>.
jcamachor commented on a change in pull request #1437:
URL: https://github.com/apache/hive/pull/1437#discussion_r481220742



##########
File path: ql/src/test/queries/clientpositive/cte_mat_6.q
##########
@@ -0,0 +1,81 @@
+set hive.optimize.cte.materialize.threshold=1;
+
+create table t0(col0 int);
+
+insert into t0(col0) values
+(1),(2),
+(100),(100),(100),
+(200),(200);
+
+-- CTE is referenced from scalar subquery in the select clause
+explain
+with cte as (select count(*) as small_count from t0 where col0 < 10)
+select t0.col0, (select small_count from cte)
+from t0
+order by t0.col0;
+
+with cte as (select count(*) as small_count from t0 where col0 < 10)
+select t0.col0, (select small_count from cte)
+from t0
+order by t0.col0;
+
+-- disable cte materialization
+set hive.optimize.cte.materialize.threshold=-1;
+
+explain
+with cte as (select count(*) as small_count from t0 where col0 < 10)
+select t0.col0, (select small_count from cte)
+from t0
+order by t0.col0;
+
+
+with cte as (select count(*) as small_count from t0 where col0 < 10)
+select t0.col0, (select small_count from cte)
+from t0
+order by t0.col0;
+
+
+-- enable cte materialization
+set hive.optimize.cte.materialize.threshold=1;
+
+-- CTE is referenced from scalar subquery in the where clause
+explain
+with cte as (select count(*) as small_count from t0 where col0 < 10)
+select t0.col0
+from t0
+where t0.col0 > (select small_count from cte)
+order by t0.col0;
+
+with cte as (select count(*) as small_count from t0 where col0 < 10)
+select t0.col0
+from t0
+where t0.col0 > (select small_count from cte)
+order by t0.col0;
+
+-- CTE is referenced from scalar subquery in the having clause
+explain
+with cte as (select count(*) as small_count from t0 where col0 < 10)
+select t0.col0, count(*)
+from t0
+group by col0
+having count(*) > (select small_count from cte)
+order by t0.col0;
+
+with cte as (select count(*) as small_count from t0 where col0 < 10)
+select t0.col0, count(*)
+from t0
+group by col0
+having count(*) > (select small_count from cte)
+order by t0.col0;
+
+-- mix full aggregate and non-full aggregate ctes
+explain
+with cte1 as (select col0 as k1 from t0 where col0 = '5'),
+     cte2 as (select count(*) as all_count from t0),
+     cte3 as (select col0 as k3, col0 + col0 as k3_2x, count(*) as key_count from t0 group by col0)
+select t0.col0, count(*)
+from t0
+join cte1 on t0.col0 = cte1.k1
+join cte3 on t0.col0 = cte3.k3
+group by col0
+having count(*) > (select all_count from cte2)

Review comment:
       OK, I think it is fine to leave it then.




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: gitbox-unsubscribe@hive.apache.org
For additional commands, e-mail: gitbox-help@hive.apache.org


[GitHub] [hive] kasakrisz commented on a change in pull request #1437: HIVE-24081: Enable pre-materializing CTEs referenced in scalar subqueries

Posted by GitBox <gi...@apache.org>.
kasakrisz commented on a change in pull request #1437:
URL: https://github.com/apache/hive/pull/1437#discussion_r480078957



##########
File path: ql/src/test/queries/clientpositive/cte_mat_6.q
##########
@@ -0,0 +1,81 @@
+set hive.optimize.cte.materialize.threshold=1;
+
+create table t0(col0 int);
+
+insert into t0(col0) values
+(1),(2),
+(100),(100),(100),
+(200),(200);
+
+-- CTE is referenced from scalar subquery in the select clause
+explain
+with cte as (select count(*) as small_count from t0 where col0 < 10)
+select t0.col0, (select small_count from cte)
+from t0
+order by t0.col0;
+
+with cte as (select count(*) as small_count from t0 where col0 < 10)
+select t0.col0, (select small_count from cte)
+from t0
+order by t0.col0;
+
+-- disable cte materialization
+set hive.optimize.cte.materialize.threshold=-1;
+
+explain
+with cte as (select count(*) as small_count from t0 where col0 < 10)
+select t0.col0, (select small_count from cte)
+from t0
+order by t0.col0;
+
+
+with cte as (select count(*) as small_count from t0 where col0 < 10)
+select t0.col0, (select small_count from cte)
+from t0
+order by t0.col0;
+
+
+-- enable cte materialization
+set hive.optimize.cte.materialize.threshold=1;
+
+-- CTE is referenced from scalar subquery in the where clause
+explain
+with cte as (select count(*) as small_count from t0 where col0 < 10)
+select t0.col0
+from t0
+where t0.col0 > (select small_count from cte)
+order by t0.col0;
+
+with cte as (select count(*) as small_count from t0 where col0 < 10)
+select t0.col0
+from t0
+where t0.col0 > (select small_count from cte)
+order by t0.col0;
+
+-- CTE is referenced from scalar subquery in the having clause
+explain
+with cte as (select count(*) as small_count from t0 where col0 < 10)
+select t0.col0, count(*)
+from t0
+group by col0
+having count(*) > (select small_count from cte)
+order by t0.col0;
+
+with cte as (select count(*) as small_count from t0 where col0 < 10)
+select t0.col0, count(*)
+from t0
+group by col0
+having count(*) > (select small_count from cte)
+order by t0.col0;
+
+-- mix full aggregate and non-full aggregate ctes
+explain
+with cte1 as (select col0 as k1 from t0 where col0 = '5'),
+     cte2 as (select count(*) as all_count from t0),
+     cte3 as (select col0 as k3, col0 + col0 as k3_2x, count(*) as key_count from t0 group by col0)
+select t0.col0, count(*)
+from t0
+join cte1 on t0.col0 = cte1.k1
+join cte3 on t0.col0 = cte3.k3
+group by col0
+having count(*) > (select all_count from cte2)

Review comment:
       @jcamachor 
   I just found that CTE materialization was enabled for CTAS in the past or at least it was tested. Please see
   https://github.com/apache/hive/blob/master/ql/src/test/queries/clientpositive/cte_4.q
   Should we disable it anyway?
   




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: gitbox-unsubscribe@hive.apache.org
For additional commands, e-mail: gitbox-help@hive.apache.org


[GitHub] [hive] jcamachor commented on a change in pull request #1437: HIVE-24081: Enable pre-materializing CTEs referenced in scalar subqueries

Posted by GitBox <gi...@apache.org>.
jcamachor commented on a change in pull request #1437:
URL: https://github.com/apache/hive/pull/1437#discussion_r481221210



##########
File path: ql/src/test/results/clientpositive/llap/cte_4.q.out
##########
@@ -106,23 +106,17 @@ PREHOOK: query: create table s2 as
 with q1 as ( select key from src where key = '4')
 select * from q1
 PREHOOK: type: CREATETABLE_AS_SELECT
-PREHOOK: Input: default@q1

Review comment:
       Expected?




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: gitbox-unsubscribe@hive.apache.org
For additional commands, e-mail: gitbox-help@hive.apache.org


[GitHub] [hive] jcamachor commented on a change in pull request #1437: HIVE-24081: Enable pre-materializing CTEs referenced in scalar subqueries

Posted by GitBox <gi...@apache.org>.
jcamachor commented on a change in pull request #1437:
URL: https://github.com/apache/hive/pull/1437#discussion_r479498296



##########
File path: common/src/java/org/apache/hadoop/hive/conf/HiveConf.java
##########
@@ -2591,6 +2591,10 @@ private static void populateLlapDaemonVarsSet(Set<String> llapDaemonVarsSetLocal
     HIVE_CTE_MATERIALIZE_THRESHOLD("hive.optimize.cte.materialize.threshold", -1,

Review comment:
       Should we set the default to `2` now that it is only triggered in very specific cases?

##########
File path: ql/src/java/org/apache/hadoop/hive/ql/parse/QBParseInfo.java
##########
@@ -677,6 +685,33 @@ public void setNoScanAnalyzeCommand(boolean isNoScanAnalyzeCommand) {
   public boolean hasInsertTables() {
     return this.insertIntoTables.size() > 0 || this.insertOverwriteTables.size() > 0;
   }
+
+  public boolean isFullyAggregate() throws SemanticException {

Review comment:
       Although the method is evident, could we add a comment?
   
   Should this be static since it is a utility method that can be used beyond this scope?

##########
File path: ql/src/java/org/apache/hadoop/hive/ql/parse/QB.java
##########
@@ -457,4 +469,17 @@ public boolean hasTableDefined() {
     return !(aliases.size() == 1 && aliases.get(0).equals(SemanticAnalyzer.DUMMY_TABLE));
   }
 
+  public void addSubqExprAlias(ASTNode expressionTree, SemanticAnalyzer semanticAnalyzer) throws SemanticException {
+    String alias = "__subexpr" + subQueryExpressionAliasCounter++;
+
+    // Recursively do the first phase of semantic analysis for the subquery
+    QBExpr qbexpr = new QBExpr(alias);
+
+    ASTNode subqref = (ASTNode) expressionTree.getChild(1);
+    semanticAnalyzer.doPhase1QBExpr(subqref, qbexpr, getId(), alias, isInsideView());

Review comment:
       Trying to understand this step. Does this lead to parsing the same subquery multiple times?

##########
File path: ql/src/test/queries/clientpositive/cte_mat_6.q
##########
@@ -0,0 +1,81 @@
+set hive.optimize.cte.materialize.threshold=1;
+
+create table t0(col0 int);
+
+insert into t0(col0) values
+(1),(2),
+(100),(100),(100),
+(200),(200);
+
+-- CTE is referenced from scalar subquery in the select clause
+explain
+with cte as (select count(*) as small_count from t0 where col0 < 10)
+select t0.col0, (select small_count from cte)
+from t0
+order by t0.col0;
+
+with cte as (select count(*) as small_count from t0 where col0 < 10)
+select t0.col0, (select small_count from cte)
+from t0
+order by t0.col0;
+
+-- disable cte materialization
+set hive.optimize.cte.materialize.threshold=-1;
+
+explain
+with cte as (select count(*) as small_count from t0 where col0 < 10)
+select t0.col0, (select small_count from cte)
+from t0
+order by t0.col0;
+
+
+with cte as (select count(*) as small_count from t0 where col0 < 10)
+select t0.col0, (select small_count from cte)
+from t0
+order by t0.col0;
+
+
+-- enable cte materialization
+set hive.optimize.cte.materialize.threshold=1;
+
+-- CTE is referenced from scalar subquery in the where clause
+explain
+with cte as (select count(*) as small_count from t0 where col0 < 10)
+select t0.col0
+from t0
+where t0.col0 > (select small_count from cte)
+order by t0.col0;
+
+with cte as (select count(*) as small_count from t0 where col0 < 10)
+select t0.col0
+from t0
+where t0.col0 > (select small_count from cte)
+order by t0.col0;
+
+-- CTE is referenced from scalar subquery in the having clause
+explain
+with cte as (select count(*) as small_count from t0 where col0 < 10)
+select t0.col0, count(*)
+from t0
+group by col0
+having count(*) > (select small_count from cte)
+order by t0.col0;
+
+with cte as (select count(*) as small_count from t0 where col0 < 10)
+select t0.col0, count(*)
+from t0
+group by col0
+having count(*) > (select small_count from cte)
+order by t0.col0;
+
+-- mix full aggregate and non-full aggregate ctes
+explain
+with cte1 as (select col0 as k1 from t0 where col0 = '5'),
+     cte2 as (select count(*) as all_count from t0),
+     cte3 as (select col0 as k3, col0 + col0 as k3_2x, count(*) as key_count from t0 group by col0)
+select t0.col0, count(*)
+from t0
+join cte1 on t0.col0 = cte1.k1
+join cte3 on t0.col0 = cte3.k3
+group by col0
+having count(*) > (select all_count from cte2)

Review comment:
       Could we add tests to make sure the optimization is only triggered for SELECT queries? For instance, I am thinking about CTAS and CMV statements, the optimization should not triggered in those cases (I guess it could lead some kind of side effect, at least for CMV).




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: gitbox-unsubscribe@hive.apache.org
For additional commands, e-mail: gitbox-help@hive.apache.org