Posted to commits@spark.apache.org by do...@apache.org on 2022/06/14 07:43:38 UTC

[spark] branch master updated: [SPARK-39448][SQL] Add `ReplaceCTERefWithRepartition` into `nonExcludableRules` list

This is an automated email from the ASF dual-hosted git repository.

dongjoon pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git


The following commit(s) were added to refs/heads/master by this push:
     new 0b785b3c773 [SPARK-39448][SQL] Add `ReplaceCTERefWithRepartition` into `nonExcludableRules` list
0b785b3c773 is described below

commit 0b785b3c77374fa7736f01bb55e87c796985ae14
Author: Yuming Wang <yu...@ebay.com>
AuthorDate: Tue Jun 14 00:43:20 2022 -0700

    [SPARK-39448][SQL] Add `ReplaceCTERefWithRepartition` into `nonExcludableRules` list
    
    ### What changes were proposed in this pull request?
    
    This PR adds `ReplaceCTERefWithRepartition` to the `nonExcludableRules` list.
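    As a rough illustration (not Spark's actual implementation; all names below are invented for this sketch), the interaction between `spark.sql.optimizer.excludedRules` and a non-excludable list amounts to filtering the user's exclusions against a protected set:

    ```scala
    // Hypothetical sketch of how an optimizer can honor a user-supplied
    // exclusion list while protecting rules required for correctness.
    val allRules = Seq("PushDownPredicates", "CollapseProject", "ReplaceCTERefWithRepartition")

    // Rules whose removal would break planning, per this commit:
    val nonExcludableRules = Set("ReplaceCTERefWithRepartition")

    // Keep a rule unless it is excluded AND excludable.
    def effectiveRules(excluded: Set[String]): Seq[String] =
      allRules.filterNot(r => excluded.contains(r) && !nonExcludableRules.contains(r))

    // Even though the user excludes ReplaceCTERefWithRepartition, it survives:
    println(effectiveRules(Set("CollapseProject", "ReplaceCTERefWithRepartition")))
    ```

    With this change the rule is silently retained instead of dropped, which is what prevents the `No plan for WithCTE` assertion below.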
    
    ### Why are the changes needed?
    
    Spark throws an exception if the user sets `spark.sql.optimizer.excludedRules=org.apache.spark.sql.catalyst.optimizer.ReplaceCTERefWithRepartition` before running this query:
    ```sql
    SELECT
      (SELECT min(id) FROM range(10)),
      (SELECT sum(id) FROM range(10)),
      (SELECT count(distinct id) FROM range(10))
    ```
    Exception:
    ```
    Caused by: java.lang.AssertionError: assertion failed: No plan for WithCTE
    :- CTERelationDef 0, true
    :  +- Project [named_struct(min(id), min(id)#223L, sum(id), sum(id)#226L, count(DISTINCT id), count(DISTINCT id)#229L) AS mergedValue#240]
    :     +- Aggregate [min(id#221L) AS min(id)#223L, sum(id#221L) AS sum(id)#226L, count(distinct id#221L) AS count(DISTINCT id)#229L]
    :        +- Range (0, 10, step=1, splits=None)
    +- Project [scalar-subquery#218 [].min(id) AS scalarsubquery()#230L, scalar-subquery#219 [].sum(id) AS scalarsubquery()#231L, scalar-subquery#220 [].count(DISTINCT id) AS scalarsubquery()#232L]
       :  :- CTERelationRef 0, true, [mergedValue#240]
       :  :- CTERelationRef 0, true, [mergedValue#240]
       :  +- CTERelationRef 0, true, [mergedValue#240]
       +- OneRowRelation
    ```
    
    ### Does this PR introduce _any_ user-facing change?
    
    No.
    
    ### How was this patch tested?
    
    Unit test.
    
    Closes #36847 from wangyum/SPARK-39448.
    
    Authored-by: Yuming Wang <yu...@ebay.com>
    Signed-off-by: Dongjoon Hyun <do...@apache.org>
---
 .../apache/spark/sql/execution/SparkOptimizer.scala |  3 ++-
 .../sql-tests/inputs/non-excludable-rule.sql        |  6 ++++++
 .../sql-tests/results/non-excludable-rule.sql.out   | 21 +++++++++++++++++++++
 3 files changed, 29 insertions(+), 1 deletion(-)

diff --git a/sql/core/src/main/scala/org/apache/spark/sql/execution/SparkOptimizer.scala b/sql/core/src/main/scala/org/apache/spark/sql/execution/SparkOptimizer.scala
index 0e7455009c5..056c16affc2 100644
--- a/sql/core/src/main/scala/org/apache/spark/sql/execution/SparkOptimizer.scala
+++ b/sql/core/src/main/scala/org/apache/spark/sql/execution/SparkOptimizer.scala
@@ -87,7 +87,8 @@ class SparkOptimizer(
     GroupBasedRowLevelOperationScanPlanning.ruleName :+
     V2ScanRelationPushDown.ruleName :+
     V2ScanPartitioning.ruleName :+
-    V2Writes.ruleName
+    V2Writes.ruleName :+
+    ReplaceCTERefWithRepartition.ruleName
 
   /**
    * Optimization batches that are executed before the regular optimization batches (also before
diff --git a/sql/core/src/test/resources/sql-tests/inputs/non-excludable-rule.sql b/sql/core/src/test/resources/sql-tests/inputs/non-excludable-rule.sql
new file mode 100644
index 00000000000..b238d199cc1
--- /dev/null
+++ b/sql/core/src/test/resources/sql-tests/inputs/non-excludable-rule.sql
@@ -0,0 +1,6 @@
+-- SPARK-39448
+SET spark.sql.optimizer.excludedRules=org.apache.spark.sql.catalyst.optimizer.ReplaceCTERefWithRepartition;
+SELECT
+  (SELECT min(id) FROM range(10)),
+  (SELECT sum(id) FROM range(10)),
+  (SELECT count(distinct id) FROM range(10));
diff --git a/sql/core/src/test/resources/sql-tests/results/non-excludable-rule.sql.out b/sql/core/src/test/resources/sql-tests/results/non-excludable-rule.sql.out
new file mode 100644
index 00000000000..c7fa2f04152
--- /dev/null
+++ b/sql/core/src/test/resources/sql-tests/results/non-excludable-rule.sql.out
@@ -0,0 +1,21 @@
+-- Automatically generated by SQLQueryTestSuite
+-- Number of queries: 2
+
+
+-- !query
+SET spark.sql.optimizer.excludedRules=org.apache.spark.sql.catalyst.optimizer.ReplaceCTERefWithRepartition
+-- !query schema
+struct<key:string,value:string>
+-- !query output
+spark.sql.optimizer.excludedRules	org.apache.spark.sql.catalyst.optimizer.ReplaceCTERefWithRepartition
+
+
+-- !query
+SELECT
+  (SELECT min(id) FROM range(10)),
+  (SELECT sum(id) FROM range(10)),
+  (SELECT count(distinct id) FROM range(10))
+-- !query schema
+struct<scalarsubquery():bigint,scalarsubquery():bigint,scalarsubquery():bigint>
+-- !query output
+0	45	10

