You are viewing a plain text version of this content. The canonical link for it is here.
Posted to reviews@spark.apache.org by GitBox <gi...@apache.org> on 2022/06/11 13:39:01 UTC

[GitHub] [spark] wangyum opened a new pull request, #36847: [SPARK-39448][SQL] Add ReplaceCTERefWithRepartition into nonExcludableRules list

wangyum opened a new pull request, #36847:
URL: https://github.com/apache/spark/pull/36847

   ### What changes were proposed in this pull request?
   
   This PR adds `ReplaceCTERefWithRepartition` into nonExcludableRules list.
   
   ### Why are the changes needed?
   
   It will throw exception if user `set spark.sql.optimizer.excludedRules=org.apache.spark.sql.catalyst.optimizer.ReplaceCTERefWithRepartition` before running this query:
   ```sql
   SELECT
     (SELECT avg(id) FROM range(10)),
     (SELECT sum(id) FROM range(10)),
     (SELECT count(distinct id) FROM range(10))
   ```
   Exception:
   ```
   Caused by: java.lang.AssertionError: assertion failed: No plan for WithCTE
   :- CTERelationDef 0, true
   :  +- Project [named_struct(min(id), min(id)#223L, sum(id), sum(id)#226L, count(DISTINCT id), count(DISTINCT id)#229L) AS mergedValue#240]
   :     +- Aggregate [min(id#221L) AS min(id)#223L, sum(id#221L) AS sum(id)#226L, count(distinct id#221L) AS count(DISTINCT id)#229L]
   :        +- Range (0, 10, step=1, splits=None)
   +- Project [scalar-subquery#218 [].min(id) AS scalarsubquery()#230L, scalar-subquery#219 [].sum(id) AS scalarsubquery()#231L, scalar-subquery#220 [].count(DISTINCT id) AS scalarsubquery()#232L]
      :  :- CTERelationRef 0, true, [mergedValue#240]
      :  :- CTERelationRef 0, true, [mergedValue#240]
      :  +- CTERelationRef 0, true, [mergedValue#240]
      +- OneRowRelation
   ```
   
   ### Does this PR introduce _any_ user-facing change?
   
   No.
   
   ### How was this patch tested?
   
   Unit test.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] wangyum commented on a diff in pull request #36847: [SPARK-39448][SQL] Add `ReplaceCTERefWithRepartition` into `nonExcludableRules` list

Posted by GitBox <gi...@apache.org>.
wangyum commented on code in PR #36847:
URL: https://github.com/apache/spark/pull/36847#discussion_r895634771


##########
sql/core/src/test/scala/org/apache/spark/sql/SQLQuerySuite.scala:
##########
@@ -4456,6 +4456,20 @@ class SQLQuerySuite extends QueryTest with SharedSparkSession with AdaptiveSpark
         """.stripMargin),
       Seq(Row(2), Row(1)))
   }
+
+  test("SPARK-39448: Add ReplaceCTERefWithRepartition into nonExcludableRules list") {

Review Comment:
   I added a file:
   https://github.com/apache/spark/blob/1ac77bb9ee496e8cdea67415bdb7cf839b9a4ac5/sql/core/src/test/resources/sql-tests/inputs/non-excludable-rule.sql



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] wangyum commented on a diff in pull request #36847: [SPARK-39448][SQL] Add ReplaceCTERefWithRepartition into nonExcludableRules list

Posted by GitBox <gi...@apache.org>.
wangyum commented on code in PR #36847:
URL: https://github.com/apache/spark/pull/36847#discussion_r895254709


##########
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/Optimizer.scala:
##########
@@ -270,7 +270,8 @@ abstract class Optimizer(catalogManager: CatalogManager)
       RewritePredicateSubquery.ruleName ::
       NormalizeFloatingNumbers.ruleName ::
       ReplaceUpdateFieldsExpression.ruleName ::
-      RewriteLateralSubquery.ruleName :: Nil
+      RewriteLateralSubquery.ruleName ::
+      ReplaceCTERefWithRepartition.ruleName :: Nil

Review Comment:
   +1



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] dongjoon-hyun closed pull request #36847: [SPARK-39448][SQL] Add `ReplaceCTERefWithRepartition` into `nonExcludableRules` list

Posted by GitBox <gi...@apache.org>.
dongjoon-hyun closed pull request #36847: [SPARK-39448][SQL] Add `ReplaceCTERefWithRepartition` into `nonExcludableRules` list
URL: https://github.com/apache/spark/pull/36847


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] dongjoon-hyun commented on pull request #36847: [SPARK-39448][SQL] Add `ReplaceCTERefWithRepartition` into `nonExcludableRules` list

Posted by GitBox <gi...@apache.org>.
dongjoon-hyun commented on PR #36847:
URL: https://github.com/apache/spark/pull/36847#issuecomment-1153369445

   I checked Apache Spark 3.3.0 RC6 and added `3.3.0` to the Affected Version of the JIRA, @wangyum .
   ```
   scala> spark.version
   val res0: String = 3.3.0
   
   scala> sql("set spark.sql.optimizer.excludedRules=org.apache.spark.sql.catalyst.optimizer.ReplaceCTERefWithRepartition")
   val res1: org.apache.spark.sql.DataFrame = [key: string, value: string]
   
   scala> sql("SELECT (SELECT avg(id) FROM range(10)),(SELECT sum(id) FROM range(10)),(SELECT count(distinct id) FROM range(10))").show
   warning: 1 deprecation (since 2.13.3); for details, enable `:setting -deprecation` or `:replay -deprecation`
   org.apache.spark.SparkException: The Spark SQL phase planning failed with an internal error. Please, fill a bug report in, and provide the full stack trace.
   ```


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] dongjoon-hyun commented on a diff in pull request #36847: [SPARK-39448][SQL] Add `ReplaceCTERefWithRepartition` into `nonExcludableRules` list

Posted by GitBox <gi...@apache.org>.
dongjoon-hyun commented on code in PR #36847:
URL: https://github.com/apache/spark/pull/36847#discussion_r895266012


##########
sql/core/src/test/scala/org/apache/spark/sql/SQLQuerySuite.scala:
##########
@@ -4456,6 +4456,20 @@ class SQLQuerySuite extends QueryTest with SharedSparkSession with AdaptiveSpark
         """.stripMargin),
       Seq(Row(2), Row(1)))
   }
+
+  test("SPARK-39448: Add ReplaceCTERefWithRepartition into nonExcludableRules list") {
+    withSQLConf(
+      SQLConf.OPTIMIZER_EXCLUDED_RULES.key -> ReplaceCTERefWithRepartition.ruleName) {

Review Comment:
   I guess we can add `cte-xxx` like https://github.com/apache/spark/blob/master/sql/core/src/test/resources/sql-tests/inputs/cte.sql 
   



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] dongjoon-hyun commented on a diff in pull request #36847: [SPARK-39448][SQL] Add `ReplaceCTERefWithRepartition` into `nonExcludableRules` list

Posted by GitBox <gi...@apache.org>.
dongjoon-hyun commented on code in PR #36847:
URL: https://github.com/apache/spark/pull/36847#discussion_r895266088


##########
sql/core/src/main/scala/org/apache/spark/sql/execution/SparkOptimizer.scala:
##########
@@ -87,7 +87,8 @@ class SparkOptimizer(
     GroupBasedRowLevelOperationScanPlanning.ruleName :+
     V2ScanRelationPushDown.ruleName :+
     V2ScanPartitioning.ruleName :+
-    V2Writes.ruleName
+    V2Writes.ruleName :+
+    ReplaceCTERefWithRepartition.ruleName

Review Comment:
   +1 for this addition. Thanks!



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] HyukjinKwon commented on a diff in pull request #36847: [SPARK-39448][SQL] Add ReplaceCTERefWithRepartition into nonExcludableRules list

Posted by GitBox <gi...@apache.org>.
HyukjinKwon commented on code in PR #36847:
URL: https://github.com/apache/spark/pull/36847#discussion_r895254287


##########
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/Optimizer.scala:
##########
@@ -270,7 +270,8 @@ abstract class Optimizer(catalogManager: CatalogManager)
       RewritePredicateSubquery.ruleName ::
       NormalizeFloatingNumbers.ruleName ::
       ReplaceUpdateFieldsExpression.ruleName ::
-      RewriteLateralSubquery.ruleName :: Nil
+      RewriteLateralSubquery.ruleName ::
+      ReplaceCTERefWithRepartition.ruleName :: Nil

Review Comment:
   Should we add this to `SparkOptimizer.nonExcludableRules` since the rule is there?



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] dongjoon-hyun commented on a diff in pull request #36847: [SPARK-39448][SQL] Add `ReplaceCTERefWithRepartition` into `nonExcludableRules` list

Posted by GitBox <gi...@apache.org>.
dongjoon-hyun commented on code in PR #36847:
URL: https://github.com/apache/spark/pull/36847#discussion_r895265570


##########
sql/core/src/test/scala/org/apache/spark/sql/SQLQuerySuite.scala:
##########
@@ -4456,6 +4456,20 @@ class SQLQuerySuite extends QueryTest with SharedSparkSession with AdaptiveSpark
         """.stripMargin),
       Seq(Row(2), Row(1)))
   }
+
+  test("SPARK-39448: Add ReplaceCTERefWithRepartition into nonExcludableRules list") {

Review Comment:
   A question. Is it inevitable for us to have this?
   
   As you know, we recommend `SQLQueryTestSuite` over `SQLQuerySuite` in these days, @wangyum ?
   



##########
sql/core/src/test/scala/org/apache/spark/sql/SQLQuerySuite.scala:
##########
@@ -4456,6 +4456,20 @@ class SQLQuerySuite extends QueryTest with SharedSparkSession with AdaptiveSpark
         """.stripMargin),
       Seq(Row(2), Row(1)))
   }
+
+  test("SPARK-39448: Add ReplaceCTERefWithRepartition into nonExcludableRules list") {

Review Comment:
   A question. Is it inevitable for us to have this here?
   
   As you know, we recommend `SQLQueryTestSuite` over `SQLQuerySuite` in these days, @wangyum ?
   



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org