You are viewing a plain text version of this content. The canonical link for it is here.
Posted to reviews@spark.apache.org by GitBox <gi...@apache.org> on 2022/06/11 13:39:01 UTC
[GitHub] [spark] wangyum opened a new pull request, #36847: [SPARK-39448][SQL] Add ReplaceCTERefWithRepartition into nonExcludableRules list
wangyum opened a new pull request, #36847:
URL: https://github.com/apache/spark/pull/36847
### What changes were proposed in this pull request?
This PR adds `ReplaceCTERefWithRepartition` into nonExcludableRules list.
### Why are the changes needed?
It will throw exception if user `set spark.sql.optimizer.excludedRules=org.apache.spark.sql.catalyst.optimizer.ReplaceCTERefWithRepartition` before running this query:
```sql
SELECT
(SELECT avg(id) FROM range(10)),
(SELECT sum(id) FROM range(10)),
(SELECT count(distinct id) FROM range(10))
```
Exception:
```
Caused by: java.lang.AssertionError: assertion failed: No plan for WithCTE
:- CTERelationDef 0, true
: +- Project [named_struct(min(id), min(id)#223L, sum(id), sum(id)#226L, count(DISTINCT id), count(DISTINCT id)#229L) AS mergedValue#240]
: +- Aggregate [min(id#221L) AS min(id)#223L, sum(id#221L) AS sum(id)#226L, count(distinct id#221L) AS count(DISTINCT id)#229L]
: +- Range (0, 10, step=1, splits=None)
+- Project [scalar-subquery#218 [].min(id) AS scalarsubquery()#230L, scalar-subquery#219 [].sum(id) AS scalarsubquery()#231L, scalar-subquery#220 [].count(DISTINCT id) AS scalarsubquery()#232L]
: :- CTERelationRef 0, true, [mergedValue#240]
: :- CTERelationRef 0, true, [mergedValue#240]
: +- CTERelationRef 0, true, [mergedValue#240]
+- OneRowRelation
```
### Does this PR introduce _any_ user-facing change?
No.
### How was this patch tested?
Unit test.
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] [spark] wangyum commented on a diff in pull request #36847: [SPARK-39448][SQL] Add `ReplaceCTERefWithRepartition` into `nonExcludableRules` list
Posted by GitBox <gi...@apache.org>.
wangyum commented on code in PR #36847:
URL: https://github.com/apache/spark/pull/36847#discussion_r895634771
##########
sql/core/src/test/scala/org/apache/spark/sql/SQLQuerySuite.scala:
##########
@@ -4456,6 +4456,20 @@ class SQLQuerySuite extends QueryTest with SharedSparkSession with AdaptiveSpark
""".stripMargin),
Seq(Row(2), Row(1)))
}
+
+ test("SPARK-39448: Add ReplaceCTERefWithRepartition into nonExcludableRules list") {
Review Comment:
I added a file:
https://github.com/apache/spark/blob/1ac77bb9ee496e8cdea67415bdb7cf839b9a4ac5/sql/core/src/test/resources/sql-tests/inputs/non-excludable-rule.sql
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] [spark] wangyum commented on a diff in pull request #36847: [SPARK-39448][SQL] Add ReplaceCTERefWithRepartition into nonExcludableRules list
Posted by GitBox <gi...@apache.org>.
wangyum commented on code in PR #36847:
URL: https://github.com/apache/spark/pull/36847#discussion_r895254709
##########
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/Optimizer.scala:
##########
@@ -270,7 +270,8 @@ abstract class Optimizer(catalogManager: CatalogManager)
RewritePredicateSubquery.ruleName ::
NormalizeFloatingNumbers.ruleName ::
ReplaceUpdateFieldsExpression.ruleName ::
- RewriteLateralSubquery.ruleName :: Nil
+ RewriteLateralSubquery.ruleName ::
+ ReplaceCTERefWithRepartition.ruleName :: Nil
Review Comment:
+1
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] [spark] dongjoon-hyun closed pull request #36847: [SPARK-39448][SQL] Add `ReplaceCTERefWithRepartition` into `nonExcludableRules` list
Posted by GitBox <gi...@apache.org>.
dongjoon-hyun closed pull request #36847: [SPARK-39448][SQL] Add `ReplaceCTERefWithRepartition` into `nonExcludableRules` list
URL: https://github.com/apache/spark/pull/36847
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] [spark] dongjoon-hyun commented on pull request #36847: [SPARK-39448][SQL] Add `ReplaceCTERefWithRepartition` into `nonExcludableRules` list
Posted by GitBox <gi...@apache.org>.
dongjoon-hyun commented on PR #36847:
URL: https://github.com/apache/spark/pull/36847#issuecomment-1153369445
I checked Apache Spark 3.3.0 RC6 and added `3.3.0` to the Affected Version of the JIRA, @wangyum .
```
scala> spark.version
val res0: String = 3.3.0
scala> sql("set spark.sql.optimizer.excludedRules=org.apache.spark.sql.catalyst.optimizer.ReplaceCTERefWithRepartition")
val res1: org.apache.spark.sql.DataFrame = [key: string, value: string]
scala> sql("SELECT (SELECT avg(id) FROM range(10)),(SELECT sum(id) FROM range(10)),(SELECT count(distinct id) FROM range(10))").show
warning: 1 deprecation (since 2.13.3); for details, enable `:setting -deprecation` or `:replay -deprecation`
org.apache.spark.SparkException: The Spark SQL phase planning failed with an internal error. Please, fill a bug report in, and provide the full stack trace.
```
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] [spark] dongjoon-hyun commented on a diff in pull request #36847: [SPARK-39448][SQL] Add `ReplaceCTERefWithRepartition` into `nonExcludableRules` list
Posted by GitBox <gi...@apache.org>.
dongjoon-hyun commented on code in PR #36847:
URL: https://github.com/apache/spark/pull/36847#discussion_r895266012
##########
sql/core/src/test/scala/org/apache/spark/sql/SQLQuerySuite.scala:
##########
@@ -4456,6 +4456,20 @@ class SQLQuerySuite extends QueryTest with SharedSparkSession with AdaptiveSpark
""".stripMargin),
Seq(Row(2), Row(1)))
}
+
+ test("SPARK-39448: Add ReplaceCTERefWithRepartition into nonExcludableRules list") {
+ withSQLConf(
+ SQLConf.OPTIMIZER_EXCLUDED_RULES.key -> ReplaceCTERefWithRepartition.ruleName) {
Review Comment:
I guess we can add `cte-xxx` like https://github.com/apache/spark/blob/master/sql/core/src/test/resources/sql-tests/inputs/cte.sql
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] [spark] dongjoon-hyun commented on a diff in pull request #36847: [SPARK-39448][SQL] Add `ReplaceCTERefWithRepartition` into `nonExcludableRules` list
Posted by GitBox <gi...@apache.org>.
dongjoon-hyun commented on code in PR #36847:
URL: https://github.com/apache/spark/pull/36847#discussion_r895266088
##########
sql/core/src/main/scala/org/apache/spark/sql/execution/SparkOptimizer.scala:
##########
@@ -87,7 +87,8 @@ class SparkOptimizer(
GroupBasedRowLevelOperationScanPlanning.ruleName :+
V2ScanRelationPushDown.ruleName :+
V2ScanPartitioning.ruleName :+
- V2Writes.ruleName
+ V2Writes.ruleName :+
+ ReplaceCTERefWithRepartition.ruleName
Review Comment:
+1 for this addition. Thanks!
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] [spark] HyukjinKwon commented on a diff in pull request #36847: [SPARK-39448][SQL] Add ReplaceCTERefWithRepartition into nonExcludableRules list
Posted by GitBox <gi...@apache.org>.
HyukjinKwon commented on code in PR #36847:
URL: https://github.com/apache/spark/pull/36847#discussion_r895254287
##########
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/Optimizer.scala:
##########
@@ -270,7 +270,8 @@ abstract class Optimizer(catalogManager: CatalogManager)
RewritePredicateSubquery.ruleName ::
NormalizeFloatingNumbers.ruleName ::
ReplaceUpdateFieldsExpression.ruleName ::
- RewriteLateralSubquery.ruleName :: Nil
+ RewriteLateralSubquery.ruleName ::
+ ReplaceCTERefWithRepartition.ruleName :: Nil
Review Comment:
Should we add this to `SparkOptimizer.nonExcludableRules` since the rule is there?
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] [spark] dongjoon-hyun commented on a diff in pull request #36847: [SPARK-39448][SQL] Add `ReplaceCTERefWithRepartition` into `nonExcludableRules` list
Posted by GitBox <gi...@apache.org>.
dongjoon-hyun commented on code in PR #36847:
URL: https://github.com/apache/spark/pull/36847#discussion_r895265570
##########
sql/core/src/test/scala/org/apache/spark/sql/SQLQuerySuite.scala:
##########
@@ -4456,6 +4456,20 @@ class SQLQuerySuite extends QueryTest with SharedSparkSession with AdaptiveSpark
""".stripMargin),
Seq(Row(2), Row(1)))
}
+
+ test("SPARK-39448: Add ReplaceCTERefWithRepartition into nonExcludableRules list") {
Review Comment:
A question. Is it inevitable for us to have this?
As you know, we recommend `SQLQueryTestSuite` over `SQLQuerySuite` in these days, @wangyum ?
##########
sql/core/src/test/scala/org/apache/spark/sql/SQLQuerySuite.scala:
##########
@@ -4456,6 +4456,20 @@ class SQLQuerySuite extends QueryTest with SharedSparkSession with AdaptiveSpark
""".stripMargin),
Seq(Row(2), Row(1)))
}
+
+ test("SPARK-39448: Add ReplaceCTERefWithRepartition into nonExcludableRules list") {
Review Comment:
A question. Is it inevitable for us to have this here?
As you know, we recommend `SQLQueryTestSuite` over `SQLQuerySuite` in these days, @wangyum ?
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org