You are viewing a plain text version of this content. The canonical link for it is here.
Posted to reviews@spark.apache.org by GitBox <gi...@apache.org> on 2022/03/16 13:52:35 UTC
[GitHub] [spark] mcdull-zhang opened a new pull request #35878: [SPARK-38570][SQL]Incorrect DynamicPartitionPruning caused by Literal
mcdull-zhang opened a new pull request #35878:
URL: https://github.com/apache/spark/pull/35878
### What changes were proposed in this pull request?
The return value of Literal.references is an empty AttributeSet, so Literal is mistaken for a partition column.
For example, the sql in the test case will generate such a physical plan when the adaptive is closed:
```text
*(4) Project [type#5352, joinCol#5354, otherCol#5355, score#5362]
+- *(4) BroadcastHashJoin [type#5352, joinCol#5354], [type#5360, joinCol#5361], Inner, BuildRight, false
:- Union
: :- *(1) Project [type1 AS type#5352, joinCol#5354, otherCol#5355]
: : +- *(1) Filter (isnotnull(joinCol#5354) AND dynamicpruningexpression(type1 IN dynamicpruning#5370))
: : : +- ReusedSubquery SubqueryBroadcast dynamicpruning#5370, 0, [type#5360, joinCol#5361], [id=#381]
: : +- *(1) ColumnarToRow
: : +- FileScan parquet default.fact1[joinCol#5354,otherCol#5355,partCol#5356] Batched: true, DataFilters: [isnotnull(joinCol#5354)], Format: Parquet, Location: InMemoryFileIndex(1 paths)[file:/Users/dongdongzhang/code/study/spark/spark-warehouse/org.apache...., PartitionFilters: [isnotnull(partCol#5356), (partCol#5356 = part1), dynamicpruningexpression(type1 IN dynamicprunin..., PushedFilters: [IsNotNull(joinCol)], ReadSchema: struct<joinCol:int,otherCol:string>
: : +- SubqueryBroadcast dynamicpruning#5370, 0, [type#5360, joinCol#5361], [id=#381]
: : +- BroadcastExchange HashedRelationBroadcastMode(List(input[0, string, false], input[1, int, false]),false), [id=#380]
: : +- *(1) Filter ((((type#5360 <=> type1) OR (type#5360 <=> type2)) AND isnotnull(type#5360)) AND isnotnull(joinCol#5361))
: : +- *(1) ColumnarToRow
: : +- FileScan parquet default.dim[type#5360,joinCol#5361,score#5362] Batched: true, DataFilters: [((type#5360 <=> type1) OR (type#5360 <=> type2)), isnotnull(type#5360), isnotnull(joinCol#5361)], Format: Parquet, Location: InMemoryFileIndex(1 paths)[file:/Users/dongdongzhang/code/study/spark/spark-warehouse/org.apache...., PartitionFilters: [], PushedFilters: [Or(EqualNullSafe(type,type1),EqualNullSafe(type,type2)), IsNotNull(type), IsNotNull(joinCol)], ReadSchema: struct<type:string,joinCol:int,score:int>
: +- *(2) Project [type2 AS type#5353, joinCol#5357, otherCol#5358]
: +- *(2) Filter (isnotnull(joinCol#5357) AND dynamicpruningexpression(type2 IN dynamicpruning#5370))
: : +- ReusedSubquery SubqueryBroadcast dynamicpruning#5370, 0, [type#5360, joinCol#5361], [id=#381]
: +- *(2) ColumnarToRow
: +- FileScan parquet default.fact2[joinCol#5357,otherCol#5358,partCol#5359] Batched: true, DataFilters: [isnotnull(joinCol#5357)], Format: Parquet, Location: InMemoryFileIndex(1 paths)[file:/Users/dongdongzhang/code/study/spark/spark-warehouse/org.apache...., PartitionFilters: [isnotnull(partCol#5359), (partCol#5359 = part1), dynamicpruningexpression(type2 IN dynamicprunin..., PushedFilters: [IsNotNull(joinCol)], ReadSchema: struct<joinCol:int,otherCol:string>
: +- ReusedSubquery SubqueryBroadcast dynamicpruning#5370, 0, [type#5360, joinCol#5361], [id=#381]
+- ReusedExchange [type#5360, joinCol#5361, score#5362], BroadcastExchange HashedRelationBroadcastMode(List(input[0, string, false], input[1, int, false]),false), [id=#380]
```
after this pr:
```text
*(4) Project [type#5352, joinCol#5354, otherCol#5355, score#5362]
+- *(4) BroadcastHashJoin [type#5352, joinCol#5354], [type#5360, joinCol#5361], Inner, BuildRight, false
:- Union
: :- *(1) Project [type1 AS type#5352, joinCol#5354, otherCol#5355]
: : +- *(1) Filter isnotnull(joinCol#5354)
: : +- *(1) ColumnarToRow
: : +- FileScan parquet default.fact1[joinCol#5354,otherCol#5355,partCol#5356] Batched: true, DataFilters: [isnotnull(joinCol#5354)], Format: Parquet, Location: InMemoryFileIndex(1 paths)[file:/Users/dongdongzhang/code/study/spark/spark-warehouse/org.apache...., PartitionFilters: [isnotnull(partCol#5356), (partCol#5356 = part1)], PushedFilters: [IsNotNull(joinCol)], ReadSchema: struct<joinCol:int,otherCol:string>
: +- *(2) Project [type2 AS type#5353, joinCol#5357, otherCol#5358]
: +- *(2) Filter isnotnull(joinCol#5357)
: +- *(2) ColumnarToRow
: +- FileScan parquet default.fact2[joinCol#5357,otherCol#5358,partCol#5359] Batched: true, DataFilters: [isnotnull(joinCol#5357)], Format: Parquet, Location: InMemoryFileIndex(1 paths)[file:/Users/dongdongzhang/code/study/spark/spark-warehouse/org.apache...., PartitionFilters: [isnotnull(partCol#5359), (partCol#5359 = part1)], PushedFilters: [IsNotNull(joinCol)], ReadSchema: struct<joinCol:int,otherCol:string>
+- BroadcastExchange HashedRelationBroadcastMode(List(input[0, string, false], input[1, int, false]),false), [id=#375]
+- *(3) Filter ((((type#5360 <=> type1) OR (type#5360 <=> type2)) AND isnotnull(type#5360)) AND isnotnull(joinCol#5361))
+- *(3) ColumnarToRow
+- FileScan parquet default.dim[type#5360,joinCol#5361,score#5362] Batched: true, DataFilters: [((type#5360 <=> type1) OR (type#5360 <=> type2)), isnotnull(type#5360), isnotnull(joinCol#5361)], Format: Parquet, Location: InMemoryFileIndex(1 paths)[file:/Users/dongdongzhang/code/study/spark/spark-warehouse/org.apache...., PartitionFilters: [], PushedFilters: [Or(EqualNullSafe(type,type1),EqualNullSafe(type,type2)), IsNotNull(type), IsNotNull(joinCol)], ReadSchema: struct<type:string,joinCol:int,score:int>
```
### Why are the changes needed?
Execution performance improvement
### Does this PR introduce _any_ user-facing change?
No
### How was this patch tested?
Added unit test
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] [spark] wangyum commented on a change in pull request #35878: [SPARK-38570][SQL] Incorrect DynamicPartitionPruning caused by Literal
Posted by GitBox <gi...@apache.org>.
wangyum commented on a change in pull request #35878:
URL: https://github.com/apache/spark/pull/35878#discussion_r833942267
##########
File path: sql/core/src/main/scala/org/apache/spark/sql/execution/dynamicpruning/PartitionPruning.scala
##########
@@ -65,7 +65,7 @@ object PartitionPruning extends Rule[LogicalPlan] with PredicateHelper {
case fs: HadoopFsRelation =>
val partitionColumns = AttributeSet(
l.resolve(fs.partitionSchema, fs.sparkSession.sessionState.analyzer.resolver))
- if (resExp.references.subsetOf(partitionColumns)) {
+ if (!resExp.references.isEmpty && resExp.references.subsetOf(partitionColumns)) {
Review comment:
LGTM.
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] [spark] mcdull-zhang commented on a change in pull request #35878: [SPARK-38570][SQL] Incorrect DynamicPartitionPruning caused by Literal
Posted by GitBox <gi...@apache.org>.
mcdull-zhang commented on a change in pull request #35878:
URL: https://github.com/apache/spark/pull/35878#discussion_r833881952
##########
File path: sql/core/src/test/scala/org/apache/spark/sql/DynamicPartitionPruningSuite.scala
##########
@@ -1528,6 +1528,55 @@ abstract class DynamicPartitionPruningSuiteBase
}
}
}
+
+ test("SPARK-38570: Fix incorrect DynamicPartitionPruning caused by Literal") {
+ withSQLConf(SQLConf.DYNAMIC_PARTITION_PRUNING_ENABLED.key -> "true") {
+ withTable("fact1", "fact2", "dim") {
Review comment:
So embarrassing, I wanted to use these built-in tables before, but I didn't see table fact_stats.
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] [spark] wangyum commented on a change in pull request #35878: [SPARK-38570][SQL]Incorrect DynamicPartitionPruning caused by Literal
Posted by GitBox <gi...@apache.org>.
wangyum commented on a change in pull request #35878:
URL: https://github.com/apache/spark/pull/35878#discussion_r828044170
##########
File path: sql/core/src/main/scala/org/apache/spark/sql/execution/dynamicpruning/PartitionPruning.scala
##########
@@ -65,7 +65,7 @@ object PartitionPruning extends Rule[LogicalPlan] with PredicateHelper {
case fs: HadoopFsRelation =>
val partitionColumns = AttributeSet(
l.resolve(fs.partitionSchema, fs.sparkSession.sessionState.analyzer.resolver))
- if (resExp.references.subsetOf(partitionColumns)) {
+ if (!resExp.references.isEmpty && resExp.references.subsetOf(partitionColumns)) {
Review comment:
`resExp.references.nonEmpty`?
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] [spark] HyukjinKwon commented on pull request #35878: [SPARK-38570][SQL] Incorrect DynamicPartitionPruning caused by Literal
Posted by GitBox <gi...@apache.org>.
HyukjinKwon commented on pull request #35878:
URL: https://github.com/apache/spark/pull/35878#issuecomment-1077028627
cc @maryannxue too
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] [spark] AmplabJenkins commented on pull request #35878: [SPARK-38570][SQL] Incorrect DynamicPartitionPruning caused by Literal
Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on pull request #35878:
URL: https://github.com/apache/spark/pull/35878#issuecomment-1070938582
Can one of the admins verify this patch?
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] [spark] mcdull-zhang closed pull request #35878: [SPARK-38570][SQL] Incorrect DynamicPartitionPruning caused by Literal
Posted by GitBox <gi...@apache.org>.
mcdull-zhang closed pull request #35878:
URL: https://github.com/apache/spark/pull/35878
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] [spark] wangyum commented on pull request #35878: [SPARK-38570][SQL] Incorrect DynamicPartitionPruning caused by Literal
Posted by GitBox <gi...@apache.org>.
wangyum commented on pull request #35878:
URL: https://github.com/apache/spark/pull/35878#issuecomment-1078552095
Merged to master and branch-3.3.
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] [spark] mcdull-zhang commented on pull request #35878: [SPARK-38570][SQL] Incorrect DynamicPartitionPruning caused by Literal
Posted by GitBox <gi...@apache.org>.
mcdull-zhang commented on pull request #35878:
URL: https://github.com/apache/spark/pull/35878#issuecomment-1078626464
@wangyum I have newly submitted a pr for 3.2.
thanks again all.
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] [spark] HyukjinKwon commented on a change in pull request #35878: [SPARK-38570][SQL] Incorrect DynamicPartitionPruning caused by Literal
Posted by GitBox <gi...@apache.org>.
HyukjinKwon commented on a change in pull request #35878:
URL: https://github.com/apache/spark/pull/35878#discussion_r833882221
##########
File path: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/predicates.scala
##########
@@ -128,6 +128,9 @@ trait PredicateHelper extends AliasHelper with Logging {
def findExpressionAndTrackLineageDown(
exp: Expression,
plan: LogicalPlan): Option[(Expression, LogicalPlan)] = {
+ if (exp.references.isEmpty) {
+ return None
+ }
Review comment:
no biggie
```suggestion
if (exp.references.isEmpty) return None
```
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] [spark] wangyum closed pull request #35878: [SPARK-38570][SQL] Incorrect DynamicPartitionPruning caused by Literal
Posted by GitBox <gi...@apache.org>.
wangyum closed pull request #35878:
URL: https://github.com/apache/spark/pull/35878
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] [spark] mcdull-zhang commented on a change in pull request #35878: [SPARK-38570][SQL] Incorrect DynamicPartitionPruning caused by Literal
Posted by GitBox <gi...@apache.org>.
mcdull-zhang commented on a change in pull request #35878:
URL: https://github.com/apache/spark/pull/35878#discussion_r828774014
##########
File path: sql/core/src/main/scala/org/apache/spark/sql/execution/dynamicpruning/PartitionPruning.scala
##########
@@ -65,7 +65,7 @@ object PartitionPruning extends Rule[LogicalPlan] with PredicateHelper {
case fs: HadoopFsRelation =>
val partitionColumns = AttributeSet(
l.resolve(fs.partitionSchema, fs.sparkSession.sessionState.analyzer.resolver))
- if (resExp.references.subsetOf(partitionColumns)) {
+ if (!resExp.references.isEmpty && resExp.references.subsetOf(partitionColumns)) {
Review comment:
HadoopFsRelation, HiveTableRelation, DataSourceV2ScanRelation all need this check, and now I filter the literal at the beginning.
Is such code reasonable? Or should I handle it in findExpressionAndTrackLineageDown? Or should it be handled in three case statements?
Thank you for your review
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] [spark] AmplabJenkins commented on pull request #35878: [SPARK-38570][SQL] Incorrect DynamicPartitionPruning caused by Literal
Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on pull request #35878:
URL: https://github.com/apache/spark/pull/35878#issuecomment-1070938582
Can one of the admins verify this patch?
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] [spark] mcdull-zhang removed a comment on pull request #35878: [SPARK-38570][SQL] Incorrect DynamicPartitionPruning caused by Literal
Posted by GitBox <gi...@apache.org>.
mcdull-zhang removed a comment on pull request #35878:
URL: https://github.com/apache/spark/pull/35878#issuecomment-1075983927
friendly ping @cloud-fan
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] [spark] wangyum commented on a change in pull request #35878: [SPARK-38570][SQL]Incorrect DynamicPartitionPruning caused by Literal
Posted by GitBox <gi...@apache.org>.
wangyum commented on a change in pull request #35878:
URL: https://github.com/apache/spark/pull/35878#discussion_r828045310
##########
File path: sql/core/src/main/scala/org/apache/spark/sql/execution/dynamicpruning/PartitionPruning.scala
##########
@@ -65,7 +65,7 @@ object PartitionPruning extends Rule[LogicalPlan] with PredicateHelper {
case fs: HadoopFsRelation =>
val partitionColumns = AttributeSet(
l.resolve(fs.partitionSchema, fs.sparkSession.sessionState.analyzer.resolver))
- if (resExp.references.subsetOf(partitionColumns)) {
+ if (!resExp.references.isEmpty && resExp.references.subsetOf(partitionColumns)) {
Review comment:
Does the `DataSourceV2ScanRelation` also need this check?
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] [spark] cloud-fan commented on a change in pull request #35878: [SPARK-38570][SQL] Incorrect DynamicPartitionPruning caused by Literal
Posted by GitBox <gi...@apache.org>.
cloud-fan commented on a change in pull request #35878:
URL: https://github.com/apache/spark/pull/35878#discussion_r833273864
##########
File path: sql/core/src/test/scala/org/apache/spark/sql/DynamicPartitionPruningSuite.scala
##########
@@ -1528,6 +1528,55 @@ abstract class DynamicPartitionPruningSuiteBase
}
}
}
+
+ test("SPARK-38570: Fix incorrect DynamicPartitionPruning caused by Literal") {
+ withSQLConf(SQLConf.DYNAMIC_PARTITION_PRUNING_ENABLED.key -> "true") {
+ withTable("fact1", "fact2", "dim") {
Review comment:
+1
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] [spark] mcdull-zhang commented on pull request #35878: [SPARK-38570][SQL] Incorrect DynamicPartitionPruning caused by Literal
Posted by GitBox <gi...@apache.org>.
mcdull-zhang commented on pull request #35878:
URL: https://github.com/apache/spark/pull/35878#issuecomment-1075983927
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] [spark] mcdull-zhang commented on a change in pull request #35878: [SPARK-38570][SQL] Incorrect DynamicPartitionPruning caused by Literal
Posted by GitBox <gi...@apache.org>.
mcdull-zhang commented on a change in pull request #35878:
URL: https://github.com/apache/spark/pull/35878#discussion_r833882518
##########
File path: sql/core/src/main/scala/org/apache/spark/sql/execution/dynamicpruning/PartitionPruning.scala
##########
@@ -65,7 +65,7 @@ object PartitionPruning extends Rule[LogicalPlan] with PredicateHelper {
case fs: HadoopFsRelation =>
val partitionColumns = AttributeSet(
l.resolve(fs.partitionSchema, fs.sparkSession.sessionState.analyzer.resolver))
- if (resExp.references.subsetOf(partitionColumns)) {
+ if (!resExp.references.isEmpty && resExp.references.subsetOf(partitionColumns)) {
Review comment:
Is this ok now?
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] [spark] cloud-fan commented on a change in pull request #35878: [SPARK-38570][SQL] Incorrect DynamicPartitionPruning caused by Literal
Posted by GitBox <gi...@apache.org>.
cloud-fan commented on a change in pull request #35878:
URL: https://github.com/apache/spark/pull/35878#discussion_r833917998
##########
File path: sql/core/src/test/scala/org/apache/spark/sql/DynamicPartitionPruningSuite.scala
##########
@@ -1528,6 +1528,34 @@ abstract class DynamicPartitionPruningSuiteBase
}
}
}
+
+ test("SPARK-38570: Fix incorrect DynamicPartitionPruning caused by Literal") {
+ withSQLConf(SQLConf.DYNAMIC_PARTITION_PRUNING_ENABLED.key -> "true") {
+ val df = sql(
+ """
+ |SELECT f.store_id,
+ | f.date_id,
+ | s.state_province
+ |FROM (SELECT 4 AS store_id,
+ | date_id,
+ | product_id
+ | FROM fact_sk
+ | WHERE date_id >= 1300
+ | UNION ALL
+ | SELECT 5 AS store_id,
+ | date_id,
+ | product_id
+ | FROM fact_stats
+ | WHERE date_id <= 1000) f
+ |JOIN dim_store s
+ |ON f.store_id = s.store_id
+ |WHERE s.country = 'US'
+ |""".stripMargin)
+
+ checkPartitionPruningPredicate(df, false, withBroadcast = false)
Review comment:
```suggestion
checkPartitionPruningPredicate(df, withSubquery = false, withBroadcast = false)
```
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] [spark] wangyum commented on pull request #35878: [SPARK-38570][SQL] Incorrect DynamicPartitionPruning caused by Literal
Posted by GitBox <gi...@apache.org>.
wangyum commented on pull request #35878:
URL: https://github.com/apache/spark/pull/35878#issuecomment-1078554903
@mcdull-zhang Could you backport this to branch-3.2.
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] [spark] cloud-fan commented on pull request #35878: [SPARK-38570][SQL] Incorrect DynamicPartitionPruning caused by Literal
Posted by GitBox <gi...@apache.org>.
cloud-fan commented on pull request #35878:
URL: https://github.com/apache/spark/pull/35878#issuecomment-1077612847
@mcdull-zhang can you re-trigger the test?
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] [spark] wangyum commented on a change in pull request #35878: [SPARK-38570][SQL] Incorrect DynamicPartitionPruning caused by Literal
Posted by GitBox <gi...@apache.org>.
wangyum commented on a change in pull request #35878:
URL: https://github.com/apache/spark/pull/35878#discussion_r833206959
##########
File path: sql/core/src/main/scala/org/apache/spark/sql/execution/dynamicpruning/PartitionPruning.scala
##########
@@ -65,7 +65,7 @@ object PartitionPruning extends Rule[LogicalPlan] with PredicateHelper {
case fs: HadoopFsRelation =>
val partitionColumns = AttributeSet(
l.resolve(fs.partitionSchema, fs.sparkSession.sessionState.analyzer.resolver))
- if (resExp.references.subsetOf(partitionColumns)) {
+ if (!resExp.references.isEmpty && resExp.references.subsetOf(partitionColumns)) {
Review comment:
Could we fix `PredicateHelper.findExpressionAndTrackLineageDown`?
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] [spark] wangyum commented on a change in pull request #35878: [SPARK-38570][SQL] Incorrect DynamicPartitionPruning caused by Literal
Posted by GitBox <gi...@apache.org>.
wangyum commented on a change in pull request #35878:
URL: https://github.com/apache/spark/pull/35878#discussion_r833250853
##########
File path: sql/core/src/test/scala/org/apache/spark/sql/DynamicPartitionPruningSuite.scala
##########
@@ -1528,6 +1528,55 @@ abstract class DynamicPartitionPruningSuiteBase
}
}
}
+
+ test("SPARK-38570: Fix incorrect DynamicPartitionPruning caused by Literal") {
+ withSQLConf(SQLConf.DYNAMIC_PARTITION_PRUNING_ENABLED.key -> "true") {
+ withTable("fact1", "fact2", "dim") {
Review comment:
We already have lots of built-in tables. Cloud we change the test query to:
```sql
SELECT f.store_id,
f.date_id,
s.state_province
FROM (SELECT 4 AS store_id,
date_id,
product_id
FROM fact_sk
WHERE date_id >= 1300
UNION ALL
SELECT 5 AS store_id,
date_id,
product_id
FROM fact_stats
WHERE date_id <= 1000) f
JOIN dim_store s
ON f.store_id = s.store_id
WHERE s.country = 'US'
```
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org