You are viewing a plain text version of this content. The canonical link for it is here.

Posted to reviews@spark.apache.org by GitBox <gi...@apache.org> on 2022/03/16 13:52:35 UTC

[GitHub] [spark] mcdull-zhang opened a new pull request #35878: [SPARK-38570][SQL]Incorrect DynamicPartitionPruning caused by Literal

mcdull-zhang opened a new pull request #35878:
URL: https://github.com/apache/spark/pull/35878


   ### What changes were proposed in this pull request?
   
   The return value of Literal.references is an empty AttributeSet, so Literal is mistaken for a partition column.
   
   For example, the sql in the test case will generate such a physical plan when the adaptive is closed: 
   ```text
   *(4) Project [type#5352, joinCol#5354, otherCol#5355, score#5362]
   +- *(4) BroadcastHashJoin [type#5352, joinCol#5354], [type#5360, joinCol#5361], Inner, BuildRight, false
      :- Union
      :  :- *(1) Project [type1 AS type#5352, joinCol#5354, otherCol#5355]
      :  :  +- *(1) Filter (isnotnull(joinCol#5354) AND dynamicpruningexpression(type1 IN dynamicpruning#5370))
      :  :     :  +- ReusedSubquery SubqueryBroadcast dynamicpruning#5370, 0, [type#5360, joinCol#5361], [id=#381]
      :  :     +- *(1) ColumnarToRow
      :  :        +- FileScan parquet default.fact1[joinCol#5354,otherCol#5355,partCol#5356] Batched: true, DataFilters: [isnotnull(joinCol#5354)], Format: Parquet, Location: InMemoryFileIndex(1 paths)[file:/Users/dongdongzhang/code/study/spark/spark-warehouse/org.apache...., PartitionFilters: [isnotnull(partCol#5356), (partCol#5356 = part1), dynamicpruningexpression(type1 IN dynamicprunin..., PushedFilters: [IsNotNull(joinCol)], ReadSchema: struct<joinCol:int,otherCol:string>
      :  :              +- SubqueryBroadcast dynamicpruning#5370, 0, [type#5360, joinCol#5361], [id=#381]
      :  :                 +- BroadcastExchange HashedRelationBroadcastMode(List(input[0, string, false], input[1, int, false]),false), [id=#380]
      :  :                    +- *(1) Filter ((((type#5360 <=> type1) OR (type#5360 <=> type2)) AND isnotnull(type#5360)) AND isnotnull(joinCol#5361))
      :  :                       +- *(1) ColumnarToRow
      :  :                          +- FileScan parquet default.dim[type#5360,joinCol#5361,score#5362] Batched: true, DataFilters: [((type#5360 <=> type1) OR (type#5360 <=> type2)), isnotnull(type#5360), isnotnull(joinCol#5361)], Format: Parquet, Location: InMemoryFileIndex(1 paths)[file:/Users/dongdongzhang/code/study/spark/spark-warehouse/org.apache...., PartitionFilters: [], PushedFilters: [Or(EqualNullSafe(type,type1),EqualNullSafe(type,type2)), IsNotNull(type), IsNotNull(joinCol)], ReadSchema: struct<type:string,joinCol:int,score:int>
      :  +- *(2) Project [type2 AS type#5353, joinCol#5357, otherCol#5358]
      :     +- *(2) Filter (isnotnull(joinCol#5357) AND dynamicpruningexpression(type2 IN dynamicpruning#5370))
      :        :  +- ReusedSubquery SubqueryBroadcast dynamicpruning#5370, 0, [type#5360, joinCol#5361], [id=#381]
      :        +- *(2) ColumnarToRow
      :           +- FileScan parquet default.fact2[joinCol#5357,otherCol#5358,partCol#5359] Batched: true, DataFilters: [isnotnull(joinCol#5357)], Format: Parquet, Location: InMemoryFileIndex(1 paths)[file:/Users/dongdongzhang/code/study/spark/spark-warehouse/org.apache...., PartitionFilters: [isnotnull(partCol#5359), (partCol#5359 = part1), dynamicpruningexpression(type2 IN dynamicprunin..., PushedFilters: [IsNotNull(joinCol)], ReadSchema: struct<joinCol:int,otherCol:string>
      :                 +- ReusedSubquery SubqueryBroadcast dynamicpruning#5370, 0, [type#5360, joinCol#5361], [id=#381]
      +- ReusedExchange [type#5360, joinCol#5361, score#5362], BroadcastExchange HashedRelationBroadcastMode(List(input[0, string, false], input[1, int, false]),false), [id=#380]
   ```
   
   after this pr:
   ```text
   *(4) Project [type#5352, joinCol#5354, otherCol#5355, score#5362]
   +- *(4) BroadcastHashJoin [type#5352, joinCol#5354], [type#5360, joinCol#5361], Inner, BuildRight, false
      :- Union
      :  :- *(1) Project [type1 AS type#5352, joinCol#5354, otherCol#5355]
      :  :  +- *(1) Filter isnotnull(joinCol#5354)
      :  :     +- *(1) ColumnarToRow
      :  :        +- FileScan parquet default.fact1[joinCol#5354,otherCol#5355,partCol#5356] Batched: true, DataFilters: [isnotnull(joinCol#5354)], Format: Parquet, Location: InMemoryFileIndex(1 paths)[file:/Users/dongdongzhang/code/study/spark/spark-warehouse/org.apache...., PartitionFilters: [isnotnull(partCol#5356), (partCol#5356 = part1)], PushedFilters: [IsNotNull(joinCol)], ReadSchema: struct<joinCol:int,otherCol:string>
      :  +- *(2) Project [type2 AS type#5353, joinCol#5357, otherCol#5358]
      :     +- *(2) Filter isnotnull(joinCol#5357)
      :        +- *(2) ColumnarToRow
      :           +- FileScan parquet default.fact2[joinCol#5357,otherCol#5358,partCol#5359] Batched: true, DataFilters: [isnotnull(joinCol#5357)], Format: Parquet, Location: InMemoryFileIndex(1 paths)[file:/Users/dongdongzhang/code/study/spark/spark-warehouse/org.apache...., PartitionFilters: [isnotnull(partCol#5359), (partCol#5359 = part1)], PushedFilters: [IsNotNull(joinCol)], ReadSchema: struct<joinCol:int,otherCol:string>
      +- BroadcastExchange HashedRelationBroadcastMode(List(input[0, string, false], input[1, int, false]),false), [id=#375]
         +- *(3) Filter ((((type#5360 <=> type1) OR (type#5360 <=> type2)) AND isnotnull(type#5360)) AND isnotnull(joinCol#5361))
            +- *(3) ColumnarToRow
               +- FileScan parquet default.dim[type#5360,joinCol#5361,score#5362] Batched: true, DataFilters: [((type#5360 <=> type1) OR (type#5360 <=> type2)), isnotnull(type#5360), isnotnull(joinCol#5361)], Format: Parquet, Location: InMemoryFileIndex(1 paths)[file:/Users/dongdongzhang/code/study/spark/spark-warehouse/org.apache...., PartitionFilters: [], PushedFilters: [Or(EqualNullSafe(type,type1),EqualNullSafe(type,type2)), IsNotNull(type), IsNotNull(joinCol)], ReadSchema: struct<type:string,joinCol:int,score:int>
   ```
   
   
   ### Why are the changes needed?
   Execution performance improvement 
   
   
   ### Does this PR introduce _any_ user-facing change?
   No
   
   
   ### How was this patch tested?
   Added unit test
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] [spark] wangyum commented on a change in pull request #35878: [SPARK-38570][SQL] Incorrect DynamicPartitionPruning caused by Literal

Posted by GitBox <gi...@apache.org>.

wangyum commented on a change in pull request #35878:
URL: https://github.com/apache/spark/pull/35878#discussion_r833942267



##########
File path: sql/core/src/main/scala/org/apache/spark/sql/execution/dynamicpruning/PartitionPruning.scala
##########
@@ -65,7 +65,7 @@ object PartitionPruning extends Rule[LogicalPlan] with PredicateHelper {
           case fs: HadoopFsRelation =>
             val partitionColumns = AttributeSet(
               l.resolve(fs.partitionSchema, fs.sparkSession.sessionState.analyzer.resolver))
-            if (resExp.references.subsetOf(partitionColumns)) {
+            if (!resExp.references.isEmpty && resExp.references.subsetOf(partitionColumns)) {

Review comment:
       LGTM.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] [spark] mcdull-zhang commented on a change in pull request #35878: [SPARK-38570][SQL] Incorrect DynamicPartitionPruning caused by Literal

Posted by GitBox <gi...@apache.org>.

mcdull-zhang commented on a change in pull request #35878:
URL: https://github.com/apache/spark/pull/35878#discussion_r833881952



##########
File path: sql/core/src/test/scala/org/apache/spark/sql/DynamicPartitionPruningSuite.scala
##########
@@ -1528,6 +1528,55 @@ abstract class DynamicPartitionPruningSuiteBase
       }
     }
   }
+
+  test("SPARK-38570: Fix incorrect DynamicPartitionPruning caused by Literal") {
+    withSQLConf(SQLConf.DYNAMIC_PARTITION_PRUNING_ENABLED.key -> "true") {
+      withTable("fact1", "fact2", "dim") {

Review comment:
       So embarrassing, I wanted to use these built-in tables before, but I didn't see table fact_stats.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] [spark] wangyum commented on a change in pull request #35878: [SPARK-38570][SQL]Incorrect DynamicPartitionPruning caused by Literal

Posted by GitBox <gi...@apache.org>.

wangyum commented on a change in pull request #35878:
URL: https://github.com/apache/spark/pull/35878#discussion_r828044170



##########
File path: sql/core/src/main/scala/org/apache/spark/sql/execution/dynamicpruning/PartitionPruning.scala
##########
@@ -65,7 +65,7 @@ object PartitionPruning extends Rule[LogicalPlan] with PredicateHelper {
           case fs: HadoopFsRelation =>
             val partitionColumns = AttributeSet(
               l.resolve(fs.partitionSchema, fs.sparkSession.sessionState.analyzer.resolver))
-            if (resExp.references.subsetOf(partitionColumns)) {
+            if (!resExp.references.isEmpty && resExp.references.subsetOf(partitionColumns)) {

Review comment:
       `resExp.references.nonEmpty`?




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] [spark] HyukjinKwon commented on pull request #35878: [SPARK-38570][SQL] Incorrect DynamicPartitionPruning caused by Literal

Posted by GitBox <gi...@apache.org>.

HyukjinKwon commented on pull request #35878:
URL: https://github.com/apache/spark/pull/35878#issuecomment-1077028627


   cc @maryannxue too


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] [spark] AmplabJenkins commented on pull request #35878: [SPARK-38570][SQL] Incorrect DynamicPartitionPruning caused by Literal

Posted by GitBox <gi...@apache.org>.

AmplabJenkins commented on pull request #35878:
URL: https://github.com/apache/spark/pull/35878#issuecomment-1070938582


   Can one of the admins verify this patch?


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] [spark] mcdull-zhang closed pull request #35878: [SPARK-38570][SQL] Incorrect DynamicPartitionPruning caused by Literal

Posted by GitBox <gi...@apache.org>.

mcdull-zhang closed pull request #35878:
URL: https://github.com/apache/spark/pull/35878


   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] [spark] wangyum commented on pull request #35878: [SPARK-38570][SQL] Incorrect DynamicPartitionPruning caused by Literal

Posted by GitBox <gi...@apache.org>.

wangyum commented on pull request #35878:
URL: https://github.com/apache/spark/pull/35878#issuecomment-1078552095


   Merged to master and branch-3.3.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] [spark] mcdull-zhang commented on pull request #35878: [SPARK-38570][SQL] Incorrect DynamicPartitionPruning caused by Literal

Posted by GitBox <gi...@apache.org>.

mcdull-zhang commented on pull request #35878:
URL: https://github.com/apache/spark/pull/35878#issuecomment-1078626464


   @wangyum I have newly submitted a pr for 3.2. 
   
   thanks again all.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] [spark] HyukjinKwon commented on a change in pull request #35878: [SPARK-38570][SQL] Incorrect DynamicPartitionPruning caused by Literal

Posted by GitBox <gi...@apache.org>.

HyukjinKwon commented on a change in pull request #35878:
URL: https://github.com/apache/spark/pull/35878#discussion_r833882221



##########
File path: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/predicates.scala
##########
@@ -128,6 +128,9 @@ trait PredicateHelper extends AliasHelper with Logging {
   def findExpressionAndTrackLineageDown(
       exp: Expression,
       plan: LogicalPlan): Option[(Expression, LogicalPlan)] = {
+    if (exp.references.isEmpty) {
+      return None
+    }

Review comment:
       no biggie
   
   ```suggestion
       if (exp.references.isEmpty) return None
   ```




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] [spark] wangyum closed pull request #35878: [SPARK-38570][SQL] Incorrect DynamicPartitionPruning caused by Literal

Posted by GitBox <gi...@apache.org>.

wangyum closed pull request #35878:
URL: https://github.com/apache/spark/pull/35878


   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] [spark] mcdull-zhang commented on a change in pull request #35878: [SPARK-38570][SQL] Incorrect DynamicPartitionPruning caused by Literal

Posted by GitBox <gi...@apache.org>.

mcdull-zhang commented on a change in pull request #35878:
URL: https://github.com/apache/spark/pull/35878#discussion_r828774014



##########
File path: sql/core/src/main/scala/org/apache/spark/sql/execution/dynamicpruning/PartitionPruning.scala
##########
@@ -65,7 +65,7 @@ object PartitionPruning extends Rule[LogicalPlan] with PredicateHelper {
           case fs: HadoopFsRelation =>
             val partitionColumns = AttributeSet(
               l.resolve(fs.partitionSchema, fs.sparkSession.sessionState.analyzer.resolver))
-            if (resExp.references.subsetOf(partitionColumns)) {
+            if (!resExp.references.isEmpty && resExp.references.subsetOf(partitionColumns)) {

Review comment:
       HadoopFsRelation, HiveTableRelation, DataSourceV2ScanRelation all need this check, and now I filter the literal at the beginning.
   
   Is such code reasonable? Or should I handle it in findExpressionAndTrackLineageDown? Or should it be handled in three case statements?
   
   Thank you for your review 




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] [spark] AmplabJenkins commented on pull request #35878: [SPARK-38570][SQL] Incorrect DynamicPartitionPruning caused by Literal

Posted by GitBox <gi...@apache.org>.

AmplabJenkins commented on pull request #35878:
URL: https://github.com/apache/spark/pull/35878#issuecomment-1070938582


   Can one of the admins verify this patch?


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] [spark] mcdull-zhang removed a comment on pull request #35878: [SPARK-38570][SQL] Incorrect DynamicPartitionPruning caused by Literal

Posted by GitBox <gi...@apache.org>.

mcdull-zhang removed a comment on pull request #35878:
URL: https://github.com/apache/spark/pull/35878#issuecomment-1075983927


   friendly ping @cloud-fan 


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] [spark] wangyum commented on a change in pull request #35878: [SPARK-38570][SQL]Incorrect DynamicPartitionPruning caused by Literal

Posted by GitBox <gi...@apache.org>.

wangyum commented on a change in pull request #35878:
URL: https://github.com/apache/spark/pull/35878#discussion_r828045310



##########
File path: sql/core/src/main/scala/org/apache/spark/sql/execution/dynamicpruning/PartitionPruning.scala
##########
@@ -65,7 +65,7 @@ object PartitionPruning extends Rule[LogicalPlan] with PredicateHelper {
           case fs: HadoopFsRelation =>
             val partitionColumns = AttributeSet(
               l.resolve(fs.partitionSchema, fs.sparkSession.sessionState.analyzer.resolver))
-            if (resExp.references.subsetOf(partitionColumns)) {
+            if (!resExp.references.isEmpty && resExp.references.subsetOf(partitionColumns)) {

Review comment:
       Does the `DataSourceV2ScanRelation` also need this check?




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] [spark] cloud-fan commented on a change in pull request #35878: [SPARK-38570][SQL] Incorrect DynamicPartitionPruning caused by Literal

Posted by GitBox <gi...@apache.org>.

cloud-fan commented on a change in pull request #35878:
URL: https://github.com/apache/spark/pull/35878#discussion_r833273864



##########
File path: sql/core/src/test/scala/org/apache/spark/sql/DynamicPartitionPruningSuite.scala
##########
@@ -1528,6 +1528,55 @@ abstract class DynamicPartitionPruningSuiteBase
       }
     }
   }
+
+  test("SPARK-38570: Fix incorrect DynamicPartitionPruning caused by Literal") {
+    withSQLConf(SQLConf.DYNAMIC_PARTITION_PRUNING_ENABLED.key -> "true") {
+      withTable("fact1", "fact2", "dim") {

Review comment:
       +1




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] [spark] mcdull-zhang commented on pull request #35878: [SPARK-38570][SQL] Incorrect DynamicPartitionPruning caused by Literal

Posted by GitBox <gi...@apache.org>.

mcdull-zhang commented on pull request #35878:
URL: https://github.com/apache/spark/pull/35878#issuecomment-1075983927






-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] [spark] mcdull-zhang commented on a change in pull request #35878: [SPARK-38570][SQL] Incorrect DynamicPartitionPruning caused by Literal

Posted by GitBox <gi...@apache.org>.

mcdull-zhang commented on a change in pull request #35878:
URL: https://github.com/apache/spark/pull/35878#discussion_r833882518



##########
File path: sql/core/src/main/scala/org/apache/spark/sql/execution/dynamicpruning/PartitionPruning.scala
##########
@@ -65,7 +65,7 @@ object PartitionPruning extends Rule[LogicalPlan] with PredicateHelper {
           case fs: HadoopFsRelation =>
             val partitionColumns = AttributeSet(
               l.resolve(fs.partitionSchema, fs.sparkSession.sessionState.analyzer.resolver))
-            if (resExp.references.subsetOf(partitionColumns)) {
+            if (!resExp.references.isEmpty && resExp.references.subsetOf(partitionColumns)) {

Review comment:
       Is this ok now?




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] [spark] cloud-fan commented on a change in pull request #35878: [SPARK-38570][SQL] Incorrect DynamicPartitionPruning caused by Literal

Posted by GitBox <gi...@apache.org>.

cloud-fan commented on a change in pull request #35878:
URL: https://github.com/apache/spark/pull/35878#discussion_r833917998



##########
File path: sql/core/src/test/scala/org/apache/spark/sql/DynamicPartitionPruningSuite.scala
##########
@@ -1528,6 +1528,34 @@ abstract class DynamicPartitionPruningSuiteBase
       }
     }
   }
+
+  test("SPARK-38570: Fix incorrect DynamicPartitionPruning caused by Literal") {
+    withSQLConf(SQLConf.DYNAMIC_PARTITION_PRUNING_ENABLED.key -> "true") {
+      val df = sql(
+        """
+          |SELECT f.store_id,
+          |       f.date_id,
+          |       s.state_province
+          |FROM (SELECT 4 AS store_id,
+          |               date_id,
+          |               product_id
+          |      FROM   fact_sk
+          |      WHERE  date_id >= 1300
+          |      UNION ALL
+          |      SELECT 5 AS store_id,
+          |               date_id,
+          |               product_id
+          |      FROM   fact_stats
+          |      WHERE  date_id <= 1000) f
+          |JOIN dim_store s
+          |ON f.store_id = s.store_id
+          |WHERE s.country = 'US'
+          |""".stripMargin)
+
+      checkPartitionPruningPredicate(df, false, withBroadcast = false)

Review comment:
       ```suggestion
         checkPartitionPruningPredicate(df, withSubquery = false, withBroadcast = false)
   ```




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] [spark] wangyum commented on pull request #35878: [SPARK-38570][SQL] Incorrect DynamicPartitionPruning caused by Literal

Posted by GitBox <gi...@apache.org>.

wangyum commented on pull request #35878:
URL: https://github.com/apache/spark/pull/35878#issuecomment-1078554903


   @mcdull-zhang Could you backport this to branch-3.2.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] [spark] cloud-fan commented on pull request #35878: [SPARK-38570][SQL] Incorrect DynamicPartitionPruning caused by Literal

Posted by GitBox <gi...@apache.org>.

cloud-fan commented on pull request #35878:
URL: https://github.com/apache/spark/pull/35878#issuecomment-1077612847


   @mcdull-zhang can you re-trigger the test?


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] [spark] wangyum commented on a change in pull request #35878: [SPARK-38570][SQL] Incorrect DynamicPartitionPruning caused by Literal

Posted by GitBox <gi...@apache.org>.

wangyum commented on a change in pull request #35878:
URL: https://github.com/apache/spark/pull/35878#discussion_r833206959



##########
File path: sql/core/src/main/scala/org/apache/spark/sql/execution/dynamicpruning/PartitionPruning.scala
##########
@@ -65,7 +65,7 @@ object PartitionPruning extends Rule[LogicalPlan] with PredicateHelper {
           case fs: HadoopFsRelation =>
             val partitionColumns = AttributeSet(
               l.resolve(fs.partitionSchema, fs.sparkSession.sessionState.analyzer.resolver))
-            if (resExp.references.subsetOf(partitionColumns)) {
+            if (!resExp.references.isEmpty && resExp.references.subsetOf(partitionColumns)) {

Review comment:
       Could we fix `PredicateHelper.findExpressionAndTrackLineageDown`?




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] [spark] wangyum commented on a change in pull request #35878: [SPARK-38570][SQL] Incorrect DynamicPartitionPruning caused by Literal

Posted by GitBox <gi...@apache.org>.

wangyum commented on a change in pull request #35878:
URL: https://github.com/apache/spark/pull/35878#discussion_r833250853



##########
File path: sql/core/src/test/scala/org/apache/spark/sql/DynamicPartitionPruningSuite.scala
##########
@@ -1528,6 +1528,55 @@ abstract class DynamicPartitionPruningSuiteBase
       }
     }
   }
+
+  test("SPARK-38570: Fix incorrect DynamicPartitionPruning caused by Literal") {
+    withSQLConf(SQLConf.DYNAMIC_PARTITION_PRUNING_ENABLED.key -> "true") {
+      withTable("fact1", "fact2", "dim") {

Review comment:
       We already have lots of built-in tables. Cloud we change the test query to:
   ```sql
   SELECT f.store_id,
          f.date_id,
          s.state_province
   FROM   (SELECT 4 AS store_id,
                  date_id,
                  product_id
           FROM   fact_sk
           WHERE  date_id >= 1300
           UNION ALL
           SELECT 5 AS store_id,
                  date_id,
                  product_id
           FROM   fact_stats
           WHERE  date_id <= 1000) f
          JOIN dim_store s
            ON f.store_id = s.store_id
   WHERE  s.country = 'US'
   ```




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org