Posted to reviews@spark.apache.org by GitBox <gi...@apache.org> on 2019/07/01 09:29:20 UTC

[GitHub] [spark] liupc opened a new pull request #25020: [SPARK-28220]Fix foldable join condition not pushed down when parent filter is wholly pushed down

URL: https://github.com/apache/spark/pull/25020
 
 
   ## What changes were proposed in this pull request?
   
   The optimizer rule `PushPredicateThroughJoin` tries to push a parent filter down through a join. However, when the parent filter is wholly pushed down through the join, the join becomes the top node, and the `transform` method then skips that join instead of applying the rule to it.
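
The skipped join can be illustrated with a toy model. This is a minimal sketch, not Catalyst code: `Node`, `transform_down`, and `push_filter_through_join` are hypothetical names, and the sketch assumes a top-down traversal that applies the rule to a node and then descends into the children of whatever the rule returned.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional, Tuple


@dataclass(frozen=True)
class Node:
    kind: str                          # "Filter", "Join" or "Relation"
    children: Tuple["Node", ...] = ()
    detail: Optional[str] = None       # condition text or table name


def transform_down(node: Node, rule: Callable[[Node], Node]) -> Node:
    """Apply `rule` to `node`, then recurse into the children of the result.

    The node returned by the rule is not offered to the rule again in this pass.
    """
    after_rule = rule(node)
    new_children = tuple(transform_down(c, rule) for c in after_rule.children)
    return Node(after_rule.kind, new_children, after_rule.detail)


offered: List[str] = []  # kinds of the nodes the rule was asked to match


def push_filter_through_join(node: Node) -> Node:
    """Toy rule: push a Filter sitting on top of a Join wholly to the left side."""
    offered.append(node.kind)
    if node.kind == "Filter" and node.children[0].kind == "Join":
        join = node.children[0]
        left, right = join.children
        pushed = Node("Filter", (left,), node.detail)
        return Node("Join", (pushed, right), join.detail)
    # A real rule would also have a Join case that pushes parts of the join
    # condition into the join's children; it is omitted in this sketch.
    return node


plan = Node(
    "Filter",
    (Node("Join",
          (Node("Relation", detail="table1"), Node("Relation", detail="table2")),
          "a = d and r = 'w2'"),),
    "b = 2",
)

result = transform_down(plan, push_filter_through_join)
print(result.kind)  # the join is now the top node
print(offered)      # the rule was never offered a Join
```

In this model the rule sees the original filter, the pushed-down filter, and the two relations, but never the join that replaced the filter at the top of the plan.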
   
   Suppose we have two tables: table1 and table2:
   
   ```
   table1: (a: string, b: string, c: string)
   
   table2: (d: string)
   ```
   
   and the following SQL:
   
   `select * from table1 left join (select d, 'w1' as r from table2) on a = d and r = 'w2' where b = 2`
    
   
   Let's focus on the following optimizer rules:
   
   ```
   PushPredicateThroughJoin
   
   FoldablePropagation
   
   BooleanSimplification
   
   PruneFilters
   ```
   
    
   
   In the above case, on the first iteration of these rules:
   
   PushPredicateThroughJoin ->
   
   `select * from table1 where b=2 left join (select d, 'w1' as r from table2) on a = d and r = 'w2'`
   
   FoldablePropagation ->
   
   `select * from table1 where b=2 left join (select d, 'w1' as r from table2) on a = d and 'w1' = 'w2'`
   
   BooleanSimplification ->
   
   `select * from table1 where b=2 left join (select d, 'w1' as r from table2) on false`
   
   PruneFilters -> no effect
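
The way the join condition collapses to `false` in the steps above can be sketched as follows. This is a toy model under stated assumptions, not Spark code: the condition is a list of equality conjuncts, literals are strings wrapped in single quotes, and `propagate_foldable` and `simplify_and` are hypothetical helpers standing in for the FoldablePropagation and BooleanSimplification steps.

```python
from typing import Dict, List, Tuple, Union

Conjunct = Tuple[str, str]  # (left, right) operands of an equality


def is_literal(operand: str) -> bool:
    """Treat single-quoted strings as literals, everything else as attributes."""
    return operand.startswith("'") and operand.endswith("'")


def propagate_foldable(conjuncts: List[Conjunct],
                       aliases: Dict[str, str]) -> List[Conjunct]:
    """Replace attributes that are aliases of a literal with that literal."""
    return [(aliases.get(l, l), aliases.get(r, r)) for l, r in conjuncts]


def simplify_and(conjuncts: List[Conjunct]) -> Union[bool, List[Conjunct]]:
    """Fold literal-to-literal equalities inside a conjunction.

    One false conjunct makes the whole condition False; true conjuncts are dropped.
    """
    remaining = []
    for left, right in conjuncts:
        if is_literal(left) and is_literal(right):
            if left != right:
                return False
            continue
        remaining.append((left, right))
    return remaining if remaining else True


condition = [("a", "d"), ("r", "'w2'")]   # a = d and r = 'w2'
aliases = {"r": "'w1'"}                    # select d, 'w1' as r from table2

folded = propagate_foldable(condition, aliases)
simplified = simplify_and(folded)
print(folded)
print(simplified)
```

Once `r` is replaced by `'w1'`, the second conjunct compares two different literals, so the whole conjunction folds to false regardless of `a = d`.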
   
    
   
   After several iterations of these rules, the join condition is still never pushed to the right-hand side of the left join. Thus, in some cases (e.g. a large right table), the `BroadcastNestedLoopJoin` may be slow or run out of memory.
   
   This PR fixes the problem.
   
   ## How was this patch tested?
   
   Existing unit tests.
   
