Posted to reviews@spark.apache.org by GitBox <gi...@apache.org> on 2019/07/01 09:29:20 UTC
[GitHub] [spark] liupc opened a new pull request #25020: [SPARK-28220]Fix
foldable join condition not pushed down when parent filter is wholly pushed
down
URL: https://github.com/apache/spark/pull/25020
## What changes were proposed in this pull request?
The optimizer rule `PushPredicateThroughJoin` tries to push a parent filter down through the join. However, when the parent filter is wholly pushed down through the join, the join becomes the top node, and the `transform` method then moves on without applying the rule to that join.
Suppose we have two tables: table1 and table2:
```
table1: (a: string, b: string, c: string)
table2: (d: string)
```
Consider the following SQL:
`select * from table1 left join (select d, 'w1' as r from table2) on a = d and r = 'w2' where b = 2`
Let's focus on the following optimizer rules:
```
PushPredicateThroughJoin
FoldablePropagation
BooleanSimplification
PruneFilters
```
In the above case, on the first iteration of these rules:
PushPredicateThroughJoin ->
`select * from table1 where b=2 left join (select d, 'w1' as r from table2) on a = d and r = 'w2'`
FoldablePropagation ->
`select * from table1 where b=2 left join (select d, 'w1' as r from table2) on a = d and 'w1' = 'w2'`
BooleanSimplification ->
`select * from table1 where b=2 left join (select d, 'w1' as r from table2) on false`
PruneFilters -> no effect
Even after several iterations of these rules, the join condition is never pushed to the right-hand side of the left join. As a result, in some cases (e.g., a large right table), the resulting `BroadcastNestedLoopJoin` may be slow or run out of memory.
This PR fixes this problem.
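
The skipped-join behavior described above can be illustrated with a small toy model. This is only a sketch in Python, not Spark code: `Relation`, `Filter`, `Join`, `Pred`, `push_rule`, and `transform_down` are hypothetical stand-ins for Catalyst's plan nodes, the `PushPredicateThroughJoin` rule, and the top-down `transform` traversal, and the subquery `(select d, 'w1' as r from table2)` is flattened into a relation named `sub` for brevity. It assumes that a top-down traversal applies the rule once per node and then descends into the children of the result.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class Pred:
    text: str            # human-readable predicate, e.g. "r = 'w2'"
    refs: frozenset      # column names the predicate references


@dataclass(frozen=True)
class Relation:
    name: str
    columns: Tuple[str, ...]


@dataclass(frozen=True)
class Filter:
    predicate: Pred
    child: object


@dataclass(frozen=True)
class Join:              # toy left outer join
    left: object
    right: object
    condition: Tuple[Pred, ...]   # conjuncts of the join condition


def output(plan):
    """Columns produced by a plan node."""
    if isinstance(plan, Relation):
        return set(plan.columns)
    if isinstance(plan, Filter):
        return output(plan.child)
    return output(plan.left) | output(plan.right)


def push_rule(plan):
    """Toy stand-in for PushPredicateThroughJoin (left outer join only)."""
    # Case 1: a Filter above a Join whose predicate only needs left-side columns.
    if isinstance(plan, Filter) and isinstance(plan.child, Join):
        join = plan.child
        if plan.predicate.refs <= output(join.left):
            return Join(Filter(plan.predicate, join.left), join.right, join.condition)
    # Case 2: a bare Join; move conjuncts that only need right-side columns.
    if isinstance(plan, Join):
        right_cols = output(plan.right)
        right_only = tuple(p for p in plan.condition if p.refs and p.refs <= right_cols)
        if right_only:
            rest = tuple(p for p in plan.condition if p not in right_only)
            right = plan.right
            for p in right_only:
                right = Filter(p, right)
            return Join(plan.left, right, rest)
    return plan


def transform_down(plan, rule):
    """Apply the rule to this node once, then recurse into the result's children.

    The node returned by the rule is not offered to the rule again in this pass.
    """
    after = rule(plan)
    if isinstance(after, Filter):
        return Filter(after.predicate, transform_down(after.child, rule))
    if isinstance(after, Join):
        return Join(transform_down(after.left, rule),
                    transform_down(after.right, rule),
                    after.condition)
    return after


table1 = Relation("table1", ("a", "b", "c"))
sub = Relation("sub", ("d", "r"))
plan = Filter(
    Pred("b = 2", frozenset({"b"})),
    Join(table1, sub, (Pred("a = d", frozenset({"a", "d"})),
                       Pred("r = 'w2'", frozenset({"r"})))),
)

one_pass = transform_down(plan, push_rule)      # only the parent filter moves
two_pass = transform_down(one_pass, push_rule)  # now the join itself is matched
```

In this toy model, the first pass pushes `b = 2` below the join but leaves `r = 'w2'` in the join condition, because the join was produced by the rule and is not revisited in that pass. A second pass would push it, but in the scenario described above the condition has already been rewritten to `false` by the other rules before that pass runs.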
## How was this patch tested?
Existing unit tests.