You are viewing a plain text version of this content. The canonical link for it is here.
Posted to github@arrow.apache.org by GitBox <gi...@apache.org> on 2022/01/21 07:54:48 UTC

[GitHub] [arrow-datafusion] liukun4515 removed a comment on pull request #1618: Fix predicate pushdown for outer joins

liukun4515 removed a comment on pull request #1618:
URL: https://github.com/apache/arrow-datafusion/pull/1618#issuecomment-1018260246


   > # Which issue does this PR close?
   > Closes #1586
   > 
   > # What changes are included in this PR?
   > This change does a couple of things:
   > 
   > 1. Does not attempt to push down any predicates for the nullable (non-preserved) side of a given join
   > 2. Bails on the optimization to duplicate post-join filters on a column involved in the join equality to the other side of the join, if the other side of the join is nullable.
   > 
   > I'll elaborate on point 2 here as I think the above is a bit difficult to parse. Consider the following query:
   > 
   > ```sql
   > SELECT *
   > FROM t1 JOIN t2 
   >   on t1.id = t2.uid 
   > WHERE t1.id > 1
   > ```
   > 
   > Right now this is being rewritten as:
   > 
   > ```sql
   > SELECT *
   > FROM t1 JOIN t2 
   >   on t1.id = t2.uid 
   > WHERE t1.id > 1 AND t2.uid > 1 -- Duplicate filter to t2.uid, so it can be pushed down
   > ```
   > 
   > This is correct in the case of certain joins. However, I believe that in cases where the other join side is non-preserved, we should not apply this optimization. For example, if we change the above join to a `LEFT` join, the query rewrite becomes:
   > 
   > ```sql
   > SELECT *
   > FROM t1 LEFT JOIN t2 
   >   on t1.id = t2.uid 
   > WHERE t1.id > 1 AND t2.uid > 1
   > ```
   > 
   > The additional filter on `t2.uid` is wrong - as it omits null rows that would have been present in the original query.
   
   great fix


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org