You are viewing a plain text version of this content. The canonical link for it is here.
Posted to github@arrow.apache.org by GitBox <gi...@apache.org> on 2022/01/20 20:18:33 UTC

[GitHub] [arrow-datafusion] james727 commented on issue #1585: convert `outer join` to `inner join` to improve performance

james727 commented on issue #1585:
URL: https://github.com/apache/arrow-datafusion/issues/1585#issuecomment-1017892324


   @xudong963 I'd like to pick this up once https://github.com/apache/arrow-datafusion/pull/1618 is (hopefully) eventually merged. Question re: implementation - do you think this should be a separate optimizer pass or something else?
   
   Apologies in advance if this is all obvious - the reason I ask is that it seems that this rule benefits from filter pushdown occurring both before and after the join rewrite happens. For example, consider the following contrived query:
   ```sql
   SELECT * FROM (
     SELECT * FROM t1 LEFT JOIN t2 ON t1.id = t2.uid
   )
   WHERE t2.uid IS NOT NULL
   ```
   
   If we do not push `t2.uid IS NOT NULL` into the subquery, we will not know that the join can be rewritten when we optimize the join node. Thus, we should push filters down as far as possible before rewriting the join, giving the following:
   ```sql
   -- Push filter into subquery
   SELECT * FROM (
     SELECT * FROM t1 LEFT JOIN t2 ON t1.id = t2.uid
     WHERE t2.uid IS NOT NULL
   )
   
   -- Rewrite join
   SELECT * FROM (
     SELECT * FROM t1 INNER JOIN t2 ON t1.id = t2.uid
     WHERE t2.uid IS NOT NULL
   )
   ```
   
   However, at this point we would benefit from another pass with the filter pushdown rule, as `t2.uid IS NOT NULL` can now be pushed down to the underlying scan on `t2`. 
   
   It looks like optimizer passes are sequenced in a vector defined in `context.rs` - so my question could be reframed as: how would you structure this change to facilitate first pushing filters down as far as possible, then rewriting joins where possible, then pushing down any filters that are now enabled due to outer joins becoming inner joins?
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org