Posted to github@arrow.apache.org by GitBox <gi...@apache.org> on 2022/05/02 23:02:43 UTC

[GitHub] [arrow-datafusion] jon-chuang commented on issue #2351: Optimize EXISTS subquery expressions by rewriting as semi-join

jon-chuang commented on issue #2351:
URL: https://github.com/apache/arrow-datafusion/issues/2351#issuecomment-1115448743

   I'm currently halfway through implementing this using heuristic rewrite rules, but I've decided to try to make it more generic by following DuckDB/Hyper's (and apparently also Materialize's) approach: push the dependent join down through the subquery until all dependent/correlated predicates have been pulled out of the subquery.
   
   This was the approach that occurred to me as the most powerful and complete one, and thankfully there are trailblazers.
   
   The best resource I've found is DuckDB's pushdown logic for the various logical operators:
   https://github.com/duckdb/duckdb/blob/bee8017bdcc5e652aee26ce8cfb260990cf6a369/src/planner/subquery/flatten_dependent_join.cpp#L72
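
For readers unfamiliar with the rewrite, here is a minimal sketch of the end result in the simplest case: a correlated `EXISTS` evaluated once per outer row versus the same query as a semi-join. It is plain Python over toy in-memory tables, not DataFusion or DuckDB code, and it does not implement the general dependent-join pushdown. The table names, columns, and both function names are hypothetical. The query being modelled is roughly `SELECT * FROM customers c WHERE EXISTS (SELECT 1 FROM orders o WHERE o.customer_id = c.id AND o.amount > 100)`.

```python
# Hypothetical toy tables; names and columns are illustrative only.
customers = [
    {"id": 1, "name": "ann"},
    {"id": 2, "name": "bob"},
    {"id": 3, "name": "cy"},
]
orders = [
    {"customer_id": 1, "amount": 250},
    {"customer_id": 1, "amount": 300},
    {"customer_id": 2, "amount": 40},
]


def exists_correlated(outer, inner):
    """Dependent-join style: re-evaluate the subquery for every outer row."""
    result = []
    for c in outer:
        # The correlated predicate (customer_id = c.id) sits inside the subquery.
        if any(o["customer_id"] == c["id"] and o["amount"] > 100 for o in inner):
            result.append(c)
    return result


def exists_as_semi_join(outer, inner):
    """Decorrelated style: filter the inner side once, then semi-join on the key."""
    # The uncorrelated predicate (amount > 100) stays on the inner side.
    keys = {o["customer_id"] for o in inner if o["amount"] > 100}
    # The correlated predicate has become the semi-join condition; each outer
    # row is emitted at most once, however many inner rows match.
    return [c for c in outer if c["id"] in keys]


naive = exists_correlated(customers, orders)
rewritten = exists_as_semi_join(customers, orders)
```

Both functions return the same rows, but the second scans `orders` once instead of once per customer. The general approach described above extends this idea to subqueries where the correlated predicate sits below other operators and has to be pulled up through them first.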

