Posted to github@arrow.apache.org by GitBox <gi...@apache.org> on 2022/04/27 12:32:55 UTC

[GitHub] [arrow-datafusion] andygrove opened a new issue, #2351: Optimize EXISTS subquery expressions by rewriting as semi-join

andygrove opened a new issue, #2351:
URL: https://github.com/apache/arrow-datafusion/issues/2351

   **Is your feature request related to a problem or challenge? Please describe what you are trying to do.**
   I would like an optimizer rule that rewrites queries with EXISTS subqueries as semi-joins. See the discussion in https://github.com/apache/arrow-datafusion/pull/2344.
   
   **Describe the solution you'd like**
   See discussion in https://github.com/apache/arrow-datafusion/pull/2344 for an example.
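
As a rough illustration of the requested rewrite (not the example from the linked PR), the sketch below models `SELECT * FROM customers c WHERE EXISTS (SELECT 1 FROM orders o WHERE o.customer_id = c.id)` over in-memory rows. The table names, columns, and helper functions are all hypothetical; the point is only that filtering on EXISTS and a left semi-join on the correlated column return the same rows, with each outer row emitted at most once.

```python
# Hypothetical tables, modelled as lists of dicts.
customers = [
    {"id": 1, "name": "ann"},
    {"id": 2, "name": "bob"},
    {"id": 3, "name": "cy"},
]
orders = [
    {"order_id": 10, "customer_id": 1},
    {"order_id": 11, "customer_id": 1},  # second match for customer 1
    {"order_id": 12, "customer_id": 3},
]


def filter_exists(outer, inner):
    """Naive plan: re-run the correlated subquery for every outer row."""
    return [
        c for c in outer
        if any(o["customer_id"] == c["id"] for o in inner)
    ]


def left_semi_join(outer, inner):
    """Rewritten plan: build a key set from the inner side once, then probe it.

    A semi-join keeps an outer row if at least one inner row matches and
    never duplicates it, which is exactly the EXISTS semantics.
    """
    keys = {o["customer_id"] for o in inner}
    return [c for c in outer if c["id"] in keys]


naive = filter_exists(customers, orders)
rewritten = left_semi_join(customers, orders)
print([c["name"] for c in rewritten])
```

The naive form does work proportional to the product of the two table sizes, while the semi-join form scans each side once, which is the motivation for doing the rewrite in the optimizer.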
   
   **Describe alternatives you've considered**
   None
   
   **Additional context**
   None
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow-datafusion] jon-chuang commented on issue #2351: Optimize EXISTS subquery expressions by rewriting as semi-join

Posted by GitBox <gi...@apache.org>.
jon-chuang commented on issue #2351:
URL: https://github.com/apache/arrow-datafusion/issues/2351#issuecomment-1115448743

   I'm currently halfway through implementing this using heuristic rewrite rules, but I've decided to try to make it more generic, following the approach of DuckDB/Hyper (and apparently also Materialize): push the dependent join down through the subquery until all dependent/correlated predicates have been pulled out of the subquery.
   
   This was the intuition which occurred to me as the most powerful and complete approach, and thankfully there are trailblazers.
   
   The best resource I have found is DuckDB's pushdown logic for the various logical operators:
   https://github.com/duckdb/duckdb/blob/bee8017bdcc5e652aee26ce8cfb260990cf6a369/src/planner/subquery/flatten_dependent_join.cpp#L72
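
To make the idea concrete, here is a minimal sketch of pushing a dependent join down through a filter and an aggregate. It is an illustration of the general technique under assumed data, not DuckDB's implementation and not the code from this work. The hypothetical query is `SELECT * FROM orders o WHERE o.amount > (SELECT avg(i.amount) FROM orders i WHERE i.customer = o.customer)`; the table, columns, and function names are invented for the example.

```python
from collections import defaultdict

# Hypothetical table, modelled as a list of dicts.
orders = [
    {"id": 1, "customer": "a", "amount": 10.0},
    {"id": 2, "customer": "a", "amount": 30.0},
    {"id": 3, "customer": "b", "amount": 5.0},
    {"id": 4, "customer": "b", "amount": 5.0},
    {"id": 5, "customer": "c", "amount": 7.0},
]


def correlated(rows):
    """Naive plan: evaluate the subquery once per outer row."""
    out = []
    for o in rows:
        inner = [i["amount"] for i in rows if i["customer"] == o["customer"]]
        if o["amount"] > sum(inner) / len(inner):
            out.append(o)
    return out


def decorrelated(rows):
    """Flattened plan: no per-row subquery evaluation."""
    # 1. Domain D: the distinct values of the correlated column.
    domain = {o["customer"] for o in rows}
    # 2. Push through the filter: the correlated predicate
    #    i.customer = o.customer becomes an ordinary join condition with D.
    joined = [(i["customer"], i["amount"]) for i in rows
              if i["customer"] in domain]
    # 3. Push through the aggregate: D's column becomes a GROUP BY key.
    totals = defaultdict(lambda: [0.0, 0])
    for customer, amount in joined:
        totals[customer][0] += amount
        totals[customer][1] += 1
    avg_by_customer = {c: s / n for c, (s, n) in totals.items()}
    # 4. Join the flattened subquery back to the outer side on D's column.
    return [o for o in rows
            if o["customer"] in avg_by_customer
            and o["amount"] > avg_by_customer[o["customer"]]]


result = decorrelated(orders)
print([o["id"] for o in result])
```

Once no operator below the dependent join refers to the outer row any more, what remains is an ordinary join, so the same mechanism covers EXISTS, IN, and scalar subqueries without a separate heuristic rule for each shape.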




[GitHub] [arrow-datafusion] andygrove closed issue #2351: Optimize EXISTS subquery expressions by rewriting as semi-join

Posted by GitBox <gi...@apache.org>.
andygrove closed issue #2351: Optimize EXISTS subquery expressions by rewriting as semi-join
URL: https://github.com/apache/arrow-datafusion/issues/2351




[GitHub] [arrow-datafusion] andygrove commented on issue #2351: Optimize EXISTS subquery expressions by rewriting as semi-join

Posted by GitBox <gi...@apache.org>.
andygrove commented on issue #2351:
URL: https://github.com/apache/arrow-datafusion/issues/2351#issuecomment-1368249074

   I think this can be closed now

