Posted to github@arrow.apache.org by GitBox <gi...@apache.org> on 2021/06/09 06:03:24 UTC

[GitHub] [arrow-datafusion] Dandandan commented on issue #488: Map `IN` to semi join

Dandandan commented on issue #488:
URL: https://github.com/apache/arrow-datafusion/issues/488#issuecomment-857405281


   Hey @msathis, that would be great.
   
   Effectively, it means rewriting queries like:
   
   ```sql
   SELECT a, b
   FROM x
   WHERE a IN (SELECT b FROM t)
   ```
   
   This can be expressed as the following (pseudo-SQL, since standard SQL has no `SEMI JOIN` syntax):
   
   ```sql
   SELECT a, b
   FROM x
   SEMI JOIN t ON x.a = t.b
   ```
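
   To illustrate why the rewrite is valid, here is a minimal, self-contained Rust sketch. It is not DataFusion code: `XRow`, `filter_in_subquery` and `hash_semi_join` are hypothetical names, and NULLs are ignored by using plain `i64` keys. It evaluates the `IN` predicate row by row and compares the result with a hash semi join on `x.a = t.b`:

   ```rust
   use std::collections::HashSet;

   /// Hypothetical row of the outer table `x` with columns `a` and `b`.
   #[derive(Debug, Clone, PartialEq)]
   struct XRow {
       a: i64,
       b: i64,
   }

   /// `WHERE a IN (SELECT b FROM t)`: keep each row of `x` whose `a`
   /// appears anywhere in the subquery result `t_b`.
   fn filter_in_subquery(x: &[XRow], t_b: &[i64]) -> Vec<XRow> {
       x.iter().filter(|row| t_b.contains(&row.a)).cloned().collect()
   }

   /// Hash semi join on `x.a = t.b`: build a set from the right side, then
   /// probe it with each left row. Only left columns are returned, and each
   /// left row is emitted at most once, even if `t` has duplicate keys.
   fn hash_semi_join(x: &[XRow], t_b: &[i64]) -> Vec<XRow> {
       let build: HashSet<i64> = t_b.iter().copied().collect();
       x.iter().filter(|row| build.contains(&row.a)).cloned().collect()
   }

   fn main() {
       let x = vec![
           XRow { a: 1, b: 10 },
           XRow { a: 2, b: 20 },
           XRow { a: 3, b: 30 },
       ];
       // `t.b` contains the key 1 twice; an inner join would duplicate x's first row.
       let t_b = vec![1, 1, 3, 4];

       let via_in = filter_in_subquery(&x, &t_b);
       let via_semi = hash_semi_join(&x, &t_b);
       assert_eq!(via_in, via_semi);
       println!("{:?}", via_semi);
   }
   ```

   The duplicate key in `t` is why this has to be a semi join rather than an inner join: an inner join would emit the first row of `x` twice, while both `IN` and the semi join emit it once.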
   
   So the work would be:
   
   * adding `IN` (with a subquery) as an option to the `Expr` enum and handling it in `sql/planner`.
   * extracting applicable `IN` expressions and transforming them into (left and right) join columns.
   * converting them into a semi join (a join with `JoinType::Semi`), either directly in the planner and/or by adding an optimization rule (e.g. translating a cross join to a semi join). The first would be fine for now.
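
   The extraction and conversion steps could look roughly like the sketch below. The `Expr`, `Plan` and `JoinType` types are hypothetical, heavily simplified stand-ins (not DataFusion's actual types), and the rewrite only handles a filter whose entire predicate is a single `col IN (SELECT c ...)`; conjunctions such as `a IN (...) AND c > 1` are out of scope:

   ```rust
   /// Hypothetical, simplified expression type (not DataFusion's `Expr`).
   #[derive(Debug, Clone, PartialEq)]
   enum Expr {
       Column(String),
       /// `expr IN (subquery)`; the subquery should project a single column.
       InSubquery { expr: Box<Expr>, subquery: Box<Plan> },
   }

   /// Mirrors the idea of `JoinType::Semi`; only the variants needed here.
   #[derive(Debug, Clone, PartialEq)]
   enum JoinType {
       Inner,
       Semi,
   }

   /// Hypothetical, simplified logical plan.
   #[derive(Debug, Clone, PartialEq)]
   enum Plan {
       Scan { table: String },
       Projection { input: Box<Plan>, columns: Vec<String> },
       Filter { input: Box<Plan>, predicate: Expr },
       Join { left: Box<Plan>, right: Box<Plan>, on: Vec<(String, String)>, join_type: JoinType },
   }

   /// Returns the column a subquery projects, if it projects exactly one.
   fn subquery_output_column(plan: &Plan) -> Option<String> {
       match plan {
           Plan::Projection { columns, .. } if columns.len() == 1 => Some(columns[0].clone()),
           _ => None,
       }
   }

   /// Rewrites `Filter(input, col IN (SELECT c ...))` into a semi join of
   /// `input` and the subquery on `col = c`, recursing into child plans.
   fn rewrite_in_to_semi_join(plan: Plan) -> Plan {
       match plan {
           Plan::Filter { input, predicate: Expr::InSubquery { expr, subquery } } => {
               let input = Box::new(rewrite_in_to_semi_join(*input));
               // Extract the left (outer) and right (subquery) join columns.
               let right_col = subquery_output_column(&subquery);
               match (*expr, right_col) {
                   (Expr::Column(left_col), Some(right_col)) => Plan::Join {
                       left: input,
                       right: subquery,
                       on: vec![(left_col, right_col)],
                       join_type: JoinType::Semi,
                   },
                   // Not applicable: keep the original filter unchanged.
                   (expr, _) => Plan::Filter {
                       input,
                       predicate: Expr::InSubquery { expr: Box::new(expr), subquery },
                   },
               }
           }
           Plan::Filter { input, predicate } => Plan::Filter {
               input: Box::new(rewrite_in_to_semi_join(*input)),
               predicate,
           },
           Plan::Projection { input, columns } => Plan::Projection {
               input: Box::new(rewrite_in_to_semi_join(*input)),
               columns,
           },
           Plan::Join { left, right, on, join_type } => Plan::Join {
               left: Box::new(rewrite_in_to_semi_join(*left)),
               right: Box::new(rewrite_in_to_semi_join(*right)),
               on,
               join_type,
           },
           scan @ Plan::Scan { .. } => scan,
       }
   }

   fn main() {
       // SELECT a, b FROM x WHERE a IN (SELECT b FROM t)
       let plan = Plan::Projection {
           input: Box::new(Plan::Filter {
               input: Box::new(Plan::Scan { table: "x".into() }),
               predicate: Expr::InSubquery {
                   expr: Box::new(Expr::Column("a".into())),
                   subquery: Box::new(Plan::Projection {
                       input: Box::new(Plan::Scan { table: "t".into() }),
                       columns: vec!["b".into()],
                   }),
               },
           }),
           columns: vec!["a".into(), "b".into()],
       };
       println!("{:#?}", rewrite_in_to_semi_join(plan));
   }
   ```

   If the `IN` operand is not a plain column, or the subquery does not project exactly one column, the filter is left untouched; that is the case the error described in the next paragraph is meant to catch.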
   
   I think we can return an error if the logical plan still contains an `IN` in an expression somewhere (i.e. one that could not be converted).
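
   A sketch of that check, again on hypothetical simplified types (`Expr`, `Plan`, `check_no_in_subquery` are illustrative names, not DataFusion APIs): walk the plan after the rewrite and return an error if any filter predicate still contains an `IN` subquery.

   ```rust
   /// Hypothetical, simplified expression type.
   #[derive(Debug)]
   enum Expr {
       Column(String),
       Equals(Box<Expr>, Box<Expr>),
       InSubquery { expr: Box<Expr>, subquery: Box<Plan> },
   }

   /// Hypothetical, simplified plan after the `IN` -> semi join rewrite.
   #[derive(Debug)]
   enum Plan {
       Scan { table: String },
       Filter { input: Box<Plan>, predicate: Expr },
       SemiJoin { left: Box<Plan>, right: Box<Plan>, on: (String, String) },
   }

   /// True if an `IN` subquery appears anywhere inside `expr`.
   fn expr_contains_in(expr: &Expr) -> bool {
       match expr {
           Expr::Column(_) => false,
           Expr::Equals(l, r) => expr_contains_in(l) || expr_contains_in(r),
           Expr::InSubquery { .. } => true,
       }
   }

   /// Fails if any `IN` subquery survived the rewrite.
   fn check_no_in_subquery(plan: &Plan) -> Result<(), String> {
       match plan {
           Plan::Scan { .. } => Ok(()),
           Plan::Filter { input, predicate } => {
               if expr_contains_in(predicate) {
                   return Err(format!("unsupported IN subquery in filter: {:?}", predicate));
               }
               check_no_in_subquery(input)
           }
           Plan::SemiJoin { left, right, .. } => {
               check_no_in_subquery(left)?;
               check_no_in_subquery(right)
           }
       }
   }

   fn main() {
       let rewritten = Plan::SemiJoin {
           left: Box::new(Plan::Scan { table: "x".into() }),
           right: Box::new(Plan::Scan { table: "t".into() }),
           on: ("a".into(), "b".into()),
       };
       assert!(check_no_in_subquery(&rewritten).is_ok());

       // An `IN` that was not converted is reported as an error.
       let leftover = Plan::Filter {
           input: Box::new(Plan::Scan { table: "x".into() }),
           predicate: Expr::InSubquery {
               expr: Box::new(Expr::Column("a".into())),
               subquery: Box::new(Plan::Scan { table: "t".into() }),
           },
       };
       println!("{:?}", check_no_in_subquery(&leftover));
   }
   ```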
   
   One complication I saw is that adding a `LogicalPlan` to `Expr` (to encode the `IN` subquery) is not trivial, because `Expr` derives traits such as `Eq` that `LogicalPlan` does not implement.
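
   To illustrate the complication with hypothetical toy types: below, `Plan` holds an `f64`, so it can derive `PartialEq` but not `Eq` or `Hash` (`f64` implements neither, since `NaN != NaN`). One possible workaround, shown as an assumption rather than a settled design, is to wrap the subquery plan in a newtype (`SubqueryPlan`, a hypothetical name) that compares and hashes by pointer identity via `Arc::ptr_eq`, so `Expr` can keep its derives:

   ```rust
   use std::hash::{Hash, Hasher};
   use std::sync::Arc;

   /// Hypothetical plan node: the `f64` field rules out deriving `Eq`/`Hash`.
   #[derive(Debug, PartialEq)]
   enum Plan {
       Scan { table: String },
       Sample { input: Box<Plan>, fraction: f64 },
   }

   /// Newtype that compares and hashes by identity of the shared plan.
   #[derive(Debug, Clone)]
   struct SubqueryPlan(Arc<Plan>);

   impl PartialEq for SubqueryPlan {
       fn eq(&self, other: &Self) -> bool {
           Arc::ptr_eq(&self.0, &other.0)
       }
   }

   // Pointer equality is reflexive, symmetric and transitive, so `Eq` is sound.
   impl Eq for SubqueryPlan {}

   impl Hash for SubqueryPlan {
       fn hash<H: Hasher>(&self, state: &mut H) {
           // Equal pointers produce equal hashes, consistent with `eq`.
           (Arc::as_ptr(&self.0) as usize).hash(state);
       }
   }

   /// These derives would not compile if the variant held a `Plan` directly.
   #[derive(Debug, Clone, PartialEq, Eq, Hash)]
   enum Expr {
       Column(String),
       InSubquery { expr: Box<Expr>, subquery: SubqueryPlan },
   }

   fn main() {
       let shared = Arc::new(Plan::Sample {
           input: Box::new(Plan::Scan { table: "t".into() }),
           fraction: 0.5,
       });
       let e1 = Expr::InSubquery {
           expr: Box::new(Expr::Column("a".into())),
           subquery: SubqueryPlan(Arc::clone(&shared)),
       };
       // A clone shares the same `Arc`, so it compares equal.
       assert_eq!(e1, e1.clone());

       // A structurally identical plan allocated separately compares unequal.
       let copy = Arc::new(Plan::Sample {
           input: Box::new(Plan::Scan { table: "t".into() }),
           fraction: 0.5,
       });
       let e2 = Expr::InSubquery {
           expr: Box::new(Expr::Column("a".into())),
           subquery: SubqueryPlan(Arc::clone(&copy)),
       };
       assert_ne!(e1, e2);
       // The plans themselves are still structurally equal via `PartialEq`.
       assert_eq!(*shared, *copy);
       println!("ok");
   }
   ```

   The trade-off is visible in `main`: two structurally identical subqueries built separately compare unequal, which could hurt optimizations that rely on expression equality. Comparing plans structurally instead would need equality (and hashing) defined on the plan type itself, which is exactly what is missing.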