Posted to github@arrow.apache.org by "nseekhao (via GitHub)" <gi...@apache.org> on 2023/09/21 16:23:29 UTC

[GitHub] [arrow-datafusion] nseekhao commented on issue #7611: Substrait: Combine join on and filter expressions in a single Substrait JoinRel's field

nseekhao commented on issue #7611:
URL: https://github.com/apache/arrow-datafusion/issues/7611#issuecomment-1729906548

   There are a few things in my description that I need to correct here.
   1. The issue should have been labeled `bug`, not `enhancement`.
   2. If the join `on` field is empty, the producer does not put `Literal(True)` in the field; it just produces `None`.
   3. This issue should be labeled `bug` because the current use of `post_join_filter` is incorrect.
   
   According to the [ANSI join syntax (explained by IBM)](https://www.ibm.com/docs/en/informix-servers/14.10?topic=selectivity-use-join-filters-post-join-filters), there are two types of join filters: **filter** and **post join filter.**
   
   - Any predicate specified in the `ON` clause is considered a `filter`
   - Any predicate specified in the `WHERE` clause is considered a `post_join_filter`
   
   In the case of an `INNER` join, the join `filter` and the `post_join_filter` achieve the same result. However, in the case of a `LEFT` join, the results differ depending on whether the filter is applied during the join or after it. Please refer to this [reference](https://facebookincubator.github.io/velox/develop/joins.html) for an example.
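
To make the difference concrete, here is a minimal sketch using Python's built-in `sqlite3` module rather than DataFusion or Substrait. The tables `t` and `u` and their contents are made up purely for illustration. It runs the same predicate once in the `ON` clause and once in the `WHERE` clause, for both a `LEFT` and an `INNER` join.

```python
import sqlite3

# Hypothetical tables, invented for this illustration only.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE t (id INTEGER, a INTEGER);
    CREATE TABLE u (id INTEGER, b INTEGER);
    INSERT INTO t VALUES (1, 10), (2, 20);
    INSERT INTO u VALUES (1, 100), (2, 5);
""")

def run(sql):
    """Execute a query and return all rows as a list of tuples."""
    return conn.execute(sql).fetchall()

# Predicate in ON (join filter): evaluated while matching rows, so a left
# row with no qualifying match is kept and padded with NULL.
left_on = run(
    "SELECT t.id, u.b FROM t LEFT JOIN u "
    "ON t.id = u.id AND u.b > 50 ORDER BY t.id"
)

# Predicate in WHERE (post-join filter): evaluated on the joined rows, so
# rows that fail the predicate are removed from the output entirely.
left_where = run(
    "SELECT t.id, u.b FROM t LEFT JOIN u "
    "ON t.id = u.id WHERE u.b > 50 ORDER BY t.id"
)

# For an INNER join, the two placements give the same rows.
inner_on = run(
    "SELECT t.id, u.b FROM t INNER JOIN u "
    "ON t.id = u.id AND u.b > 50 ORDER BY t.id"
)
inner_where = run(
    "SELECT t.id, u.b FROM t INNER JOIN u "
    "ON t.id = u.id WHERE u.b > 50 ORDER BY t.id"
)

print(left_on)      # [(1, 100), (2, None)]
print(left_where)   # [(1, 100)]
print(inner_on == inner_where)  # True
```

With the `LEFT` join, the row with `id = 2` survives only when the predicate is in `ON`, which is why emitting an `ON` predicate as a `post_join_filter` can change the result.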
   
   Since the current version of `datafusion` does not have a post-join filter field in a join relation, the `datafusion-substrait` producer should not produce a plan with a `post_join_filter`.
   
   
   

