You are viewing a plain text version of this content. The canonical link for it is here.
Posted to github@arrow.apache.org by GitBox <gi...@apache.org> on 2022/11/09 16:54:56 UTC

[GitHub] [arrow-datafusion] alamb commented on issue #4140: Support more expressions in equality join

alamb commented on issue #4140:
URL: https://github.com/apache/arrow-datafusion/issues/4140#issuecomment-1309053379

   For what it is worth, the existing join logic contains hard coded assumptions that the join is between two columns in several places. Changing the join logic (which is already complicated and will likely only get more so) is likely to be quite challenging
   
   So therefore  I agree with @mingmwang's proposal of:
   
   > My current preference is to keep the equal join condition as columns, but add another projection to project the expression to normal columns and push down the projection.
   
   
   > if the expressions are trival cast, support them as equal join conditions
   
   I don't think there would be any performance difference between casting in a Projection or in the Join itself and I think it would keep the Join significantly less complicated. 
   
   I would suggest not handling casts in the Join but instead work on improving the other  optimizer simplification rules to remove the casts. Like the expression
   
   ```
   |               |   Filter: CAST(t0.c0 AS Int64) + Int64(1) = CAST(t1.c0 AS Int64)                                          |
   ```
   
   Can be rewritten into
   ```
   |               |   Filter: t0.c0 + Int32(1) = t1.c0                                           |
   ```
   
   Which we would still have to have a `projection` to evaluate `t0.c0 + 1` but `t1.c0` would need no processing
   
   
   This type of unwrapping is already done in https://github.com/apache/arrow-datafusion/blob/master/datafusion/optimizer/src/unwrap_cast_in_comparison.rs
   
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org