Posted to github@arrow.apache.org by GitBox <gi...@apache.org> on 2021/04/08 04:57:17 UTC

[GitHub] [arrow] Dandandan commented on pull request #9937: ARROW-12279: [Rust][DataFusion] Add test for null handling in hash join (ARROW-12266)

Dandandan commented on pull request #9937:
URL: https://github.com/apache/arrow/pull/9937#issuecomment-815445960


   > > We should filter on nulls beforehand to make this result correct. Probably the best way to go here is to add a filter in the logical plan that keeps only non-null keys for inner, left, and right joins.
   > 
   > I am not sure this works for all join types (OUTER JOIN, as well as ANTI-JOIN and SEMI-JOIN, which are optimizations for subqueries).
   > 
   > It might make sense to check for null when building the hash table for inner join keys (as NULL will never equal NULL)
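A minimal sketch of the build-side check, assuming a single nullable `i64` join key modelled as `Option<i64>`. `inner_hash_join` is a hypothetical helper written for illustration, not DataFusion's API. Null keys are skipped when building the table and when probing, because `NULL = NULL` is not true in SQL. A table keyed on `Option<i64>` directly would wrongly pair `None` with `None`.

```rust
use std::collections::HashMap;

/// Hypothetical inner equi-join on one nullable key column.
/// Returns (left_row, right_row) index pairs; nulls are `None`.
fn inner_hash_join(left: &[Option<i64>], right: &[Option<i64>]) -> Vec<(usize, usize)> {
    // Build phase: index left rows by key, skipping null keys,
    // since a NULL key can never satisfy the equality predicate.
    let mut table: HashMap<i64, Vec<usize>> = HashMap::new();
    for (i, key) in left.iter().enumerate() {
        if let Some(k) = key {
            table.entry(*k).or_default().push(i);
        }
    }

    // Probe phase: a null probe key also matches nothing.
    let mut out = Vec::new();
    for (j, key) in right.iter().enumerate() {
        if let Some(k) = key {
            if let Some(rows) = table.get(k) {
                for &i in rows {
                    out.push((i, j));
                }
            }
        }
    }
    out
}

fn main() {
    let left: Vec<Option<i64>> = vec![Some(1), None, Some(2), None];
    let right: Vec<Option<i64>> = vec![None, Some(2), Some(1), Some(3)];
    // prints [(2, 1), (0, 2)]
    println!("{:?}", inner_hash_join(&left, &right));
}
```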
   
   You are right, not for outer join or other joins (but we don't have those yet). For those, I think the rows have to be included, but the equality check and the hashmap build might need some changes too. The filter approach is what Spark does, FWIW. I think it also makes sense conceptually, as joins should support other conditions as well, and it allows for greater efficiency.
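A rough sketch of where a non-null key filter can be placed below the join without changing results, per join type. `JoinType`, `can_filter_nulls`, and `left_hash_join` are hypothetical names for illustration; this is not DataFusion's or Spark's actual code. For a left join, only the right input may be filtered: left rows with a null key still have to appear in the output, padded with nulls.

```rust
use std::collections::HashMap;

/// Hypothetical join types for illustration only.
#[derive(Clone, Copy, Debug)]
enum JoinType { Inner, Left, Right, Full }

/// Returns (filter_left, filter_right): whether an IS NOT NULL filter
/// on the join key can be pushed below the join on each input.
fn can_filter_nulls(join_type: JoinType) -> (bool, bool) {
    match join_type {
        // Null keys never match and unmatched rows are dropped: filter both.
        JoinType::Inner => (true, true),
        // Unmatched left rows (null keys included) must be kept.
        JoinType::Left => (false, true),
        // Unmatched right rows (null keys included) must be kept.
        JoinType::Right => (true, false),
        // Both sides keep unmatched rows, so no filter is safe.
        JoinType::Full => (false, false),
    }
}

/// Left outer join on one nullable key, returning (left_row, Option<right_row>).
fn left_hash_join(left: &[Option<i64>], right: &[Option<i64>]) -> Vec<(usize, Option<usize>)> {
    // Build on the right input; its null keys are filtered out up front.
    let mut table: HashMap<i64, Vec<usize>> = HashMap::new();
    for (j, key) in right.iter().enumerate() {
        if let Some(k) = key {
            table.entry(*k).or_default().push(j);
        }
    }
    let mut out = Vec::new();
    for (i, key) in left.iter().enumerate() {
        // A null left key finds no match, but the row is still emitted.
        match key.and_then(|k| table.get(&k)) {
            Some(rows) => out.extend(rows.iter().map(|&j| (i, Some(j)))),
            None => out.push((i, None)),
        }
    }
    out
}

fn main() {
    let left: Vec<Option<i64>> = vec![Some(1), None, Some(2)];
    let right: Vec<Option<i64>> = vec![Some(1), None, None];
    // prints (false, true)
    println!("{:?}", can_filter_nulls(JoinType::Left));
    // prints [(0, Some(0)), (1, None), (2, None)]
    println!("{:?}", left_hash_join(&left, &right));
}
```

The design choice here is that the null handling lives in a plan-level decision (which inputs get a filter) rather than inside the hash table, which keeps the join operator itself simpler for the inner-join case.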


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org