Posted to github@arrow.apache.org by GitBox <gi...@apache.org> on 2022/05/31 06:51:55 UTC

[GitHub] [arrow-datafusion] korowa commented on issue #2628: Support optional filter in SortMergeJoin

korowa commented on issue #2628:
URL: https://github.com/apache/arrow-datafusion/issues/2628#issuecomment-1141733468

   I tried a proof of concept that builds an intermediate batch and applies the filter to it inside `freeze_buffered_join_streamed`, which seems to be the only place where filtering is required. While doing so I noticed (please correct me if I'm mistaken) that the logic of the `freeze_*` functions can break output ordering for outer joins: during freezing, joined and non-joined records from the outer table are appended as separate batches, and no merge or re-sort happens afterwards, only batch concatenation.
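
   To make the ordering concern concrete, here is a minimal toy sketch of the behaviour described above. It is not DataFusion code: `Row`, `freeze_concat` and `is_sorted_by_key` are hypothetical names, join keys are assumed to be unique integers, and a "batch" is modelled as a plain `Vec`.

```rust
/// One output row of a toy left outer join:
/// (streamed key, matched buffered key or None when there is no match).
type Row = (i32, Option<i32>);

/// Hypothetical model of the freeze step described in the comment:
/// joined rows are collected into one batch, non-joined streamed rows
/// into another, and the two batches are simply concatenated.
/// Assumes `streamed` is sorted and keys are unique.
fn freeze_concat(streamed: &[i32], buffered: &[i32]) -> Vec<Row> {
    let mut joined: Vec<Row> = Vec::new();
    let mut unmatched: Vec<Row> = Vec::new();
    for &key in streamed {
        if buffered.contains(&key) {
            joined.push((key, Some(key)));
        } else {
            unmatched.push((key, None));
        }
    }
    // No merge or re-sort: the non-joined batch is appended after the joined one.
    joined.extend(unmatched);
    joined
}

/// Returns true if the rows are in non-decreasing order of the streamed key.
fn is_sorted_by_key(rows: &[Row]) -> bool {
    rows.windows(2).all(|w| w[0].0 <= w[1].0)
}
```

   With streamed keys `[1, 2, 3]` and buffered keys `[1, 3]`, `freeze_concat` yields keys in the order 1, 3, 2, so `is_sorted_by_key` returns `false` even though both inputs were sorted.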
   
   I suppose output ordering is quite important for planning (e.g. if we had a merge/stream/pipe aggregate operator, it could be planned directly over the merge join output), so I wonder: shouldn't this be fixed before adding the MJ filter? I guess such a fix could significantly change the MJ logic in the places where filtering is required 🤔 
   
   @yjshen, @richox what do you think?

