You are viewing a plain text version of this content. The canonical link for it is here.
Posted to github@arrow.apache.org by GitBox <gi...@apache.org> on 2022/04/15 11:06:38 UTC

[GitHub] [arrow-datafusion] yjshen commented on issue #1599: Memory Limited Joins (Externalized / Spill)

yjshen commented on issue #1599:
URL: https://github.com/apache/arrow-datafusion/issues/1599#issuecomment-1100042115

   I would propose we do this in several steps:
   
   - [ ]  Provide a classic SortMergeJoin implementation that is less memory bound itself (but move the need of memory management to the sort operator, which we already have memory controlled).
   - [ ] Follow-up choice 1: Consolidate HashJoin and SortMergeJoin, providing a unified JoinExec, and do adaptive execution as @alamb suggested above.
   - [ ] Follow-up choice 2: incorporate a cost-based join optimizer to choose the most suitable physical plan: sort-based or hash-based. 


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org