
Posted to github@arrow.apache.org by "ozankabak (via GitHub)" <gi...@apache.org> on 2023/04/22 21:27:10 UTC

[GitHub] [arrow-datafusion] ozankabak commented on pull request #6036: Unordered PARTITION BY column implementation (to prevent pipeline breaking)

ozankabak commented on PR #6036:
URL: https://github.com/apache/arrow-datafusion/pull/6036#issuecomment-1518826846

   Yes. At this time, we enable this mode only when the input is unbounded (where a sort is not even possible).
   
   For other use cases, we ran some initial experiments. They suggest that when you have enough memory to do a full sort without external spills, you should do that instead of using this mode. When you don't have enough memory for a full in-memory sort, the choice between this mode and sort-with-spills will likely depend on cardinalities. Once we have more data on this, we will send a follow-on PR that focuses on when to enable this mode outside of streaming.
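
The decision described in this comment can be summarized in a minimal sketch. All names here (`WindowStrategy`, `choose_strategy`, and both parameters) are hypothetical and are not part of DataFusion's API. The spill case is left undecided on purpose, since the experiments have not settled it.

```rust
/// Hypothetical illustration only; none of these names exist in DataFusion.
#[derive(Debug, PartialEq)]
enum WindowStrategy {
    /// The unordered PARTITION BY mode discussed in this PR.
    UnorderedPartitionBy,
    /// Sort the whole input in memory, then run the window operator.
    FullInMemorySort,
    /// Open question: this mode vs. sort-with-spills, likely depends on cardinalities.
    Undecided,
}

/// Assumed inputs: whether the source is unbounded (streaming), and whether
/// a full sort would fit in memory without external spills.
fn choose_strategy(input_is_unbounded: bool, sort_fits_in_memory: bool) -> WindowStrategy {
    if input_is_unbounded {
        // A sort is not possible on unbounded input, so this mode is the only option.
        WindowStrategy::UnorderedPartitionBy
    } else if sort_fits_in_memory {
        // Initial experiments favor a full in-memory sort here.
        WindowStrategy::FullInMemorySort
    } else {
        // No recommendation yet; more data is needed.
        WindowStrategy::Undecided
    }
}
```

Only the first branch reflects behavior that is enabled today; the other two restate the preliminary findings and are not implemented rules.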

