Posted to github@arrow.apache.org by "alamb (via GitHub)" <gi...@apache.org> on 2023/04/22 11:11:04 UTC

[GitHub] [arrow-datafusion] alamb commented on pull request #6036: Unordered PARTITION BY column implementation (to prevent pipeline breaking)

alamb commented on PR #6036:
URL: https://github.com/apache/arrow-datafusion/pull/6036#issuecomment-1518606303

   I will also try to review this change over the coming few days
   
   > Even if there is no theoretical reason to break pipeline.
   
   It seems to me from reading the description on this PR that the tradeoff is:
   1. Save a sort (e.g. don't sort by `unsorted_col`), so less CPU work
   2. The BoundedWindowExec operator potentially has to buffer a large number of partitions (e.g. if `unsorted_column` has a large number of distinct values), and thus requires more memory in some cases
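   
   To make the tradeoff concrete, here is a rough sketch of the two strategies for a running sum per partition. It is a toy model, not DataFusion code: the function names, the `(&str, i64)` row type, and the "number of live states" counter are all assumptions for illustration only.

```rust
use std::collections::HashMap;

// Hypothetical toy model, not DataFusion's API.
// Each row is (partition key, value); the "window function" is a running sum.

/// Input is already sorted by the partition key, so once the key changes the
/// previous partition is finished and its state can be dropped.
/// Returns the running sums and the peak number of partition states held.
fn running_sum_sorted(rows: &[(&str, i64)]) -> (Vec<i64>, usize) {
    let mut out = Vec::with_capacity(rows.len());
    let mut current: Option<(&str, i64)> = None;
    let mut peak_states = 0;
    for &(key, value) in rows {
        let sum = match current {
            // Same partition as the previous row: extend its running sum.
            Some((k, s)) if k == key => s + value,
            // New partition: the old state is simply replaced.
            _ => value,
        };
        current = Some((key, sum));
        peak_states = 1; // never more than one partition alive at a time
        out.push(sum);
    }
    (out, peak_states)
}

/// Input arrives in arbitrary partition order, so no sort is needed, but any
/// partition may receive more rows later. State for every distinct key seen
/// so far has to stay in memory until the input ends.
fn running_sum_unsorted(rows: &[(&str, i64)]) -> (Vec<i64>, usize) {
    let mut out = Vec::with_capacity(rows.len());
    let mut states: HashMap<&str, i64> = HashMap::new();
    for &(key, value) in rows {
        let sum = states.entry(key).or_insert(0);
        *sum += value;
        out.push(*sum);
    }
    // The map only grows, so its final size is also its peak size.
    let peak_states = states.len();
    (out, peak_states)
}
```

   In this sketch the sorted variant pays for the sort up front and then holds a single state, while the unsorted variant skips the sort and holds one state per distinct partition value.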
   
   Is this a fair assessment?

