Posted to github@arrow.apache.org by "tustvold (via GitHub)" <gi...@apache.org> on 2023/04/05 16:52:44 UTC

[GitHub] [arrow-datafusion] tustvold opened a new issue, #5885: Sort Memory Accounting

tustvold opened a new issue, #5885:
URL: https://github.com/apache/arrow-datafusion/issues/5885

   ### Is your feature request related to a problem or challenge?
   
   SortPreservingMerge currently has very limited memory accounting: it does not account for buffered batches or cursors.
   
   The only accounting is a static assignment made by `ExternalSorter` at construction time, covering the size of the in-memory batches when merging spilled and in-memory data. This assignment is never decremented, and it does not account for the memory used when spilled data is loaded back into memory.
   
   ### Describe the solution you'd like
   
   SortPreservingMerge should account for the memory usage of the data it has buffered.
   
   Additionally, the various streams created by `ExternalSorter`, both for in-memory and spilled data, should be accounted for.
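
As a rough illustration of the requested behaviour, the sketch below grows a reservation when a batch is buffered and shrinks it when the batch is emitted, so the accounted size tracks what is actually held. Everything in it is an assumption made for illustration: `Reservation`, `Batch`, and `BufferedStream` are hypothetical stand-ins, not DataFusion's memory pool API or its sort operators, and the byte estimate only counts the values in a toy batch.

```rust
use std::collections::VecDeque;

/// Hypothetical stand-in for a memory reservation (not DataFusion's API).
struct Reservation {
    limit: usize,
    used: usize,
}

impl Reservation {
    fn new(limit: usize) -> Self {
        Self { limit, used: 0 }
    }

    /// Reserve `bytes` more, failing if that would exceed the limit.
    fn try_grow(&mut self, bytes: usize) -> Result<(), String> {
        match self.used.checked_add(bytes) {
            Some(total) if total <= self.limit => {
                self.used = total;
                Ok(())
            }
            _ => Err(format!(
                "cannot reserve {bytes} bytes: {} of {} already in use",
                self.used, self.limit
            )),
        }
    }

    /// Release `bytes` that are no longer held.
    fn shrink(&mut self, bytes: usize) {
        self.used = self.used.saturating_sub(bytes);
    }
}

/// Toy batch: a single column of i64 values.
struct Batch {
    values: Vec<i64>,
}

impl Batch {
    /// Simplified size estimate that counts only the stored values.
    fn size_bytes(&self) -> usize {
        self.values.len() * std::mem::size_of::<i64>()
    }
}

/// Hypothetical stream that buffers batches and accounts for each one.
struct BufferedStream {
    buffered: VecDeque<Batch>,
    reservation: Reservation,
}

impl BufferedStream {
    fn new(limit: usize) -> Self {
        Self {
            buffered: VecDeque::new(),
            reservation: Reservation::new(limit),
        }
    }

    /// Account for the batch before buffering it.
    fn push(&mut self, batch: Batch) -> Result<(), String> {
        self.reservation.try_grow(batch.size_bytes())?;
        self.buffered.push_back(batch);
        Ok(())
    }

    /// Hand the oldest batch downstream and release its accounted memory.
    fn pop(&mut self) -> Option<Batch> {
        let batch = self.buffered.pop_front()?;
        self.reservation.shrink(batch.size_bytes());
        Some(batch)
    }

    /// Bytes currently accounted for.
    fn reserved(&self) -> usize {
        self.reservation.used
    }
}
```

With a 64-byte limit, two four-value batches fill the reservation and a third `push` returns an error until a `pop` releases space. A static assignment made once at construction, as described above, would report the same number throughout.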
   
   ### Describe alternatives you've considered
   
   _No response_
   
   ### Additional context
   
   #5879 tracks unifying the sorting implementations, which may help make this story more consistent.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow-datafusion] alamb commented on issue #5885: SortPreservingMerge does not account for memory usage

Posted by "alamb (via GitHub)" <gi...@apache.org>.
alamb commented on issue #5885:
URL: https://github.com/apache/arrow-datafusion/issues/5885#issuecomment-1640661113

   We saw this cause trouble for us during some internal testing. More details to come.




[GitHub] [arrow-datafusion] alamb commented on issue #5885: SortPreservingMerge does not account for memory usage

Posted by "alamb (via GitHub)" <gi...@apache.org>.
alamb commented on issue #5885:
URL: https://github.com/apache/arrow-datafusion/issues/5885#issuecomment-1655980415

   Specifically, we have pretty good evidence that, for dictionary-encoded arrays with high cardinality, the interned dictionary values consume an ever-increasing amount of memory.
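
To make the growth pattern concrete, here is a toy interner whose memory use rises with the number of distinct values it has seen and never falls. It is only a sketch: the `Interner` type is hypothetical and is not the interning code used by arrow-rs or DataFusion, and `size` is a lower bound that counts key bytes only.

```rust
use std::collections::HashMap;

/// Toy value interner (hypothetical, for illustration only).
#[derive(Default)]
struct Interner {
    ids: HashMap<Vec<u8>, u32>,
    key_bytes: usize,
}

impl Interner {
    /// Return the id for `value`, storing a copy the first time it is seen.
    fn intern(&mut self, value: &[u8]) -> u32 {
        if let Some(id) = self.ids.get(value) {
            return *id;
        }
        let id = self.ids.len() as u32;
        self.key_bytes += value.len();
        self.ids.insert(value.to_vec(), id);
        id
    }

    /// Lower bound on retained memory: the bytes of all interned values.
    fn size(&self) -> usize {
        self.key_bytes
    }
}
```

Repeated values cost nothing extra, but each new distinct value is retained for as long as the interner lives. With high-cardinality input, `size` therefore keeps growing across batches, which is why it would need to be reported to whatever tracks memory.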




[GitHub] [arrow-datafusion] alamb commented on issue #5885: SortPreservingMerge does not account for memory usage

Posted by "alamb (via GitHub)" <gi...@apache.org>.
alamb commented on issue #5885:
URL: https://github.com/apache/arrow-datafusion/issues/5885#issuecomment-1576519689

   There is a PR open for this feature -- I think it is valuable to complete. However, it currently doesn't pass the PR tests (I think the tests need some updating).




[GitHub] [arrow-datafusion] yjshen closed issue #5885: SortPreservingMerge does not account for memory usage

Posted by "yjshen (via GitHub)" <gi...@apache.org>.
yjshen closed issue #5885: SortPreservingMerge does not account for memory usage
URL: https://github.com/apache/arrow-datafusion/issues/5885

