Posted to github@arrow.apache.org by GitBox <gi...@apache.org> on 2021/05/24 16:27:18 UTC

[GitHub] [arrow-datafusion] alamb opened a new issue #416: Optimize sort preserving merge

alamb opened a new issue #416:
URL: https://github.com/apache/arrow-datafusion/issues/416


   **Is your feature request related to a problem or challenge? Please describe what you are trying to do.**
   The new sort preserving merge operator, introduced in #379, likely has room for performance improvement.
   
   
   **Describe the solution you'd like**
   1. Create a benchmark for the merging operator
   2. Optimize the operator as appropriate, using the benchmark to measure each improvement
   
   Here is a suggestion from @jhorstmann (https://github.com/apache/arrow-datafusion/pull/379/files#r637948151), recorded here as a separate ticket so it doesn't get lost:
   
   For a larger number of partitions, storing the cursors in a BinaryHeap, ordered by their current item, would be beneficial.
   
   A Rust implementation of that approach can be seen in [this blog post and the first comment under it][1]. I have implemented the same approach in Java before. I agree with @alamb, though, that it is better to make it work first and optimize later.
   
   [1]: https://dev.to/creativcoder/merge-k-sorted-arrays-in-rust-1b2f
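
To make the heap suggestion concrete, here is a minimal sketch of a k-way merge driven by a `BinaryHeap`. It is an illustration under simplifying assumptions, not the operator from #379: `merge_sorted` is a hypothetical helper, partitions are plain `Vec<i32>` rather than streams of record batches, and a "cursor" is just a `(current value, partition index, offset)` tuple. With `k` partitions and `n` rows in total, each row costs one heap pop and at most one push, so the merge takes O(n log k) comparisons instead of the O(n k) needed when every cursor is scanned for each output row.

```rust
use std::cmp::Reverse;
use std::collections::BinaryHeap;

/// Merge already-sorted partitions into a single sorted vector.
/// Hypothetical helper for illustration only; not part of DataFusion.
fn merge_sorted(partitions: &[Vec<i32>]) -> Vec<i32> {
    let total: usize = partitions.iter().map(|p| p.len()).sum();
    let mut out = Vec::with_capacity(total);

    // One heap entry per non-empty partition: (current value, partition index, offset).
    // `BinaryHeap` is a max-heap, so `Reverse` is used to pop the smallest value first.
    let mut heap = BinaryHeap::with_capacity(partitions.len());
    for (idx, part) in partitions.iter().enumerate() {
        if let Some(&first) = part.first() {
            heap.push(Reverse((first, idx, 0usize)));
        }
    }

    // Pop the smallest current item, emit it, then advance that partition's cursor.
    while let Some(Reverse((value, idx, offset))) = heap.pop() {
        out.push(value);
        let next = offset + 1;
        if let Some(&v) = partitions[idx].get(next) {
            heap.push(Reverse((v, idx, next)));
        }
    }
    out
}
```

For example, merging the partitions `[1, 4, 7]`, `[2, 5, 8]` and `[0, 3, 6, 9]` yields the values `0` through `9` in order. Because the tuples compare by value first and partition index second, equal values are emitted in partition order.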
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow-datafusion] jorgecarleitao commented on issue #416: Optimize sort preserving merge

Posted by GitBox <gi...@apache.org>.
jorgecarleitao commented on issue #416:
URL: https://github.com/apache/arrow-datafusion/issues/416#issuecomment-847174015


   Also for inspiration: https://github.com/jorgecarleitao/arrow2/blob/main/src/compute/merge_sort/mod.rs

