You are viewing a plain text version of this content. The canonical link for it is here.
Posted to github@arrow.apache.org by "vincev (via GitHub)" <gi...@apache.org> on 2023/07/15 12:47:22 UTC

[GitHub] [arrow-rs] vincev opened a new issue, #4529: Add a merge sort kernel

vincev opened a new issue, #4529:
URL: https://github.com/apache/arrow-rs/issues/4529

   **Is your feature request related to a problem or challenge? Please describe what you are trying to do.**
   
   I have an aggregation function that sorts a number of small arrays in parallel and then in the final aggregation step produces a big sorted array from all the smaller sorted arrays. 
   
   Currently I am doing a `concat`, that creates a big array from the smaller sorted arrays, followed by a `sort` that creates another array,  it would be probably more efficient if there was a way to merge the already sorted arrays to a final array in one step.
   
   **Describe the solution you'd like**
   
   Would be nice to have a function like:
   
   ```rust
   pub fn merge_sort(arrays: &[&dyn Array], options: Option<SortOption>) -> Result<ArrayRef, ArrowError>;
   ```
   
   that takes a number of sorted arrays and produces a sorted array by merging them.
   
   <!--
   A clear and concise description of what you want to happen.
   -->
   
   **Describe alternatives you've considered**
   
   I am doing `concat` followed by `sort`.
   <!--
   A clear and concise description of any alternative solutions or features you've considered.
   -->
   
   **Additional context**
   <!--
   Add any other context or screenshots about the feature request here.
   -->
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow-rs] tustvold commented on issue #4529: Add a merge sort kernel

Posted by "tustvold (via GitHub)" <gi...@apache.org>.
tustvold commented on issue #4529:
URL: https://github.com/apache/arrow-rs/issues/4529#issuecomment-1646955481

   This is consistent with my findings in https://github.com/apache/arrow-datafusion/pull/6163, thank you foe investigating


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow-rs] vincev commented on issue #4529: Add a merge sort kernel

Posted by "vincev (via GitHub)" <gi...@apache.org>.
vincev commented on issue #4529:
URL: https://github.com/apache/arrow-rs/issues/4529#issuecomment-1646945406

   I did some testing comparing `concat` then `sort` against parallel `sort` then `merge` using the looser tree approach used in DataFusion:
   
   ```
   Running 8 threads.
   concat_then_sort: Sorted 100663296 values in 2048 arrays 2.179s
   sort_then_merge:  Sorted 100663296 values in 2048 arrays 5.351s
   ```
   
   `concat` then `sort` is more than twice as fast than `sort` then `merge` with `UInt64Arrays`.
   
   I am going to close this issue but let me know if you would like more information.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow-rs] tustvold commented on issue #4529: Add a merge sort kernel

Posted by "tustvold (via GitHub)" <gi...@apache.org>.
tustvold commented on issue #4529:
URL: https://github.com/apache/arrow-rs/issues/4529#issuecomment-1636784246

   Perhaps we might upstream some of the SorrPreservingMerge logic from DataFusion 🤔


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow-rs] vincev closed issue #4529: Add a merge sort kernel

Posted by "vincev (via GitHub)" <gi...@apache.org>.
vincev closed issue #4529: Add a merge sort kernel 
URL: https://github.com/apache/arrow-rs/issues/4529


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow-rs] vincev commented on issue #4529: Add a merge sort kernel

Posted by "vincev (via GitHub)" <gi...@apache.org>.
vincev commented on issue #4529:
URL: https://github.com/apache/arrow-rs/issues/4529#issuecomment-1637643870

   > Perhaps we might upstream some of the SortPreservingMerge / SortExec logic from DataFusion 🤔
   
   That looks interesting, I didn't know about tournament trees, if no one else has time to work on this I may give it a try later this week.
   
   I was thinking it would also be good to have a `limit` parameter to do partial sort.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org