You are viewing a plain text version of this content. The canonical link for it is here.
Posted to jira@arrow.apache.org by "Kirill Lykov (Jira)" <ji...@apache.org> on 2021/05/09 14:57:00 UTC

[jira] [Commented] (ARROW-10899) [C++] Investigate radix sort for integer arrays

    [ https://issues.apache.org/jira/browse/ARROW-10899?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17341526#comment-17341526 ] 

Kirill Lykov commented on ARROW-10899:
--------------------------------------

Do we have some information about distribution of integers? And also are they signed/unsigned?



I've did some more benchmarking and plots [int-sort-bmks repo|[https://github.com/KirillLykov/int-sort-bmk].]
In short, radix sort is almost always performs better than std::stable_sort.
But there are some cases when stable_sort is better – sorted sequence, all elements are equal.

Another observation is that it  looks like LSD implementation is faster than MSD implementation. 
Yet it is possible to have almost the same performance using MSD/LSD hybrid algorithm if we think that MSD is better for memory bandwidth.

The question is how to measure memory bandwidth consumption, I employed LLC-cache-misses events. 

> [C++] Investigate radix sort for integer arrays
> -----------------------------------------------
>
>                 Key: ARROW-10899
>                 URL: https://issues.apache.org/jira/browse/ARROW-10899
>             Project: Apache Arrow
>          Issue Type: Wish
>          Components: C++
>            Reporter: Antoine Pitrou
>            Assignee: Kirill Lykov
>            Priority: Major
>         Attachments: Screen Shot 2021-02-09 at 17.48.13.png, Screen Shot 2021-02-10 at 10.58.23.png, all_random_wholeRange.png, all_random_wholeRange.png, all_random_wholeRange.png
>
>
> For integer arrays with a non-tiny range of values, we currently use a stable sort. It may be faster to use a radix sort instead.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)