Posted to jira@arrow.apache.org by "Kirill Lykov (Jira)" <ji...@apache.org> on 2021/04/05 13:17:00 UTC

[jira] [Commented] (ARROW-10899) [C++] Investigate radix sort for integer arrays

    [ https://issues.apache.org/jira/browse/ARROW-10899?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17314850#comment-17314850 ] 

Kirill Lykov commented on ARROW-10899:
--------------------------------------

I added a repository with some of the experiments I did for the plots posted earlier: [https://github.com/KirillLykov/int-sort-bmk]

Unfortunately, I couldn't make fast progress on this ticket, and since it is not my main activity I decided to freeze it on my side.
By fast progress I mean delivering a stable, non-comparison-based sorting algorithm that is faster than std::stable_sort.
The naive radix sort implemented there is much slower on int64_t, as can be seen in the plots in scripts/imgs.
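
For readers unfamiliar with the term, here is a minimal sketch of what a naive, stable LSD radix sort over int64_t can look like. It is not the code from the repository; the function names (to_radix_key, naive_radix_sort) and the choice of 8-bit digits are assumptions made only for illustration.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical helper: flip the sign bit so that unsigned comparison of the
// result matches signed comparison of the input.
inline std::uint64_t to_radix_key(std::int64_t v) {
  return static_cast<std::uint64_t>(v) ^ (std::uint64_t{1} << 63);
}

// Naive stable LSD radix sort: 8 passes over the data, one byte per pass.
inline void naive_radix_sort(std::vector<std::int64_t>& data) {
  std::vector<std::int64_t> buffer(data.size());
  for (int shift = 0; shift < 64; shift += 8) {
    // Histogram of the current byte, stored one slot to the right so that
    // the prefix sum below yields the start offset of each bucket.
    std::array<std::size_t, 257> offsets{};
    for (std::int64_t v : data) {
      ++offsets[((to_radix_key(v) >> shift) & 0xFF) + 1];
    }
    for (std::size_t i = 1; i < offsets.size(); ++i) {
      offsets[i] += offsets[i - 1];
    }
    assert(offsets[256] == data.size());

    // Scatter in input order: elements with equal bytes keep their relative
    // order, which is what makes each pass (and the whole sort) stable.
    for (std::int64_t v : data) {
      buffer[offsets[(to_radix_key(v) >> shift) & 0xFF]++] = v;
    }
    data.swap(buffer);
  }
}
```

Every pass reads and writes the whole array regardless of the value range, so int64_t needs eight full passes with this digit width. That may be part of why such a naive version struggles against std::stable_sort, though this sketch says nothing about the measured numbers.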


The last thing I tried was modifying boost's integer_sort to make it stable (the unstable version is really fast).
To simplify experiments with integer_sort, I extracted it into a single file: [https://github.com/KirillLykov/int-sort-bmk/blob/master/src/boost_spread_sort.h]
integer_sort sometimes falls back to pdqsort, which can be replaced with std::stable_sort.
A more interesting part of the code, which I think is what makes integer_sort unstable, is [https://github.com/KirillLykov/int-sort-bmk/blob/master/src/boost_spread_sort.h#L210]
I think the in-place version can be replaced with a non-in-place one.
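
A rough sketch of that idea, under stated assumptions: a single distribution pass that scatters into a second buffer (instead of swapping elements in place), followed by std::stable_sort inside each bucket. This is not the boost code and not a patch for it; the Item type, the stable_bucket_sort name and the fixed 8-bit bucket width are hypothetical and exist only to make stability observable.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical record type: the payload lets us observe whether elements
// with equal keys keep their input order.
struct Item {
  std::uint32_t key;
  int payload;
};

// One stable, out-of-place distribution pass on the top 8 bits of the key,
// then a stable comparison sort within each bucket.
inline void stable_bucket_sort(std::vector<Item>& items) {
  constexpr int kBucketBits = 8;
  constexpr std::size_t kBuckets = std::size_t{1} << kBucketBits;
  constexpr int kShift = 32 - kBucketBits;

  // Histogram shifted by one slot, so the prefix sum gives bucket starts.
  std::vector<std::size_t> start(kBuckets + 1, 0);
  for (const Item& it : items) ++start[(it.key >> kShift) + 1];
  for (std::size_t b = 1; b <= kBuckets; ++b) start[b] += start[b - 1];
  assert(start[kBuckets] == items.size());

  // Scatter in input order into a second buffer. Unlike an in-place
  // swap-based distribution, this never reorders elements of one bucket.
  std::vector<Item> out(items.size());
  std::vector<std::size_t> next(start.begin(), start.end() - 1);
  for (const Item& it : items) out[next[it.key >> kShift]++] = it;

  // Finish each bucket with std::stable_sort (where an unstable
  // implementation would fall back to something like pdqsort).
  for (std::size_t b = 0; b < kBuckets; ++b) {
    auto first = out.begin() + static_cast<std::ptrdiff_t>(start[b]);
    auto last = out.begin() + static_cast<std::ptrdiff_t>(start[b + 1]);
    std::stable_sort(first, last, [](const Item& lhs, const Item& rhs) {
      return lhs.key < rhs.key;
    });
  }
  items.swap(out);
}
```

The cost of this approach is an extra buffer of the same size as the input, which is the usual trade-off when giving up an in-place distribution step in exchange for stability.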

> [C++] Investigate radix sort for integer arrays
> -----------------------------------------------
>
>                 Key: ARROW-10899
>                 URL: https://issues.apache.org/jira/browse/ARROW-10899
>             Project: Apache Arrow
>          Issue Type: Wish
>          Components: C++
>            Reporter: Antoine Pitrou
>            Assignee: Kirill Lykov
>            Priority: Major
>         Attachments: Screen Shot 2021-02-09 at 17.48.13.png, Screen Shot 2021-02-10 at 10.58.23.png
>
>
> For integer arrays with a non-tiny range of values, we currently use a stable sort. It may be faster to use a radix sort instead.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)