You are viewing a plain text version of this content. The canonical link for it is here.
Posted to jira@arrow.apache.org by "Kirill Lykov (Jira)" <ji...@apache.org> on 2021/02/09 16:24:00 UTC
[jira] [Comment Edited] (ARROW-10899) [C++] Investigate radix sort for integer arrays

    [ https://issues.apache.org/jira/browse/ARROW-10899?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17281864#comment-17281864 ] 

Kirill Lykov edited comment on ARROW-10899 at 2/9/21, 4:23 PM:
---------------------------------------------------------------

1. From the paper "Fast Sort on CPUs and GPUs: A Case for Bandwidth Oblivious SIMD Sort" by Satish et al:
 They compare, beside other things, performance of radix sort and merge sort using intrinsics.
 Both implementations are highly optimized for simd architecture and radix has many not obvious improvements to make it more cache-friendly.
 Radix sort performance depends on the key size because the longer key we have the more passes we need to do. Radix is bandwidth-bound starting at 48-bits keys.
 Further, Satish et al report that merge sort is 1.7X slower than radix sort on 32- bit keys, but becomes comparable in performance to radix on 64-bit keys and 1.3X better on 128-bit keys.
 Their conclusion is that "bandwidth-oblivious SIMD friendly merge sort, should, therefore, be the sorting method of choice for future databases".

2. I haven't found a good opensource radix sort for CPUs. There is one in boost called spread sort but looks promising. 


was (Author: klykov):
1. From the paper "Fast Sort on CPUs and GPUs: A Case for Bandwidth Oblivious SIMD Sort" by Satish et al:
They compare, beside other things, performance of radix sort and merge sort using intrinsics.
Both implementations are highly optimized for simd architecture and radix has many not obvious improvements to make it more cache-friendly.
Radix sort performance depends on the key size because the longer key we have the more passes we need to do. Radix is bandwidth-bound starting at 48-bits keys.
Further, Satish et al report that merge sort is 1.7X slower than radix sort on 32- bit keys, but becomes comparable in performance to radix on 64-bit keys and 1.3X better on 128-bit keys.
Their conclusion is that "bandwidth-oblivious SIMD friendly merge sort, should, therefore, be the sorting method of choice for future databases".

2. I haven't found a good opensource radix sort for CPUs. There is one in boost called spread sort but it is not actually a radix sort and according to my benchmarks for integers it demonstrates exactly the same performance as gcc's stable_sort. From description, It might be designed for strings.

> [C++] Investigate radix sort for integer arrays
> -----------------------------------------------
>
>                 Key: ARROW-10899
>                 URL: https://issues.apache.org/jira/browse/ARROW-10899
>             Project: Apache Arrow
>          Issue Type: Wish
>          Components: C++
>            Reporter: Antoine Pitrou
>            Assignee: Kirill Lykov
>            Priority: Major
>
> For integer arrays with a non-tiny range of values, we currently use a stable sort. It may be faster to use a radix sort instead.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)