Posted to jira@arrow.apache.org by "Yibo Cai (Jira)" <ji...@apache.org> on 2020/09/01 05:03:00 UTC

[jira] [Comment Edited] (ARROW-9873) [C++][Compute] Improve mode kernel for integers within limited value range

    [ https://issues.apache.org/jira/browse/ARROW-9873?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17186213#comment-17186213 ] 

Yibo Cai edited comment on ARROW-9873 at 9/1/20, 5:02 AM:
----------------------------------------------------------

Maybe we can use the counting method as a first step, then scan the counter array and insert the results into a map at the end. I'd guess this won't cost much performance, as the map is small and we can reserve buckets up front. Will do some tests.

Test results with the existing benchmark (values within -100~100, array size of 1M bytes):
- Small performance drop (< 10%) for Boolean and Int8.
- About 2x performance improvement for Int16/32/64 with limited value range.

Varying the value range and array size gives a consistent performance uplift.
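
For illustration, the two-step idea above (count into a value-indexed array, then scan it and insert into a pre-reserved map) could look roughly like the sketch below. This is not the Arrow kernel code: the function name CountThenMap and its parameters are hypothetical, and it assumes the caller already knows that every value lies in [min_value, max_value] and that this range is small.

```cpp
#include <cstddef>
#include <cstdint>
#include <unordered_map>
#include <vector>

// Hypothetical helper, not an Arrow API.
// Assumes every element of `values` lies within [min_value, max_value].
std::unordered_map<int32_t, int64_t> CountThenMap(const std::vector<int32_t>& values,
                                                  int32_t min_value, int32_t max_value) {
  // Step 1: one counter per possible value; index = value - min_value.
  const size_t range =
      static_cast<size_t>(static_cast<int64_t>(max_value) - min_value + 1);
  std::vector<int64_t> counts(range, 0);
  for (int32_t v : values) {
    ++counts[static_cast<size_t>(static_cast<int64_t>(v) - min_value)];
  }

  // Step 2: scan the counter array and copy the non-zero counts into a map.
  // The map is small, so buckets can be reserved up front.
  std::unordered_map<int32_t, int64_t> value_counts;
  value_counts.reserve(range);
  for (size_t i = 0; i < range; ++i) {
    if (counts[i] > 0) {
      value_counts.emplace(
          static_cast<int32_t>(static_cast<int64_t>(min_value) + static_cast<int64_t>(i)),
          counts[i]);
    }
  }
  return value_counts;
}
```

The hot loop touches only the flat counter array; the map is filled once at the end, with at most one insertion per distinct value.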


was (Author: yibo):
Maybe we can use the counting method as a first step, then scan the counter array and insert the results into a map at the end. I'd guess this won't cost much performance, as the map is small and we can reserve buckets up front. Will do some tests.

> [C++][Compute] Improve mode kernel for integers within limited value range
> --------------------------------------------------------------------------
>
>                 Key: ARROW-9873
>                 URL: https://issues.apache.org/jira/browse/ARROW-9873
>             Project: Apache Arrow
>          Issue Type: Improvement
>          Components: C++
>            Reporter: Yibo Cai
>            Assignee: Yibo Cai
>            Priority: Major
>         Attachments: mode-range-skylake.png
>
>
> It's possible to improve mode kernel performance for integers within a limited value range by using a value-indexed array instead of a general hash table.
>  A similar trick is used in the sorting kernel, ARROW-1571.
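
As a rough sketch of the trade-off described in the issue, the code below picks a value-indexed array when the observed value range is small and falls back to a hash table otherwise. The name ModeSketch and the threshold kMaxRange are assumptions made for this example, not Arrow identifiers; a real cut-off would have to come from benchmarking. Ties are resolved to the smallest value here, which is also an assumption.

```cpp
#include <algorithm>
#include <cstddef>
#include <cstdint>
#include <unordered_map>
#include <utility>
#include <vector>

// Hypothetical threshold for switching to the value-indexed array.
constexpr uint64_t kMaxRange = 1u << 16;

// Returns {mode, count}. Ties go to the smallest value.
// Returns {0, 0} for empty input.
std::pair<int64_t, int64_t> ModeSketch(const std::vector<int64_t>& values) {
  if (values.empty()) return {0, 0};

  auto mm = std::minmax_element(values.begin(), values.end());
  const int64_t min_value = *mm.first;
  const int64_t max_value = *mm.second;
  // Unsigned subtraction avoids signed overflow for very wide ranges.
  const uint64_t range =
      static_cast<uint64_t>(max_value) - static_cast<uint64_t>(min_value);

  if (range < kMaxRange) {
    // Small range: value-indexed counter array, index = value - min_value.
    std::vector<int64_t> counts(static_cast<size_t>(range) + 1, 0);
    for (int64_t v : values) {
      ++counts[static_cast<size_t>(static_cast<uint64_t>(v) -
                                   static_cast<uint64_t>(min_value))];
    }
    // std::max_element returns the first maximum, i.e. the smallest value on ties.
    auto best = std::max_element(counts.begin(), counts.end());
    return {min_value + static_cast<int64_t>(best - counts.begin()), *best};
  }

  // Wide range: general hash table.
  std::unordered_map<int64_t, int64_t> counts;
  for (int64_t v : values) ++counts[v];
  std::pair<int64_t, int64_t> best{0, 0};
  for (const auto& kv : counts) {
    if (kv.second > best.second ||
        (kv.second == best.second && kv.first < best.first)) {
      best = kv;
    }
  }
  return best;
}
```

The extra min/max pass is what makes the array path safe to choose at run time; whether it pays off depends on the data type and array size.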



--
This message was sent by Atlassian Jira
(v8.3.4#803005)