You are viewing a plain text version of this content. The canonical link for it is here.
Posted to jira@arrow.apache.org by "Alvin Chunga Mamani (Jira)" <ji...@apache.org> on 2022/03/21 19:37:00 UTC

[jira] [Commented] (ARROW-14183) [C++] Improve select_k_unstable performance

    [ https://issues.apache.org/jira/browse/ARROW-14183?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17510085#comment-17510085 ] 

Alvin Chunga Mamani commented on ARROW-14183:
---------------------------------------------

In [#12582|https://github.com/apache/arrow/pull/12582] was implemented a RadixSort per batches(taking advantage of contiguous elements in memory as same of table-sort) + Merge K first Elements, and the results seems to be equal to the current implementation of use a Heap (just a ~5% more fast in some cases). With this implementation the RadixSort takes most of the time because it sort in batches the whole elements, instead of that I can implement just a Merge K first elements and for small values of K against thousands/millions of elements, but maybe it will be same complexity of the Heap, because it will sort N logK until K < length(current_block)
CC: [~lidavidm] [~apitrou] 

> [C++] Improve select_k_unstable performance
> -------------------------------------------
>
>                 Key: ARROW-14183
>                 URL: https://issues.apache.org/jira/browse/ARROW-14183
>             Project: Apache Arrow
>          Issue Type: Improvement
>          Components: C++
>            Reporter: David Li
>            Assignee: Alvin Chunga Mamani
>            Priority: Major
>              Labels: kernel, pull-request-available
>         Attachments: image-2022-02-16-02-35-42-818.png
>
>          Time Spent: 7h 10m
>  Remaining Estimate: 0h
>
> Recently {{sort_indices}} was optimized and refactored in ARROW-14165. One of those optimizations could also be applied to {{select_k_unstable}}'s implementation.



--
This message was sent by Atlassian Jira
(v8.20.1#820001)