You are viewing a plain text version of this content. The canonical link for it is here.
Posted to jira@arrow.apache.org by "Alvin Chunga Mamani (Jira)" <ji...@apache.org> on 2022/03/21 19:37:00 UTC
[jira] [Commented] (ARROW-14183) [C++] Improve select_k_unstable performance
[ https://issues.apache.org/jira/browse/ARROW-14183?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17510085#comment-17510085 ]
Alvin Chunga Mamani commented on ARROW-14183:
---------------------------------------------
In [#12582|https://github.com/apache/arrow/pull/12582] was implemented a RadixSort per batches(taking advantage of contiguous elements in memory as same of table-sort) + Merge K first Elements, and the results seems to be equal to the current implementation of use a Heap (just a ~5% more fast in some cases). With this implementation the RadixSort takes most of the time because it sort in batches the whole elements, instead of that I can implement just a Merge K first elements and for small values of K against thousands/millions of elements, but maybe it will be same complexity of the Heap, because it will sort N logK until K < length(current_block)
CC: [~lidavidm] [~apitrou]
> [C++] Improve select_k_unstable performance
> -------------------------------------------
>
> Key: ARROW-14183
> URL: https://issues.apache.org/jira/browse/ARROW-14183
> Project: Apache Arrow
> Issue Type: Improvement
> Components: C++
> Reporter: David Li
> Assignee: Alvin Chunga Mamani
> Priority: Major
> Labels: kernel, pull-request-available
> Attachments: image-2022-02-16-02-35-42-818.png
>
> Time Spent: 7h 10m
> Remaining Estimate: 0h
>
> Recently {{sort_indices}} was optimized and refactored in ARROW-14165. One of those optimizations could also be applied to {{select_k_unstable}}'s implementation.
--
This message was sent by Atlassian Jira
(v8.20.1#820001)