Posted to issues@arrow.apache.org by "Wes McKinney (Jira)" <ji...@apache.org> on 2020/05/23 04:13:00 UTC

[jira] [Commented] (ARROW-8901) [C++] Reduce number of take kernels

    [ https://issues.apache.org/jira/browse/ARROW-8901?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17114518#comment-17114518 ] 

Wes McKinney commented on ARROW-8901:
-------------------------------------

We probably need at least int8 through int64 index types (so we can use take to unpack dictionaries). A different code path will probably be used for running "take" in a selection vector context (per ARROW-8903).
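
To illustrate why every signed index width matters for dictionary unpacking, here is a minimal pure-Python sketch. It is a model of the idea, not Arrow's implementation: the `take` helper, the sample dictionary, and the use of `array` typecodes as stand-ins for int8/int16/int32/int64 index arrays are all assumptions made for illustration.

```python
from array import array


def take(values, indices):
    """Gather values[i] for every i in indices (hypothetical helper, no null handling)."""
    return [values[i] for i in indices]


dictionary = ["apple", "banana", "cherry"]  # the dictionary's values
positions = [2, 0, 1, 0]                    # dictionary-encoded data

# Typecodes b/h/i/q are signed C integer types (char, short, int, long long),
# used here as stand-ins for int8/int16/int32/int64 index arrays.
decoded_by_width = {
    typecode: take(dictionary, array(typecode, positions))
    for typecode in ("b", "h", "i", "q")
}
```

Decoding gives the same result for every index width, which is the property a dictionary-unpacking "take" has to preserve whichever index type the dictionary array happens to use.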

> [C++] Reduce number of take kernels
> -----------------------------------
>
>                 Key: ARROW-8901
>                 URL: https://issues.apache.org/jira/browse/ARROW-8901
>             Project: Apache Arrow
>          Issue Type: Improvement
>          Components: C++
>            Reporter: Wes McKinney
>            Priority: Major
>
> After ARROW-8792 we can observe that we are generating 312 take kernels:
> {code}
> In [1]: import pyarrow.compute as pc
> In [2]: reg = pc.function_registry()
> In [3]: reg.get_function('take')
> Out[3]: 
> arrow.compute.Function
> kind: vector
> num_kernels: 312
> {code}
> You can see them all here: https://gist.github.com/wesm/c3085bf40fa2ee5e555204f8c65b4ad5
> It's probably going to be sufficient to only support int16, int32, and int64 index types for almost all types and insert implicit casts (once we implement implicit-cast-insertion into the execution code) for other index types. If we determine that there is some performance hot path where we need to specialize for other index types, then we can always do that.
> Additionally, we should be able to collapse the date/time kernels since we're just moving memory.
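
The two reductions proposed in the description (casting unsupported index types to a supported one, and sharing kernels between types of the same byte width) can be sketched as a kernel lookup. This is a hedged toy model, not Arrow's dispatch code: `resolve_kernel`, the `INDEX_CAST` table, and the chosen type lists are hypothetical, and the resulting counts are unrelated to the 312 kernels reported above.

```python
# Byte width of each fixed-width value type; "take" only moves this many bytes.
BYTE_WIDTH = {
    "int32": 4, "date32": 4, "time32": 4,
    "int64": 8, "date64": 8, "time64": 8, "timestamp": 8, "duration": 8,
}

SUPPORTED_INDEX_TYPES = ("int16", "int32", "int64")

# Assumed implicit casts: each unsupported index type widens losslessly
# to the narrowest supported signed type.
INDEX_CAST = {"int8": "int16", "uint8": "int16", "uint16": "int32", "uint32": "int64"}


def resolve_kernel(value_type, index_type):
    """Return the (byte width, index type) key of the kernel that would run."""
    index_type = INDEX_CAST.get(index_type, index_type)
    if index_type not in SUPPORTED_INDEX_TYPES:
        raise TypeError(f"no take kernel for index type {index_type}")
    return (BYTE_WIDTH[value_type], index_type)


all_index_types = SUPPORTED_INDEX_TYPES + tuple(INDEX_CAST)
naive_count = len(BYTE_WIDTH) * len(all_index_types)  # one kernel per type pair
kernels = {resolve_kernel(v, i) for v in BYTE_WIDTH for i in all_index_types}
```

In this model, date32 and time32 with int32 indices resolve to the same kernel, and a uint8 index array is served by the int16 kernel after the cast. If a hot path later needed, say, a dedicated int8 kernel, it could be added to the supported list without changing the lookup.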


