Posted to jira@arrow.apache.org by "Clark Zinzow (Jira)" <ji...@apache.org> on 2021/09/02 01:09:00 UTC

[jira] [Comment Edited] (ARROW-5890) [C++][Python] Support ExtensionType arrays in more kernels

    [ https://issues.apache.org/jira/browse/ARROW-5890?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17408456#comment-17408456 ] 

Clark Zinzow edited comment on ARROW-5890 at 9/2/21, 1:08 AM:
--------------------------------------------------------------

[~apitrou] I'm working on a tensor column extension type similar to [this one|https://github.com/CODAIT/text-extensions-for-pandas/blob/dc03278689fe1c5f131573658ae19815ba25f33e/text_extensions_for_pandas/array/arrow_conversion.py] and would like users to be able to interpret Parquet columns containing bytes blobs (e.g. images) as tensors. The idea is that they provide a schema for those columns in which the column's dtype is a tensor array extension type instantiated with the requisite parameters (shape, dtype, etc.), and that schema is then used to cast the column to a tensor array. Since there is no static conversion between the bytes blobs and the underlying extension array dtype (both the shape and the underlying element dtype are parameterizable), it'd be nice if an extension type could register a cast function, so that we could use the shape and dtype context to interpret those bytes blobs properly.
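
To illustrate why a static conversion is not enough, here is a minimal, standard-library-only sketch of the same bytes blob being interpreted under different shape and dtype parameters. The names {{TensorSpec}} and {{cast_blob}} are hypothetical and are not part of Arrow; they only stand in for a parameterized extension type and the cast function it might register.

```python
import struct
from dataclasses import dataclass
from math import prod


@dataclass(frozen=True)
class TensorSpec:
    """Hypothetical stand-in for a parameterized tensor extension type."""
    shape: tuple  # tensor shape, e.g. (2, 3)
    fmt: str      # struct format character for the element dtype, e.g. "f" or "B"


def cast_blob(blob: bytes, spec: TensorSpec):
    """Interpret one bytes blob as a nested list using the spec's shape and dtype."""
    expected = prod(spec.shape) * struct.calcsize(spec.fmt)
    if len(blob) != expected:
        raise ValueError(f"blob has {len(blob)} bytes, spec requires {expected}")
    # memoryview.cast reinterprets the buffer without copying; tolist() then
    # materializes it as nested Python lists.
    return memoryview(blob).cast(spec.fmt, shape=list(spec.shape)).tolist()


# One blob of six native-endian float32 values (24 bytes).
blob = struct.pack("6f", 0.0, 1.0, 2.0, 3.0, 4.0, 5.0)

as_2x3 = cast_blob(blob, TensorSpec(shape=(2, 3), fmt="f"))
as_3x2 = cast_blob(blob, TensorSpec(shape=(3, 2), fmt="f"))
as_bytes = cast_blob(blob, TensorSpec(shape=(24,), fmt="B"))
```

The blob itself carries no shape or dtype information, so the result depends entirely on the parameters held by the type instance. That is the context a per-type cast hook would need access to.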


was (Author: clarkzinzow):
[~apitrou] I'm working on a tensor column extension type similar to [this one|https://github.com/CODAIT/text-extensions-for-pandas/blob/dc03278689fe1c5f131573658ae19815ba25f33e/text_extensions_for_pandas/array/arrow_conversion.py] and was hoping to allow users to interpret Parquet columns containing bytes blobs (e.g. images) as tensors by having them provide a schema for those columns containing a tensor array extension type instantiated with the requisite data (shape, dtype, etc.) to cast that column as a tensor array. Since there isn't a static conversion between the bytes blobs and the underlying extension array dtype (both the shape and the underlying element dtype are parameterizable), it'd be nice if an extension type could register a cast function so we could use the shape and dtype context to properly interpret those bytes blobs.

> [C++][Python] Support ExtensionType arrays in more kernels
> ----------------------------------------------------------
>
>                 Key: ARROW-5890
>                 URL: https://issues.apache.org/jira/browse/ARROW-5890
>             Project: Apache Arrow
>          Issue Type: Improvement
>          Components: C++
>            Reporter: Joris Van den Bossche
>            Priority: Major
>
> From a quick test (through Python), it seems that {{slice}} and {{take}} work, but the following do not:
> - {{cast}}: it could rely on the casting rules for the storage type. Or do we want to require explicitly taking the storage array before casting?
> - {{dictionary_encode}} / {{unique}}


