Posted to jira@arrow.apache.org by "Chang She (Jira)" <ji...@apache.org> on 2022/09/25 23:01:00 UTC

[jira] [Commented] (ARROW-17535) [Python] List<Extension> arrays aren't supported in to_pandas calls

    [ https://issues.apache.org/jira/browse/ARROW-17535?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17609261#comment-17609261 ] 

Chang She commented on ARROW-17535:
-----------------------------------

I was thinking of the following possibilities:

1. If the ExtensionType is associated with an ExtensionArray subtype that overrides “to_pandas”, we could do the to_pandas call on the list values array and then use the offsets to create the proper pandas array

2. If the ExtensionType is associated with an ExtensionScalar, then you can call to_pylist on the values array and then use the offsets to construct the pandas array
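
Both options end with the same step: take the already-converted flat values and regroup them into one list per row using the list offsets. A rough sketch of that step in plain Python follows. The helper name regroup_by_offsets and the sample data are hypothetical, not part of pyarrow; in pyarrow the flat values and offsets would presumably come from ListArray.values and ListArray.offsets, with the values converted via to_pandas or to_pylist first.

```python
from typing import Any, List, Optional, Sequence


def regroup_by_offsets(
    converted_values: Sequence[Any],
    offsets: Sequence[int],
    valid: Optional[Sequence[bool]] = None,
) -> List[Optional[List[Any]]]:
    """Rebuild per-row lists from a flat sequence of converted values.

    Hypothetical helper: row i covers converted_values[offsets[i]:offsets[i + 1]].
    If a validity mask is given, rows marked False become None.
    """
    rows: List[Optional[List[Any]]] = []
    for i in range(len(offsets) - 1):
        if valid is not None and not valid[i]:
            rows.append(None)
            continue
        start, end = offsets[i], offsets[i + 1]
        rows.append(list(converted_values[start:end]))
    return rows


# Made-up data: three images carrying 2, 0 and 1 labels respectively.
labels = ["cat", "dog", "bird"]
offsets = [0, 2, 2, 3]
rows = regroup_by_offsets(labels, offsets)
```

The resulting rows could then be placed into an object-dtype pandas column, which is how list columns are typically represented after to_pandas.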

For computer vision data this is actually fairly important, as very often we have a list-of-labels or list-of-Box2d per row (image).

> [Python] List<Extension> arrays aren't supported in to_pandas calls
> -------------------------------------------------------------------
>
>                 Key: ARROW-17535
>                 URL: https://issues.apache.org/jira/browse/ARROW-17535
>             Project: Apache Arrow
>          Issue Type: Bug
>          Components: C++, Python
>            Reporter: Micah Kornfield
>            Priority: Minor
>
> EXTENSION is not in the list of types allowed.  I think in order to enable EXTENSION we need to be able to call to_pylist or similar on the original extension array from C++ code, in case there were user-provided overrides.  Off the top of my head, one way of doing this would be to pass through an additional std::unordered_map<Array*, PyObject*> where PyObject is the bound to_pylist Python function.  Are there other alternatives that might be cleaner?



--
This message was sent by Atlassian Jira
(v8.20.10#820010)