Posted to issues@arrow.apache.org by "lidavidm (Jira)" <ji...@apache.org> on 2019/09/05 13:06:00 UTC
[jira] [Commented] (ARROW-2428) [Python] Add API to map Arrow types (including extension types) to pandas ExtensionArray instances for to_pandas conversions
[ https://issues.apache.org/jira/browse/ARROW-2428?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16923418#comment-16923418 ]
lidavidm commented on ARROW-2428:
---------------------------------
Hi Joris, overall I agree with the approach here. It's a little unfortunate that Pandas doesn't have a general column/table metadata mechanism...
I agree that we want both a default hook for ExtensionType->Pandas conversions, and a way to override conversions on an individual basis. I think adding a new argument to {{to_pandas}} is easier than maintaining yet another function registry. Similarly, adding a conversion method on {{ExtensionType}} (or maybe that should be a future {{ExtensionArray}} class?) would be preferable to maintaining a registry.
If we have something like {{pa.ExtensionType.__pandas_array__}}, should we also have {{pa.ExtensionType.__pandas_dtype__}}?
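To make the two pieces concrete, here is a rough, self-contained sketch of how a default conversion method on the type and a per-call override argument could fit together. It does not use pyarrow or pandas at all: {{ToyType}}, {{ToyPeriodType}}, {{ToyPeriodArray}}, {{ToyColumn}}, the {{overrides}} argument, and this standalone {{to_pandas}} function are all hypothetical names invented for illustration, and the real API may end up looking quite different.

```python
# Hypothetical sketch only: none of these classes are pyarrow or pandas APIs.
from typing import Any, Callable, Dict, List, Optional


class ToyType:
    """Stand-in for an Arrow data type."""

    def __init__(self, name: str) -> None:
        self.name = name

    def __pandas_array__(self, values: List[Any]) -> Any:
        # Default hook: a plain list stands in for a default (object) column.
        return list(values)


class ToyPeriodArray:
    """Stand-in for a pandas ExtensionArray."""

    def __init__(self, values: List[Any]) -> None:
        self.values = list(values)


class ToyPeriodType(ToyType):
    """Stand-in for an Arrow extension type that knows its pandas counterpart."""

    def __init__(self) -> None:
        super().__init__("period")

    def __pandas_array__(self, values: List[Any]) -> Any:
        # The type itself supplies the default conversion, so no registry is needed.
        return ToyPeriodArray(values)


class ToyColumn:
    """Stand-in for an Arrow column: a type plus its values."""

    def __init__(self, type_: ToyType, values: List[Any]) -> None:
        self.type = type_
        self.values = values


def to_pandas(
    columns: Dict[str, ToyColumn],
    overrides: Optional[Dict[str, Callable[[List[Any]], Any]]] = None,
) -> Dict[str, Any]:
    """Convert each column, preferring a per-call override over the type's hook."""
    overrides = overrides or {}
    result = {}
    for name, column in columns.items():
        convert = overrides.get(name, column.type.__pandas_array__)
        result[name] = convert(column.values)
    return result


table = {
    "a": ToyColumn(ToyType("int64"), [1, 2]),
    "p": ToyColumn(ToyPeriodType(), [10, 20]),
}
default_result = to_pandas(table)                            # uses the hook on each type
overridden_result = to_pandas(table, overrides={"p": tuple})  # per-call override for "p"
```

In this sketch the override is keyed by column name; keying it by type instead would be an equally plausible choice, and the thread does not settle which one is wanted.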
> [Python] Add API to map Arrow types (including extension types) to pandas ExtensionArray instances for to_pandas conversions
> ----------------------------------------------------------------------------------------------------------------------------
>
> Key: ARROW-2428
> URL: https://issues.apache.org/jira/browse/ARROW-2428
> Project: Apache Arrow
> Issue Type: Improvement
> Components: Python
> Reporter: Uwe L. Korn
> Priority: Major
> Fix For: 1.0.0
>
>
> With the next release of Pandas, it will be possible to define custom column types that back a {{pandas.Series}}. Thus we will not be able to cover all possible column types in the {{to_pandas}} conversion by default as we won't be aware of all extension arrays.
> To enable users to create {{ExtensionArray}} instances from Arrow columns in the {{to_pandas}} conversion, we should provide a hook in the {{to_pandas}} call where they can overload the default conversion routines with the ones that produce their {{ExtensionArray}} instances.
> This should avoid the additional copies incurred today, where we first convert the Arrow column into a default Pandas column (probably of object type) and the user then converts it to a more efficient {{ExtensionArray}}. This hook will be especially useful when you build {{ExtensionArrays}} whose storage is backed by Arrow.
> The meta-issue that tracks the implementation inside of Pandas is: https://github.com/pandas-dev/pandas/issues/19696