Posted to issues@arrow.apache.org by "Uwe L. Korn (JIRA)" <ji...@apache.org> on 2017/04/15 14:08:41 UTC

[jira] [Commented] (ARROW-823) [Python] Devise a means to serialize arrays of arbitrary Python objects in Arrow IPC messages

    [ https://issues.apache.org/jira/browse/ARROW-823?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15969968#comment-15969968 ] 

Uwe L. Korn commented on ARROW-823:
-----------------------------------

For me, this issue also depends on how we represent custom/shared Pandas/Python metadata in Arrow and Parquet. While settling that will not fully solve this issue, I think that discussion is one of its prerequisites.

> [Python] Devise a means to serialize arrays of arbitrary Python objects in Arrow IPC messages
> ---------------------------------------------------------------------------------------------
>
>                 Key: ARROW-823
>                 URL: https://issues.apache.org/jira/browse/ARROW-823
>             Project: Apache Arrow
>          Issue Type: New Feature
>          Components: Python
>            Reporter: Wes McKinney
>
> Practically speaking, this would involve a "custom" logical type that is "pyobject", represented physically as an array of 64-bit pointers. On serialization, this would need to be converted to a BinaryArray containing pickled objects as binary values.
> At the moment, we don't yet have the machinery to deal with "custom" types where the in-memory representation is different from the on-wire representation. This would be a useful use case to work through the design issues.
> Interestingly, if done properly, this would enable other Arrow implementations to manipulate (filter, etc.) serialized Python objects as binary blobs.
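
To make the quoted proposal concrete, here is a minimal pure-Python sketch of the idea, not Arrow's API and not a design anyone has agreed on. It pickles each object and lays the results out the way a variable-length binary array is laid out: one contiguous data buffer plus n+1 offsets. The helper names (pack_pyobjects, unpack_pyobjects, take) are hypothetical, and null handling (a validity bitmap) is left out.

```python
import pickle
from typing import Any, List, Sequence, Tuple

# Hypothetical helpers for illustration only; none of these are pyarrow functions.


def pack_pyobjects(objs: Sequence[Any]) -> Tuple[List[int], bytes]:
    """Pickle each object into one data buffer; offsets[i]:offsets[i+1] bounds value i."""
    offsets = [0]
    data = bytearray()
    for obj in objs:
        data += pickle.dumps(obj, protocol=pickle.HIGHEST_PROTOCOL)
        offsets.append(len(data))
    return offsets, bytes(data)


def unpack_pyobjects(offsets: Sequence[int], data: bytes) -> List[Any]:
    """Reverse of pack_pyobjects: slice out each blob and unpickle it."""
    return [pickle.loads(data[start:end]) for start, end in zip(offsets, offsets[1:])]


def take(offsets: Sequence[int], data: bytes, indices: Sequence[int]) -> Tuple[List[int], bytes]:
    """Select values by position while treating them as opaque blobs (no unpickling)."""
    new_offsets = [0]
    out = bytearray()
    for i in indices:
        out += data[offsets[i]:offsets[i + 1]]
        new_offsets.append(len(out))
    return new_offsets, bytes(out)


if __name__ == "__main__":
    objs = [{"a": 1}, (1, 2), "text"]
    offsets, data = pack_pyobjects(objs)
    kept_offsets, kept_data = take(offsets, data, [0, 2])
    print(unpack_pyobjects(kept_offsets, kept_data))
```

The take helper never calls pickle, which is the point of the last quoted paragraph: a non-Python implementation could filter or reorder the values as plain binary blobs and hand them back to Python for unpickling.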



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)