Posted to issues@arrow.apache.org by "Joris Van den Bossche (JIRA)" <ji...@apache.org> on 2019/07/11 13:15:00 UTC

[jira] [Commented] (ARROW-5907) base64 support of bytes-like

    [ https://issues.apache.org/jira/browse/ARROW-5907?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16882942#comment-16882942 ] 

Joris Van den Bossche commented on ARROW-5907:
----------------------------------------------

(updated the issue with the example from SO)

For this, I think we would need to implement the buffer protocol, as was done for Tensors (ARROW-2276 / https://github.com/apache/arrow/pull/1741). However, unlike Tensors, Arrays do not necessarily fit the buffer protocol exactly (there is an additional buffer for the nulls, they can be nested, ...). So I am not sure that should be done. Or it could be done only for primitive arrays without nulls, just as zero-copy conversion to numpy is limited to those cases.

That last part actually provides a workaround: you can convert the pyarrow Array to a numpy array without copying data using {{a.to_numpy()}} (the result is a view; if a copy would be needed, it raises an error), and then pass that to the function that needs a bytes-like object.

> base64 support of bytes-like
> ----------------------------
>
>                 Key: ARROW-5907
>                 URL: https://issues.apache.org/jira/browse/ARROW-5907
>             Project: Apache Arrow
>          Issue Type: New Feature
>          Components: Python
>    Affects Versions: 0.14.0
>            Reporter: Litchy
>            Priority: Major
>             Fix For: 0.14.0
>
>
> Currently a pyarrow Array cannot be encoded with base64:
> {code}
> import base64
> import numpy
> import pyarrow
>
> t = numpy.arange(25, dtype=numpy.float64)
> a = pyarrow.array(t)
> s1 = base64.b64encode(t)  # this works: ndarray is bytes-like
> s2 = base64.b64encode(a)  # raises TypeError
> {code}
> gives "a bytes-like object is required, not 'pyarrow.lib.DoubleArray'"
> because a pyarrow Array is not bytes-like.
> A possible scenario: we want to push data (such as an ndarray) to Redis from Python and read it from another language such as Java. Arrow Arrays could be used to exchange the data between Python and Java.
> Adding this feature would support queue-style operations (pushing to and popping from a queue) with stores like Redis.



--
This message was sent by Atlassian JIRA
(v7.6.14#76016)