Posted to issues@arrow.apache.org by "Wes McKinney (JIRA)" <ji...@apache.org> on 2019/06/19 17:38:00 UTC

[jira] [Resolved] (ARROW-4675) [Python] Error serializing bool ndarray in py2 and deserializing in py3

     [ https://issues.apache.org/jira/browse/ARROW-4675?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Wes McKinney resolved ARROW-4675.
---------------------------------
    Resolution: Fixed

Issue resolved by pull request 4611
[https://github.com/apache/arrow/pull/4611]

> [Python] Error serializing bool ndarray in py2 and deserializing in py3
> -----------------------------------------------------------------------
>
>                 Key: ARROW-4675
>                 URL: https://issues.apache.org/jira/browse/ARROW-4675
>             Project: Apache Arrow
>          Issue Type: Bug
>          Components: Python
>    Affects Versions: 0.12.0
>         Environment: * pyarrow 0.12.0
> * numpy 1.16.1
> * Python 3.7.0, 2.7.15
> * (macOS 10.13.6)
>            Reporter: Gabe Joseph
>            Assignee: Wes McKinney
>            Priority: Minor
>              Labels: pull-request-available
>             Fix For: 0.14.0
>
>          Time Spent: 20m
>  Remaining Estimate: 0h
>
> {{np.bool}} is the only dtype I've found that triggers this issue. Both empty and non-empty arrays trigger it.
> The issue only occurs when serializing in py2 and deserializing in py3; staying within the same Python version succeeds, as does serializing in py3 and deserializing in py2.
> This appears to be caused by a Python 2 {{str}} being deserialized in Python 3 as {{bytes}}; the value would need to be {{unicode}} on the py2 side to come back as {{str}} in py3. I suspect something in the serialization implementation writes the dtype (only for bool arrays?) as a {{str}}, but I haven't dug into it yet.
> {code:bash}
> (two)bash-3.2$ python cereal.py
> (two)bash-3.2$ cat cereal.py 
> # Python 2
> import numpy as np
> import pyarrow as pa
> data = np.array([], dtype=np.dtype('bool'))
> buf = pa.serialize(data).to_buffer()
> outstream = pa.output_stream("buffer")
> outstream.write(buf)
> outstream.close()
> # ...switch to python 3 venv...
> (three)bash-3.2$ cat decereal.py 
> # Python 3
> import numpy as np
> import pyarrow as pa
> instream = pa.input_stream("buffer")
> buf = instream.read()
> data = pa.deserialize(buf)
> print(data)
> (three)bash-3.2$ python3 decereal.py 
> Traceback (most recent call last):
>   File "decereal.py", line 10, in <module>
>     data = pa.deserialize(buf)
>   File "pyarrow/serialization.pxi", line 448, in pyarrow.lib.deserialize
>   File "pyarrow/serialization.pxi", line 411, in pyarrow.lib.deserialize_from
>   File "pyarrow/serialization.pxi", line 262, in pyarrow.lib.SerializedPyObject.deserialize
>   File "pyarrow/serialization.pxi", line 175, in pyarrow.lib.SerializationContext._deserialize_callback
> TypeError: can only concatenate str (not "bytes") to str
> {code}
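
The {{str}}/{{bytes}} mismatch the reporter describes can be illustrated without pyarrow. The sketch below runs on Python 3 and assumes that a value written as a plain Python 2 {{str}} arrives as {{bytes}}, as the report suggests. The helper name ensure_text and the ASCII encoding are hypothetical choices made for illustration; this is not the change made in pull request 4611.

```python
def ensure_text(value, encoding="ascii"):
    """Return value as text, decoding it first if it arrived as bytes.

    Hypothetical helper for illustration only; not part of pyarrow.
    """
    if isinstance(value, bytes):
        return value.decode(encoding)
    return value


# Assumption: this is how a Python 2 str holding a dtype name would look
# after being read back on Python 3.
dtype_name = b"bool"

# Concatenating str and bytes fails on Python 3, which is the kind of
# TypeError shown in the traceback above.
failure = None
try:
    label = "dtype: " + dtype_name
except TypeError as exc:
    failure = exc

# Decoding first makes the concatenation well defined.
label = "dtype: " + ensure_text(dtype_name)
print(label)
```

Normalizing on the reading side like this only hides the symptom; writing {{unicode}} on the Python 2 side, as the report suggests, would avoid the mismatch at its source.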



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)