Posted to issues@arrow.apache.org by "Wes McKinney (JIRA)" <ji...@apache.org> on 2019/06/19 17:38:00 UTC
[jira] [Resolved] (ARROW-4675) [Python] Error serializing bool ndarray in py2 and deserializing in py3
[ https://issues.apache.org/jira/browse/ARROW-4675?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Wes McKinney resolved ARROW-4675.
---------------------------------
Resolution: Fixed
Issue resolved by pull request 4611
[https://github.com/apache/arrow/pull/4611]
> [Python] Error serializing bool ndarray in py2 and deserializing in py3
> -----------------------------------------------------------------------
>
> Key: ARROW-4675
> URL: https://issues.apache.org/jira/browse/ARROW-4675
> Project: Apache Arrow
> Issue Type: Bug
> Components: Python
> Affects Versions: 0.12.0
> Environment: * pyarrow 0.12.0
> * numpy 1.16.1
> * Python 3.7.0, 2.7.15
> * (macOS 10.13.6)
> Reporter: Gabe Joseph
> Assignee: Wes McKinney
> Priority: Minor
> Labels: pull-request-available
> Fix For: 0.14.0
>
> Time Spent: 20m
> Remaining Estimate: 0h
>
> {{np.bool}} is the only dtype I've found that causes this issue. Both empty and non-empty arrays trigger it.
> The issue only manifests from py2 to py3; staying within the same version succeeds, as does serializing from py3 and deserializing in py2.
> This appears to be caused by a Python 2 {{str}} being deserialized in Python 3 as {{bytes}}; it would need to be {{unicode}} on the py2 end to come back as {{str}} in py3. I suspect something in the serialization implementation writes the dtype (only for bool arrays?) as a {{str}}, but I haven't dug into it yet.
> {code:bash}
> (two)bash-3.2$ python cereal.py
> (two)bash-3.2$ cat cereal.py
> # Python 2
> import numpy as np
> import pyarrow as pa
> data = np.array([], dtype=np.dtype('bool'))
> buf = pa.serialize(data).to_buffer()
> outstream = pa.output_stream("buffer")
> outstream.write(buf)
> outstream.close()
> # ...switch to python 3 venv...
> (three)bash-3.2$ cat decereal.py
> # Python 3
> import numpy as np
> import pyarrow as pa
> instream = pa.input_stream("buffer")
> buf = instream.read()
> data = pa.deserialize(buf)
> print(data)
> (three)bash-3.2$ python3 decereal.py
> Traceback (most recent call last):
>   File "decereal.py", line 10, in <module>
>     data = pa.deserialize(buf)
>   File "pyarrow/serialization.pxi", line 448, in pyarrow.lib.deserialize
>   File "pyarrow/serialization.pxi", line 411, in pyarrow.lib.deserialize_from
>   File "pyarrow/serialization.pxi", line 262, in pyarrow.lib.SerializedPyObject.deserialize
>   File "pyarrow/serialization.pxi", line 175, in pyarrow.lib.SerializationContext._deserialize_callback
> TypeError: can only concatenate str (not "bytes") to str
> {code}
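The reporter's diagnosis is that a Python 2 {{str}} arrives in Python 3 as {{bytes}}. That kind of type mismatch can be shown without pyarrow. The sketch below does not reproduce pyarrow's internals. It makes two assumptions. First, the value involved is NumPy's dtype string for bool ({{"|b1"}}) written from Python 2 as a plain byte string. Second, {{as_native_str}} is a hypothetical helper that shows one way a reader could normalize such values before using them as text.

```python
def as_native_str(value):
    """Hypothetical helper: decode bytes (e.g. a Python 2 str read back
    under Python 3) to text, and pass text through unchanged."""
    if isinstance(value, bytes):
        return value.decode("utf-8")
    return value


# Assumed input: NumPy's bool dtype string ('|b1') as it would look in
# Python 3 if Python 2 had written it as a plain (byte) str.
dtype_from_py2 = b"|b1"

# Concatenating text with bytes fails in Python 3. This is the same kind of
# TypeError shown at the bottom of the traceback above.
try:
    _ = "dtype=" + dtype_from_py2
    error_message = None
except TypeError as exc:
    error_message = str(exc)

# Normalizing first makes the value usable as ordinary text again.
dtype_text = as_native_str(dtype_from_py2)
label = "dtype=" + dtype_text

print(error_message)
print(label)
```

Decoding on the Python 3 side is only one option. The reporter suggests the other: write {{unicode}} on the Python 2 side, which keeps the serialized value unambiguous regardless of which interpreter reads it.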
--
This message was sent by Atlassian JIRA
(v7.6.3#76005)