Posted to jira@arrow.apache.org by "Joris Van den Bossche (Jira)" <ji...@apache.org> on 2020/09/24 08:45:00 UTC
[jira] [Updated] (ARROW-4677) [Python] serialization does not consider ndarray endianness
[ https://issues.apache.org/jira/browse/ARROW-4677?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Joris Van den Bossche updated ARROW-4677:
-----------------------------------------
Labels: pyarrow-serialization (was: )
> [Python] serialization does not consider ndarray endianness
> -----------------------------------------------------------
>
> Key: ARROW-4677
> URL: https://issues.apache.org/jira/browse/ARROW-4677
> Project: Apache Arrow
> Issue Type: Bug
> Components: Python
> Affects Versions: 0.12.1
> Environment: * pyarrow 0.12.1
> * numpy 1.16.1
> * Python 3.7.0
> * Intel Core i7-7820HQ
> * (macOS 10.13.6)
> Reporter: Gabe Joseph
> Priority: Minor
> Labels: pyarrow-serialization
>
> {{pa.serialize}} does not appear to properly encode the endianness of multi-byte data:
> {code}
> # roundtrip.py
> import numpy as np
> import pyarrow as pa
> arr = np.array([1], dtype=np.dtype('>i2'))
> buf = pa.serialize(arr).to_buffer()
> result = pa.deserialize(buf)
> print(f"Original: {arr.dtype.str}, deserialized: {result.dtype.str}")
> np.testing.assert_array_equal(arr, result)
> {code}
> {code}
> $ pipenv run python roundtrip.py
> Original: >i2, deserialized: <i2
> Traceback (most recent call last):
>   File "roundtrip.py", line 10, in <module>
>     np.testing.assert_array_equal(arr, result)
>   File "/Users/gabejoseph/.local/share/virtualenvs/arrow-roundtrip-1xVSuBtp/lib/python3.7/site-packages/numpy/testing/_private/utils.py", line 896, in assert_array_equal
>     verbose=verbose, header='Arrays are not equal')
>   File "/Users/gabejoseph/.local/share/virtualenvs/arrow-roundtrip-1xVSuBtp/lib/python3.7/site-packages/numpy/testing/_private/utils.py", line 819, in assert_array_compare
>     raise AssertionError(msg)
> AssertionError:
> Arrays are not equal
> Mismatch: 100%
> Max absolute difference: 255
> Max relative difference: 0.99609375
> x: array([1], dtype=int16)
> y: array([256], dtype=int16)
> {code}
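> The raw bytes of the deserialized array are identical to the original (still big-endian), but the dtype Arrow assigns to the result does not reflect that byte order. It presumably uses the system's native endianness, which is little-endian on this machine, so the same bytes are read back as 256 instead of 1.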
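
The mismatch in the report can be reproduced with NumPy alone, which may help isolate the cause. The sketch below is not Arrow's implementation: it only shows that reinterpreting big-endian int16 bytes with a little-endian dtype turns 1 into 256, and one possible caller-side workaround (converting to native byte order before serializing). The helper name to_native is hypothetical and introduced here for illustration only.

```python
import numpy as np

# Same array as in the report: a single big-endian int16 with value 1.
arr = np.array([1], dtype=np.dtype('>i2'))

# The raw bytes are 0x00 0x01 (most significant byte first).
raw = arr.tobytes()

# Reinterpreting those bytes with a little-endian dtype, without swapping
# them, reads 0x0100 = 256. This mirrors the value seen in the report.
misread = np.frombuffer(raw, dtype='<i2')


def to_native(a):
    """Hypothetical helper: return `a` in native byte order, values preserved."""
    if a.dtype.isnative:
        return a
    # astype converts the values, so the bytes are swapped as needed.
    return a.astype(a.dtype.newbyteorder('='))


native = to_native(arr)
print(misread[0], native[0], native.dtype.isnative)
```

Passing the result of to_native(arr) to the serializer instead of arr should sidestep the problem, assuming the serializer handles native-order arrays correctly, at the cost of one extra copy for non-native input.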