Posted to jira@arrow.apache.org by "Joris Van den Bossche (Jira)" <ji...@apache.org> on 2020/09/24 08:45:00 UTC

[jira] [Updated] (ARROW-4677) [Python] serialization does not consider ndarray endianness

     [ https://issues.apache.org/jira/browse/ARROW-4677?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Joris Van den Bossche updated ARROW-4677:
-----------------------------------------
    Labels: pyarrow-serialization  (was: )

> [Python] serialization does not consider ndarray endianness
> -----------------------------------------------------------
>
>                 Key: ARROW-4677
>                 URL: https://issues.apache.org/jira/browse/ARROW-4677
>             Project: Apache Arrow
>          Issue Type: Bug
>          Components: Python
>    Affects Versions: 0.12.1
>         Environment: * pyarrow 0.12.1
> * numpy 1.16.1
> * Python 3.7.0
> * Intel Core i7-7820HQ
> * (macOS 10.13.6)
>            Reporter: Gabe Joseph
>            Priority: Minor
>              Labels: pyarrow-serialization
>
> {{pa.serialize}} does not appear to properly encode the endianness of multi-byte data:
> {code}
> # roundtrip.py 
> import numpy as np
> import pyarrow as pa
> arr = np.array([1], dtype=np.dtype('>i2'))
> buf = pa.serialize(arr).to_buffer()
> result = pa.deserialize(buf)
> print(f"Original: {arr.dtype.str}, deserialized: {result.dtype.str}")
> np.testing.assert_array_equal(arr, result)
> {code}
> {code}
> $ pipenv run python roundtrip.py
> Original: >i2, deserialized: <i2
> Traceback (most recent call last):
>   File "roundtrip.py", line 10, in <module>
>     np.testing.assert_array_equal(arr, result)
>   File "/Users/gabejoseph/.local/share/virtualenvs/arrow-roundtrip-1xVSuBtp/lib/python3.7/site-packages/numpy/testing/_private/utils.py", line 896, in assert_array_equal
>     verbose=verbose, header='Arrays are not equal')
>   File "/Users/gabejoseph/.local/share/virtualenvs/arrow-roundtrip-1xVSuBtp/lib/python3.7/site-packages/numpy/testing/_private/utils.py", line 819, in assert_array_compare
>     raise AssertionError(msg)
> AssertionError: 
> Arrays are not equal
> Mismatch: 100%
> Max absolute difference: 255
> Max relative difference: 0.99609375
>  x: array([1], dtype=int16)
>  y: array([256], dtype=int16)
> {code}
> The bytes of the deserialized array are identical to the original's (big-endian), but the dtype Arrow assigns to the result does not reflect that byte order; it presumably uses the system's native endianness, which is little-endian on this machine.
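
The mismatch reported above can be reproduced with NumPy alone, which may help separate the byte-order problem from pyarrow itself. The following is a minimal sketch, not a description of how pa.serialize works internally: it assumes only that the raw bytes survive the round trip while the dtype's byte order is lost. The variable names (raw, misread, native, restored) are illustrative, and the two workarounds are suggestions rather than a fix confirmed in this issue.

```python
import numpy as np

# Same input as in the report: a big-endian 16-bit integer array.
arr = np.array([1], dtype=np.dtype(">i2"))

# The big-endian encoding of 1 is the two bytes 0x00 0x01.
raw = arr.tobytes()

# Reading those same bytes back as little-endian yields 0x0100 == 256,
# which matches the value seen in the assertion failure.
misread = np.frombuffer(raw, dtype="<i2")
print(misread)  # [256]

# Possible workaround 1 (assumption): convert to the native byte order
# before serializing, so that a byte-order-unaware round trip is harmless.
native = arr.astype(arr.dtype.newbyteorder("="))

# Possible workaround 2 (assumption): carry the dtype string alongside the
# payload and reapply it when reading the bytes back.
dtype_str = arr.dtype.str  # '>i2'
restored = np.frombuffer(raw, dtype=np.dtype(dtype_str))

np.testing.assert_array_equal(arr, native)
np.testing.assert_array_equal(arr, restored)
```

The difference of 255 between 1 and 256 is the same "Max absolute difference" that appears in the traceback, which is consistent with the bytes being preserved and only the dtype's byte order being dropped.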


