Posted to issues@arrow.apache.org by "Gabe Joseph (JIRA)" <ji...@apache.org> on 2019/02/25 23:48:00 UTC

[jira] [Created] (ARROW-4677) [Python] serialization does not consider ndarray endianness

Gabe Joseph created ARROW-4677:
----------------------------------

             Summary: [Python] serialization does not consider ndarray endianness
                 Key: ARROW-4677
                 URL: https://issues.apache.org/jira/browse/ARROW-4677
             Project: Apache Arrow
          Issue Type: Bug
          Components: Python
    Affects Versions: 0.12.1
         Environment: * pyarrow 0.12.1
* numpy 1.16.1
* Python 3.7.0
* Intel Core i7-7820HQ
* (macOS 10.13.6)
            Reporter: Gabe Joseph


{{pa.serialize}} does not appear to properly encode the endianness of multi-byte data:
{code}
# roundtrip.py 
import numpy as np
import pyarrow as pa

arr = np.array([1], dtype=np.dtype('>i2'))

buf = pa.serialize(arr).to_buffer()
result = pa.deserialize(buf)

print(f"Original: {arr.dtype.str}, deserialized: {result.dtype.str}")
np.testing.assert_array_equal(arr, result)
{code}
{code}
$ pipenv run python roundtrip.py
Original: >i2, deserialized: <i2
Traceback (most recent call last):
  File "roundtrip.py", line 10, in <module>
    np.testing.assert_array_equal(arr, result)
  File "/Users/gabejoseph/.local/share/virtualenvs/arrow-roundtrip-1xVSuBtp/lib/python3.7/site-packages/numpy/testing/_private/utils.py", line 896, in assert_array_equal
    verbose=verbose, header='Arrays are not equal')
  File "/Users/gabejoseph/.local/share/virtualenvs/arrow-roundtrip-1xVSuBtp/lib/python3.7/site-packages/numpy/testing/_private/utils.py", line 819, in assert_array_compare
    raise AssertionError(msg)
AssertionError: 
Arrays are not equal

Mismatch: 100%
Max absolute difference: 255
Max relative difference: 0.99609375
 x: array([1], dtype=int16)
 y: array([256], dtype=int16)
{code}

The raw bytes of the deserialized array are identical to the original (big-endian), but the dtype Arrow assigns to it doesn't reflect that byte order. It presumably uses the system's native endianness, which is little-endian on this machine.
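
The 256 in the traceback above can be reproduced with NumPy alone, which supports the reading that the bytes survive the round trip and only the dtype's byte order is lost. The sketch below reinterprets the same two bytes as little-endian, then shows one possible workaround on the caller's side: converting to native byte order before serializing. The helper name {{to_native_byteorder}} is hypothetical and is not part of pyarrow or NumPy; whether this workaround is acceptable depends on whether the caller needs the original byte order preserved.

```python
import numpy as np

# Same array as in roundtrip.py: a big-endian 16-bit integer.
arr = np.array([1], dtype=np.dtype('>i2'))

# Reinterpret the same two bytes (0x00 0x01) as little-endian without
# copying or swapping. This mirrors what the deserialized array looks like.
reinterpreted = arr.view('<i2')
print(arr.tobytes(), reinterpreted)  # value 1 becomes 256


def to_native_byteorder(a):
    """Hypothetical helper: return `a` with the same values in native byte order.

    astype() swaps the bytes when needed, so the values are preserved and
    only the in-memory representation changes.
    """
    if a.dtype.isnative:
        return a
    return a.astype(a.dtype.newbyteorder('='))


# Possible workaround: normalize before handing the array to pa.serialize.
native = to_native_byteorder(arr)
np.testing.assert_array_equal(arr, native)
```

With this approach the deserialized array would compare equal to the original by value, although its dtype would report the native byte order rather than {{>i2}}.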


