You are viewing a plain text version of this content. The canonical link for it is here.
Posted to dev@arrow.apache.org by "Florian Jetter (Jira)" <ji...@apache.org> on 2020/03/10 10:36:00 UTC

[jira] [Created] (ARROW-8057) Schema equality not roundtrip safe

Florian Jetter created ARROW-8057:
-------------------------------------

             Summary: Schema equality not roundtrip safe
                 Key: ARROW-8057
                 URL: https://issues.apache.org/jira/browse/ARROW-8057
             Project: Apache Arrow
          Issue Type: Bug
          Components: Python
            Reporter: Florian Jetter


When performing schema roundtrips, the equality check for fields break. This is a regression from PyArrow 0.16.0

The equality check for entire schemas has never worked (but should from my POV)
{code:python}
import pyarrow.parquet as pq
import pyarrow as pa
print(pa.__version__)
fields = [
    pa.field("bool", pa.bool_()),
    pa.field("byte", pa.binary()),
    pa.field("date", pa.date32()),
    pa.field("datetime64", pa.timestamp("us")),
    pa.field("float32", pa.float64()),
    pa.field("float64", pa.float64()),
    pa.field("int16", pa.int64()),
    pa.field("int32", pa.int64()),
    pa.field("int64", pa.int64()),
    pa.field("int8", pa.int64()),
    pa.field("null", pa.null()),
    pa.field("uint16", pa.uint64()),
    pa.field("uint32", pa.uint64()),
    pa.field("uint64", pa.uint64()),
    pa.field("uint8", pa.uint64()),
    pa.field("unicode", pa.string()),
    pa.field("array_float32", pa.list_(pa.float64())),
    pa.field("array_float64", pa.list_(pa.float64())),
    pa.field("array_int16", pa.list_(pa.int64())),
    pa.field("array_int32", pa.list_(pa.int64())),
    pa.field("array_int64", pa.list_(pa.int64())),
    pa.field("array_int8", pa.list_(pa.int64())),
    pa.field("array_uint16", pa.list_(pa.uint64())),
    pa.field("array_uint32", pa.list_(pa.uint64())),
    pa.field("array_uint64", pa.list_(pa.uint64())),
    pa.field("array_uint8", pa.list_(pa.uint64())),
    pa.field("array_unicode", pa.list_(pa.string())),
]

schema = pa.schema(fields)

buf = pa.BufferOutputStream()
pq.write_metadata(schema, buf)
reader = pa.BufferReader(buf.getvalue().to_pybytes())
reconstructed_schema = pq.read_schema(reader)

assert reconstructed_schema == reconstructed_schema
assert reconstructed_schema[0] == reconstructed_schema[0]
# This breaks on master / regression from 0.16.0 
assert schema[0] == reconstructed_schema[0]

# This never worked but should
assert reconstructed_schema == schema
assert schema == reconstructed_schema
{code}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)