Posted to dev@arrow.apache.org by "Thomas Buhrmann (JIRA)" <ji...@apache.org> on 2018/05/11 13:12:00 UTC
[jira] [Created] (ARROW-2573) Field metadata is lost on serialization round-trip
Thomas Buhrmann created ARROW-2573:
--------------------------------------
Summary: Field metadata is lost on serialization round-trip
Key: ARROW-2573
URL: https://issues.apache.org/jira/browse/ARROW-2573
Project: Apache Arrow
Issue Type: Bug
Reporter: Thomas Buhrmann
It seems that only schema-level metadata survives a serialization round-trip, while field-level metadata is lost:
{code:python}
import pandas as pd
import pyarrow as pa
fnm = "/path/to/file.arr"
df = pd.DataFrame({"x": [0,1,2,3]})
tbl = pa.Table.from_pandas(df)
metadata = {"custom": "test"}
# Update field metadata, and schema metadata
fields = [col.field.add_metadata(metadata) for col in tbl.itercolumns()]
schema_metadata = {**tbl.schema.metadata, **metadata}
schema = pa.schema(fields, metadata=schema_metadata)
tbl = pa.Table.from_batches(tbl.to_batches(), schema=schema)
print(tbl.column(0).field.metadata) # correct :)
print(tbl.schema.field_by_name("x").metadata) # correct :)
print(tbl.schema) # correct :)
# Roundtrip
writer = pa.RecordBatchStreamWriter(fnm, tbl.schema)
writer.write_table(tbl)
writer.close()
reader = pa.RecordBatchStreamReader(fnm)
tbl = reader.read_all()
# Check
print(tbl.column(0).field.metadata) # None :(
print(tbl.schema.field_by_name("x").metadata) # None :(
print(tbl.schema) # Metadata good :)
{code}