Posted to issues@arrow.apache.org by "Yue Ni (Jira)" <ji...@apache.org> on 2022/04/06 05:22:00 UTC
[jira] [Created] (ARROW-16131) record batch specific metadata is not saved in IPC file
Yue Ni created ARROW-16131:
------------------------------
Summary: record batch specific metadata is not saved in IPC file
Key: ARROW-16131
URL: https://issues.apache.org/jira/browse/ARROW-16131
Project: Apache Arrow
Issue Type: Bug
Components: C++
Affects Versions: 7.0.0
Reporter: Yue Ni
When writing an IPC file containing multiple record batches, the schema provided to `IpcFormatWriter` is correctly written to the IPC file's footer. However, if a record batch being written has batch-specific metadata associated with it, that metadata is not written.
This can be reproduced with the following test case (using pyarrow):
```python
import pyarrow as pa


def test_chunked_record_batch_meta():
    num_batches = 2
    chunk_size = 10  # assumed value; chunk_size is not defined in the original report
    ipc_file = "/tmp/batches_with_metadata.arrow"
    int_array = pa.array(list(range(chunk_size)))
    # Schema-level metadata, written to the IPC file footer
    schema = pa.schema(
        [
            ("values", pa.int64()),
        ],
        metadata={"foo": "bar"},
    )
    writer = pa.RecordBatchFileWriter(ipc_file, schema)
    for i in range(num_batches):
        # follow examples here:
        # https://github.com/apache/arrow/blob/master/python/pyarrow/tests/test_table.py
        batch = pa.record_batch(
            [int_array],
            names=["values"],
            metadata={"batch_id": str(i)},  # batch-specific metadata
        )
        writer.write_batch(batch)
    writer.close()
    mmapped_file = pa.memory_map(ipc_file)
    reader = pa.ipc.open_file(mmapped_file)
    batch_0 = reader.get_record_batch(0)
    # Per this report, the per-batch "batch_id" entry is not present after reading
    assert batch_0.schema.metadata
```